A collegue was discussing with me today whether seven hex digits (what Git commit names are usually abbreviated to) is sufficient to avoid collisions. Usually four-five just works, but the engineer answer is as usual "I can calculate that!". Wikipedia states regarding the birthday problem:
"Given n random integers drawn from a discrete uniform distribution with range [1,d], what is the probability p(n;d) that at least two numbers are the same?"

So, quickly plugging this into Wolfram|Alpha, assuming seven hex digits gives d=16^7, and a reasonable set of git objects, say n=10000 :
{ n = 10000, d = 16^7,Which indeed gives that p = 17% probability that there for this setup would exist at least one collision in the whole set, but on the other hand you only use the abbreviation manually and the probability to actually use one of these is only q = 0.0037% . Q.E.D.
p = 1-exp(-(n (n-1))/(2 d)),
q = 1-((d-1)/d)^n }

See also my first and second tweet about this. I have other entries about git, wolframalpha or just plain nerdy.
How to download webradio and webtv - a comprehensive ripping tutorial
The farse of BankID
My Last.fm event + flickr mashup bargain and revolutionizing your Internet citizenship
FUF stöder Unga Forskare IRC-chat!
New iTunes 8, and why you should start using it
Young Scientists in the Physics Olympiad, prepare to beat them this year!
Sleepless with Spotify
Two days on facebook - 128 connections
Blog button - Förbundet Unga Forskare
Creative Commons Sunday!
My "ideary" and reviving the "ufsnack" mailing list
Now I would buy an iPhone! If only it had buttons...

No comments:
Post a Comment