Wednesday, October 21, 2009

Wolfram|Alpha use case: Calculating probability of hash collisions

A colleague and I were discussing today whether seven hex digits (what Git commit names are usually abbreviated to) are sufficient to avoid collisions. Usually four or five just work, but the engineer's answer is, as usual, "I can calculate that!". Wikipedia states regarding the birthday problem:

"Given n random integers drawn from a discrete uniform distribution with range [1,d], what is the probability p(n;d) that at least two numbers are the same?"

So, quickly plugging this into Wolfram|Alpha, assuming seven hex digits gives d=16^7, and a reasonable set of git objects, say n=10000 :
{ n = 10000, d = 16^7,
  p = 1-exp(-(n (n-1))/(2 d)),
  q = 1-((d-1)/d)^n }

This indeed gives a probability of p = 17% that, for this setup, at least one collision exists somewhere in the whole set. On the other hand, you only use the abbreviation manually, and the probability that one particular abbreviation you type collides with any of the other objects is only q = 0.0037%. Q.E.D.
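
For those without Wolfram|Alpha at hand, here is a minimal Python sketch of the same two formulas. The function names collision_probability and hit_probability are my own, made up for illustration, and p uses the same exponential approximation as the query above rather than the exact birthday product.

```python
import math


def collision_probability(n: int, d: int) -> float:
    """Approximate chance that at least two of n uniform draws from d values coincide.

    Computes 1 - exp(-n(n-1)/(2d)), the usual birthday-problem approximation.
    """
    return -math.expm1(-n * (n - 1) / (2 * d))


def hit_probability(n: int, d: int) -> float:
    """Chance that one specific value matches at least one of n uniform draws.

    Computes 1 - ((d-1)/d)^n, written with log1p/expm1 to stay accurate for large d.
    """
    return -math.expm1(n * math.log1p(-1 / d))


n = 10_000    # assumed number of git objects, as in the post
d = 16 ** 7   # seven hex digits

p = collision_probability(n, d)
q = hit_probability(n, d)

print(f"p = {p:.1%}")
print(f"q = {q:.4%}")
```

Changing d to 16 ** 8 or bumping n shows quickly how sensitive p is to the size of the repository, while q stays tiny.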

See also my first and second tweet about this. I have other entries about git, wolframalpha or just plain nerdy.

Tuesday, October 20, 2009

Why Git will revolutionize the world of open source


Today, looking at the Pro Git book homepage, I was reminded of how sure I am that Git will revolutionize open source. It is just so simple to fork and start contributing to a project. When you've done something, there's even a button to ask the owner of the project to pull back your contribution.

The graph above is a section of the network graph of the progit book, i.e. the mesh of contributions to the project. The book is being translated into German, Chinese etc., as can be seen in the Github inline rendering of a chapter of the book. Absolutely smooth.

To add insult to insanely cool flexibility injury (eh, sorry about that figure of speech) Github allows you to host project homepages also through git containing markdown or anything Jekyll will swallow. Maybe also check out the Github hosted homepage of sitaramc, the creator of the gitolite alternative to the gitosis server.

To be honest, git development so far has been pretty minimalistic; it has reached its current usability very fast. It is not as well integrated in different platforms and systems as subversion is, partly because the usage model is so different, partly because Git hasn't been failing for as many years as subversion has. However, don't let yourself be fooled: Git is absolutely usable and there is simply no substitute for what its reliability and streamlining can do for your coding process. I hope to elaborate on that topic one of these days.

What I'm next keeping my eyes open for is SCRUM and agile development in open source, but I suspect I'll have to wait for that...

I have my own Github profile with only some stuff in it, other posts about open source as well as a good collection of git links.

Update: My colleague Lauri went ahead to try out Github, and after naming the repository correctly and realizing that you should use the master branch there ("gh-pages" is only for pages, which are cool themselves!), his pages are working, and he can play around with this smoothly web-connected storage as much as he wants!

Secondly, Git absolutely has benefits for projects other than open source! Our company does essentially no open source, and still: check out my presentation about the cool benefits we've experienced, in particular how it streamlined the workflow and helps us capture and keep information at the right time.

Sunday, October 11, 2009

Spotify patents available to the world!

It is a common misconception that patenting has something to do with keeping secrets, while it is actually just the opposite - it is a way of gaining protection for your innovation, on the condition that you publicly disclose it for the world to see.

Even if you don't get the patent, your information will remain publicly available (as far as I know), and if you get the patent, you're still gonna have to pay the yearly fees to maintain the protection. In this sense, patent databases are hardly just lists of licensing obstacles to avoid or purchase, but vast repositories of knowledge on methods and solutions. Repositories which are largely ignored.

My friend Martin, who visited Tallinn this weekend, notified me that there is a Spotify patent out there in the names of the people we know there, Ehn, Strigeus et al. Said and done: go search, and anyone can find either the 11-page US or the 14-page European patent application for Spotify. Go to the "Original document" tab, ignore the one-page preview and instead choose the "Save Full Document" link, fill in the CAPTCHA and you can download the full PDF.

Other organisations which may have exciting patents to read:

(PS. Look at that, Computer Sweden picked up this blog post, so I will take the opportunity to suggest my other posts on Spotify, my candid pictures from the Spotify office or other stuff I find cool. I'm @unclecj on twitter)

(PPS. Browsing around for how the Computer Sweden guy picked up on this blog post (he thinks it may have been through @jocke), I find that my other friend @erikstarck had RT'd the patent before me, along with a fine article about the costs of running Spotify. My insider info: the licensing costs are very relevant, and someone has put in a lot of work to keep a music service that is both free and legal alive. And they are doing their best to manage the network costs, while paying attention to the fact that as you grow, your role in the network affects how much you have to pay someone for what.)

Thursday, October 1, 2009

The possibility of physical and mental collapse...

"The possibility of physical and mental collapse is now very real. No sympathy for the Devil, keep that in mind. Buy the ticket, take the ride." (quote)
Tonight it hit home with me how messy my online footprint has become. Sure, you can just google "unclecj", but each of its pieces, including this blog, is very neglected. I barely even open my Google Reader anymore, EventBox sort of passed away with my old hard disk, Jaiku and bloggy are all but forgotten and my ambitions to write on several exciting topics are simply going nowhere. Indeed my "workflow" is very streamlined, microblogging through and stuff, which also means I take time for almost nothing outside of the everyday flow.

It took me a long while to figure out I used my OpenId to connect with twitterfeed, and what happened to feeding my blogs to twitter before I simply don't know. I would love to clean up my blogs and post some new things, but it's not gonna happen, I'm simply too busy with more important commitments.

So, enjoy some fear and loathing while I'm being confused. I got some cool books while overseas, good stuff.