@Clifford Heath I've seen the birthday problem before but didn't relate it to this - thanks for pointing that out! It seems that avoiding collisions is a hard thing to do - especially these days when you may have to generate billions if not trillions of these short-urls. The whole topic is fascinating.
cheers, DAZ On Mar 12, 11:48 pm, Clifford Heath <[email protected]> wrote: > On 13/03/2011, at 7:53 AM, DAZ wrote: > > > How likely is rand(36**8).to_s(36) to have a collision compared to > > truncating UUIDTools::UUID.random_create? > > If you have N possible values, you have an approximately 50/50 chance > of a collision in sqrt(N) perfectly random values. Or to put it > another way, > if you have an ID of 2*M bits, you get to 50/50 at in a population of > 2^M > values. This is called the birthday paradox, google for it. > > There's a good reason why UUIDs contain 128 bits... (and MD5... and > SHA-1 hashes are 160, SHA-256 is 256, etc). > > > I realise that with smaller strings the chances of collision are > > larger. How do sites like disqus and bit.ly make their short urls? > > They probably use a character encoding of a database ID. > > You should probably use a character encoding likewise. Here's a snippet > I've used, which reduces a full UUID to 20 characters (using base-91): > > require 'sysuuid' > > class Integer > def base(b) > self < b ? [self] : (self/b).base(b) + [self%b] > end > end > > BASE91 = '!#$%&()*+,-./0123456789:;<=>? > @ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~' > > uuid = sysuuid.gsub(/-/,'').reverse.hex > uuid20 = uuid.base(91).map{|i| BASE91[i].chr }*'' > uuid_again = uuid20.split(//).inject(0){|i,e| i*91 + > BASE91.index(e[0]) } > > Clifford Heath. -- You received this message because you are subscribed to the Google Groups "DataMapper" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/datamapper?hl=en.
