@Clifford Heath

I've seen the birthday problem before but didn't relate it to this -
thanks for pointing that out! It seems that avoiding collisions is a
hard thing to do - especially these days when you may have to generate
billions if not trillions of these short-urls. The whole topic is
fascinating.

cheers,

DAZ

On Mar 12, 11:48 pm, Clifford Heath <[email protected]> wrote:
> On 13/03/2011, at 7:53 AM, DAZ wrote:
>
> > How likely is rand(36**8).to_s(36) to have a collision compared to
> > truncating UUIDTools::UUID.random_create?
>
> If you have N possible values, you have an approximately 50/50 chance
> of a collision in sqrt(N) perfectly random values. Or to put it  
> another way,
> if you have an ID of 2*M bits, you get to 50/50 at in a population of  
> 2^M
> values. This is called the birthday paradox, google for it.
>
> There's a good reason why UUIDs contain 128 bits... (and MD5... and
> SHA-1 hashes are 160, SHA-256 is 256, etc).
>
> > I realise that with smaller strings the chances of collision are
> > larger. How do sites like disqus and bit.ly make their short urls?
>
> They probably use a character encoding of a database ID.
>
> You should probably use a character encoding likewise. Here's a snippet
> I've used, which reduces a full UUID to 20 characters (using base-91):
>
> require 'sysuuid'
>
> class Integer
>    def base(b)
>      self < b ? [self] : (self/b).base(b) + [self%b]
>    end
> end
>
> BASE91 = '!#$%&()*+,-./0123456789:;<=>?
> @ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~'
>
>      uuid = sysuuid.gsub(/-/,'').reverse.hex
>      uuid20 = uuid.base(91).map{|i| BASE91[i].chr }*''
>      uuid_again = uuid20.split(//).inject(0){|i,e| i*91 +  
> BASE91.index(e[0]) }
>
> Clifford Heath.

-- 
You received this message because you are subscribed to the Google Groups 
"DataMapper" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/datamapper?hl=en.

Reply via email to