Hi Dan,

> You could use the record's id, and then add a checksum digit using the
> Luhn or Verhoeff algorithm, and then convert the resulting number to a
> base 36 string.

> There are three advantages to this approach.
>
>   1) you don't have to worry about generating a random value and
> dealing with collisions since the database handles it for you
>   2) you can detect typos and mistakes without having to hit the
> database
>   3) people won't be able to guess the URLs unless they are familiar
> with the exact algo you're using

I've just been playing around with the luhnacy and oklasoft-verhoeff gems.

The main problem now seems to be with point 3: because only the last
digit changes, there is very little difference between the resulting
strings for bigger integers, e.g.:
def short_url(id)
  # append the Verhoeff check digit, then convert to base 36
  Verhoeff.checksum_of(id).to_i.to_s(36)
end

short_url(12897) => "2rip"
short_url(12898) => "2riv"
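
To see why the strings barely differ, it helps to look at the integers
behind them. A minimal sketch, using only the two strings shown above and
Ruby's built-in base 36 conversion (no gem involved):

```ruby
# Decode the two example strings back to the checksummed integers.
a = "2rip".to_i(36)   # 128977: id 12897 followed by its check digit
b = "2riv".to_i(36)   # 128983: the next id followed by its check digit

puts a
puts b
# The gap is smaller than 36 and there is no carry here,
# so only the last base 36 character changes.
puts b - a
puts a.to_s(36)
puts b.to_s(36)
```

Consecutive ids with a check digit appended are at most 19 apart, so at
most the last two base 36 characters can differ.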

To get round this I had a go at adding a number to the id first, then
reversing the digits before adding the checksum digit:

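def short_url(id)
  # offset the id, then reverse its decimal digits
  n = (id + 13).to_s.reverse.to_i
  # append the Verhoeff check digit, then convert to base 36
  Verhoeff.checksum_of(n).to_i.to_s(36)
end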

This seems to do the trick:

short_url(12897) => "etp"
short_url(12898) => "2jzf"

My only worry now is whether I have compromised point 1: are the values
still unique? I think they are, but I will need to have a bit more of a
think about the possibilities.
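
One way to settle the uniqueness question is to test the transformation
directly. Appending a check digit and converting to base 36 are both
reversible, so only the add-and-reverse step needs checking. The sketch
below does that over a range of ids; `scramble` is a hypothetical name for
that step, and the Verhoeff gem is left out on purpose.

```ruby
# Hypothetical helper: just the "add 13, then reverse the digits" step.
def scramble(id)
  (id + 13).to_s.reverse.to_i
end

# Group ids by scrambled value and keep the groups with more than one id.
collisions = (1..20_000).group_by { |id| scramble(id) }
                        .select { |_, ids| ids.size > 1 }

puts collisions.size
p collisions[1921]   # ids whose scrambled value is 1921
```

Under these assumptions the check does turn up collisions: `to_i` drops
leading zeros, so 1291 and 12910 both reverse to 1921, which means ids 1278
and 12897 would end up with the same URL. Reversing a zero-padded,
fixed-width string instead is one possible way around that.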

Point 2 is a bonus - being able to check a URL for authenticity before
hitting the database to search for it.
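
As a sketch of how point 2 could work, here is a pure-Ruby Verhoeff check
that runs before any query. It uses the standard Verhoeff tables rather
than either gem, so `check_digit`, `encode` and `valid_code?` are
illustrative names and not the gems' APIs.

```ruby
# Standard Verhoeff tables: multiplication in the dihedral group D5,
# position-dependent permutations, and inverses.
D_TABLE = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
  [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
  [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
  [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
  [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
  [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
  [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
  [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
  [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
].freeze

P_TABLE = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
  [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
  [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
  [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
  [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
  [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
  [7, 0, 4, 6, 9, 1, 3, 2, 5, 8]
].freeze

INV_TABLE = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9].freeze

# Verhoeff check digit for a non-negative integer.
def check_digit(number)
  c = 0
  number.to_s.reverse.each_char.with_index do |ch, i|
    c = D_TABLE[c][P_TABLE[(i + 1) % 8][ch.to_i]]
  end
  INV_TABLE[c]
end

# Append the check digit and convert to base 36.
def encode(number)
  (number * 10 + check_digit(number)).to_s(36)
end

# True when the code decodes to a number that passes the Verhoeff check.
def valid_code?(code)
  return false unless code.match?(/\A[0-9a-z]+\z/)
  c = 0
  code.to_i(36).to_s.reverse.each_char.with_index do |ch, i|
    c = D_TABLE[c][P_TABLE[i % 8][ch.to_i]]
  end
  c.zero?
end

puts encode(12897)        # "2rip", the same string as the first example
puts valid_code?("2rip")  # true
puts valid_code?("2riq")  # false: this typo is rejected without a query
```

The check digit is computed on the decimal form, so a typo in a base 36
character can change several decimal digits at once. Most such typos
should still fail the check, but Verhoeff's single-digit guarantee only
applies to the decimal digits.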

So I think this might work ... thanks to everybody for their help and
suggestions!

DAZ
ps - would I still store this as type UUID, or just a string?

On Mar 13, 12:06 am, "Dan Kubb (dkubb)" <[email protected]> wrote:
> DAZ,
>
> > I definitely need short strings - 6-8 characters for the url. It is
> > for the url of e-cards that people send - they don't have to be secret
> > urls, but it would be nice if people couldn't easily guess other urls
> > and read other peoples cards, so just using the auto-incrementing id
> > isn't really an option :(
>
> You could use the record's id, and then add a checksum digit using the
> Luhn or Verhoeff algorithm, and then convert the resulting number to a
> base 36 string. There are libraries to handle the checksum generation
> and testing so it would only take a couple of lines of code for both
> operations.
>
> A determined hacker could just brute force things too, I don't see any
> way for 100% protection in those cases. The best thing you can hope
> for is to discourage casual exploration of the URL space.
>
> > How likely is rand(36**8).to_s(36) to have a collision compared to
> > truncating UUIDTools::UUID.random_create?
>
> It's probably the same.
>
> > I realise that with smaller strings the chances of collision are
> > larger. How do sites like disqus and bit.ly make their short urls?
>
> I don't know precisely. I'd guess they do something like above, I
> don't see how they could do it any other way at the scales they are
> working at.
>
> --
>
> Dan
> (dkubb)

