Finally got something that I think works and doesn't rely on any
outside libraries:

def short_url(id)
  # Append a random digit from 1-9 so the reversed number never starts with zero
  string = id.to_s + (rand(9) + 1).to_s
  string.reverse.to_i.to_s(36)
end

The main problem before was ending up with a number that ended in
zero. When it was converted to a string, reversed, then converted back
into an integer, the zero was lost. I can ensure that a number never
ends in zero by appending a random digit from 1-9 to the end of the id
string.
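To make the lost zero concrete, here is a minimal sketch. The helper name reverse_digits is made up for illustration and is not part of the function above:

```ruby
# Hypothetical helper: reverse the decimal digits of an integer.
def reverse_digits(n)
  n.to_s.reverse.to_i
end

# "240" reversed is "042", and to_i drops the leading zero,
# so 240 and 24 collide.
reverse_digits(240) == reverse_digits(24)    # collision

# Appending a non-zero digit first means the reversed string never
# starts with zero, so the two values stay distinct.
reverse_digits(2401) == reverse_digits(241)  # no collision
```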

This produces short urls that are unique and all look very different,
so it should stop people casually looking for different urls.
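As a rough check on uniqueness, the scheme can be reversed to recover the id. This is only a sketch: encode_short_url and decode_short_url are hypothetical names, and the digit parameter is added purely so the example can be tested deterministically:

```ruby
# Hypothetical helpers, not from the code above.
# Same idea as short_url, but the appended digit can be passed in.
def encode_short_url(id, digit = rand(9) + 1)
  raise ArgumentError, "digit must be 1-9" unless (1..9).cover?(digit)
  (id.to_s + digit.to_s).reverse.to_i.to_s(36)
end

# Undo the steps: base 36 -> decimal string -> reverse -> drop the
# appended digit -> integer id.
def decode_short_url(code)
  digits = code.to_i(36).to_s.reverse
  digits[0...-1].to_i
end

id = 12897
code = encode_short_url(id)
decode_short_url(code) == id  # holds whichever digit was picked
```

Because every url decodes back to exactly one id, two different ids can never share a url. Note that the same id can produce up to nine different urls, so the url would need to be stored with the record or decoded on lookup.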

I think it fits the bill, anybody got any thoughts?

thanks for all the help and ideas - very interesting stuff,

DAZ

On Mar 13, 1:14 pm, DAZ <[email protected]> wrote:
> Still doesn't work...
>
> Verhoeff.checksum_of accepts strings that begin with leading zeroes,
> but can give the same answer occasionally, for example:
> Verhoeff.checksum_of('02') => 23
> Verhoeff.checksum_of('000002') => 23
>
> That means my short_url function gives the same short url ('n') for
> ids of 7 and 199,987.
>
> :(
>
> Looks like reversing the string won't work as a way of jumbling
> up the resulting url.
>
> Next try is to apply the Verhoeff algorithm 3 times, then convert to a
> base 36 number:
>
> def short_url(id)
>   id = Verhoeff.checksum_of(Verhoeff.checksum_of(Verhoeff.checksum_of(id)))
>   id.to_s(36)
> end
>
> This seems okay ... for now!
>
> I never thought this would be so hard!
>
> DAZ
>
> On Mar 13, 12:37 pm, DAZ <[email protected]> wrote:
>
> > FAIL!
>
> > > def short_url(id)
> > >   id = id + 13
> > >   n = id.to_s.reverse.to_i
> > >   Verhoeff.checksum_of(n).to_i.to_s(36)
> > > end
>
> > 11 and 227 both map to the same short url (because '240' is reversed
> > to make '042'.to_i = 42)
>
> > I think this can be fixed because the Verhoeff.checksum_of method
> > accepts strings starting with 0:
>
> > > def short_url(id)
> > >   id = id + 13
> > >   n = id.to_s.reverse
> > >   Verhoeff.checksum_of(n).to_i.to_s(36)
> > > end
>
> > Hopefully this now does produce unique results ... anybody see any
> > problems?
>
> > cheers,
>
> > DAZ
>
> > On Mar 13, 12:06 pm, DAZ <[email protected]> wrote:
>
> > > Hi Dan,
>
> > > > You could use the record's id, and then add a checksum digit using the
> > > > Luhn or Verhoeff algorithm, and then convert the resulting number to a
> > > > base 36 string.
> > > > There are three advantages to this approach.
>
> > > >   1) you don't have to worry about generating a random value and
> > > > dealing with collisions since the database handles it for you
> > > >   2) you can detect typos and mistakes without having to hit the
> > > > database
> > > >   3) people won't be able to guess the URLs unless they are familiar
> > > > with the exact algo you're using
>
> > > I've just been playing around with luhnacy and oklasoft-verhoeff gems.
>
> > > The main problem now seems to be  with point 3 - because only the last
> > > digit is changing there is very little difference in the resulting
> > > strings for bigger integers
> > > eg:
> > > def short_url(id)
> > >   Verhoeff.checksum_of(id).to_i.to_s(36)
> > > end
>
> > > short_url(12897) => "2rip"
> > > short_url(12897) => "2riv"
>
> > > To get round this I had a go at adding a number at the beginning then
> > > reversing the digits before adding the checksum digit:
>
> > > def short_url(id)
> > >   id = id + 13
> > >   n = id.to_s.reverse.to_i
> > >   Verhoeff.checksum_of(n).to_i.to_s(36)
> > > end
>
> > > This seems to do the trick:
>
> > > short_url(12897) => "etp"
> > > short_url(12897) => "2jzf"
>
> > > My only worry now is have I compromised point 1 - are the values still
> > > unique? I think they are but will need to have a bit more of a think
> > > about the possibilities.
>
> > > Point 2 is a bonus - being able to check a URL for authenticity before
> > > hitting the database to search for it.
>
> > > So I think this might work ... thanks to everybody for their help and
> > > suggestions!
>
> > > DAZ
> > > ps - would I still store this as type UUID, or just a string?
>
> > > On Mar 13, 12:06 am, "Dan Kubb (dkubb)" <[email protected]> wrote:
>
> > > > DAZ,
>
> > > > > I definitely need short strings - 6-8 characters for the url. It is
> > > > > for the url of e-cards that people send - they don't have to be secret
> > > > > urls, but it would be nice if people couldn't easily guess other urls
> > > > > and read other peoples cards, so just using the auto-incrementing id
> > > > > isn't really an option :(
>
> > > > You could use the record's id, and then add a checksum digit using the
> > > > Luhn or Verhoeff algorithm, and then convert the resulting number to a
> > > > base 36 string. There are libraries to handle the checksum generation
> > > > and testing so it would only take a couple of lines of code for both
> > > > operations.
>
> > > > A determined hacker could just brute force things too, I don't see any
> > > > way for 100% protection in those cases. The best thing you can hope
> > > > for is to discourage casual exploration of the URL space.
>
> > > > > How likely is rand(36**8).to_s(36) to have a collision compared to
> > > > > truncating UUIDTools::UUID.random_create?
>
> > > > It's probably the same.
>
> > > > > I realise that with smaller strings the chances of collision are
> > > > > larger. How do sites like disqus and bit.ly make their short urls?
>
> > > > I don't know precisely. I'd guess they do something like above, I
> > > > don't see how they could do it any other way at the scales they are
> > > > working at.
>
> > > > --
>
> > > > Dan
> > > > (dkubb)

-- 
You received this message because you are subscribed to the Google Groups 
"DataMapper" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/datamapper?hl=en.
