Finally got something that I think works and doesn't rely on any outside libraries:
def short_url(id)
  # Append a random digit from 1-9 so the string never ends in 0;
  # after reversing it can't start with 0, so to_i loses no digits.
  string = id.to_s + (rand(9) + 1).to_s
  string.reverse.to_i.to_s(36)
end

The main problem before was ending up with a number that ended in zero. When it was converted to a string, reversed, then converted back into an integer, the zero was lost. I can ensure that a number never ends in zero by appending a random digit from 1-9 to the end of the id string.

This produces short urls that are unique and all look very different, so it should stop people casually looking for other urls.

I think it fits the bill. Anybody got any thoughts?

Thanks for all the help and ideas - very interesting stuff,

DAZ

On Mar 13, 1:14 pm, DAZ <[email protected]> wrote:
> Still doesn't work...
>
> Verhoeff.checksum_of accepts strings that begin with leading zeroes,
> but can give the same answer occasionally, for example:
> Verhoeff.checksum_of('02') => 23
> Verhoeff.checksum_of('000002') => 23
>
> That means my short_url function gives the same short url ('n') for
> ids 7 and 199,987 (7 + 13 = 20 reverses to '02', and
> 199,987 + 13 = 200,000 reverses to '000002').
>
> :(
>
> Looks like reversing the string won't work as a way of jumbling up
> the resulting url.
>
> Next try is to apply the Verhoeff algorithm 3 times, then convert to
> a base 36 number:
>
> def short_url(id)
>   id = Verhoeff.checksum_of(Verhoeff.checksum_of(Verhoeff.checksum_of(id)))
>   id.to_s(36)
> end
>
> This seems okay ... for now!
>
> I never thought this would be so hard!
>
> DAZ
>
> On Mar 13, 12:37 pm, DAZ <[email protected]> wrote:
> > FAIL!
> >
> > def short_url(id)
> >   id = id + 13
> >   n = id.to_s.reverse.to_i
> >   Verhoeff.checksum_of(n).to_i.to_s(36)
> > end
> >
> > 11 and 227 both map to the same short url (because 227 + 13 = 240
> > is reversed to make '042', and '042'.to_i = 42, the same as
> > 11 + 13 = 24 reversed).
> >
> > I think this can be fixed because the Verhoeff.checksum_of method
> > accepts strings starting with 0:
> >
> > def short_url(id)
> >   id = id + 13
> >   n = id.to_s.reverse
> >   Verhoeff.checksum_of(n).to_i.to_s(36)
> > end
> >
> > Hopefully this now does produce unique results ... anybody see any
> > problems?
> >
> > cheers,
> >
> > DAZ
> >
> > On Mar 13, 12:06 pm, DAZ <[email protected]> wrote:
> > > Hi Dan,
> > >
> > > > You could use the record's id, and then add a checksum digit using the
> > > > Luhn or Verhoeff algorithm, and then convert the resulting number to a
> > > > base 36 string.
> > > > There are three advantages to this approach.
> > > >
> > > > 1) you don't have to worry about generating a random value and
> > > > dealing with collisions since the database handles it for you
> > > > 2) you can detect typos and mistakes without having to hit the
> > > > database
> > > > 3) people won't be able to guess the URLs unless they are familiar
> > > > with the exact algo you're using
> > >
> > > I've just been playing around with the luhnacy and oklasoft-verhoeff gems.
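The uniqueness worry that runs through this thread can be settled mechanically by brute-forcing a range of ids. The Ruby sketch below is an editorial illustration, not code from the thread, and `first_collision` is a hypothetical helper. It leaves out the Verhoeff step, on the reasoning that the checksum is computed from the reversed number, so two ids that collide before it also collide after it. It also fixes the final scheme's random digit at 5 so the run is repeatable.

```ruby
# Returns the first pair of ids in `ids` whose block results are equal,
# or nil if every id maps to a distinct value.
def first_collision(ids)
  seen = {}
  ids.each do |id|
    key = yield(id)
    return [seen[key], id] if seen.key?(key)
    seen[key] = id
  end
  nil
end

# 12:37 pm attempt without the checksum: add 13, reverse, back to Integer.
reversed = ->(id) { (id + 13).to_s.reverse.to_i }

# Final scheme from the top of the thread, with the random non-zero
# digit fixed at 5 so the check is repeatable.
appended = ->(id) { (id.to_s + "5").reverse.to_i.to_s(36) }

p reversed.call(11) == reversed.call(227)     # => true (both 42)
p reversed.call(7) == reversed.call(199_987)  # => true (both 2)
p first_collision(1..1000, &reversed)         # => [1, 127] (14 and 140)
p first_collision(1..100_000, &appended)      # => nil
```

The appended digit is never 0, so the reversed string never starts with 0 and `to_i` keeps every digit. That is why the final scheme does not collide. In the real version the digit is random, so the same id gives a different url each time, and the url has to be stored with the record rather than recomputed. Anyone who knows the scheme can still undo it by decoding base 36, reversing, and dropping the first digit.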
> > >
> > > The main problem now seems to be with point 3 - because only the last
> > > digit is changing, there is very little difference in the resulting
> > > strings for bigger integers, eg:
> > >
> > > def short_url(id)
> > >   Verhoeff.checksum_of(id).to_i.to_s(36)
> > > end
> > >
> > > short_url(12897) => "2rip"
> > > short_url(12898) => "2riv"
> > >
> > > To get round this I had a go at adding a number (13) to the id first,
> > > then reversing the digits before adding the checksum digit:
> > >
> > > def short_url(id)
> > >   id = id + 13
> > >   n = id.to_s.reverse.to_i
> > >   Verhoeff.checksum_of(n).to_i.to_s(36)
> > > end
> > >
> > > This seems to do the trick:
> > >
> > > short_url(12897) => "etp"
> > > short_url(12898) => "2jzf"
> > >
> > > My only worry now is: have I compromised point 1 - are the values still
> > > unique? I think they are, but I will need to have a bit more of a think
> > > about the possibilities.
> > >
> > > Point 2 is a bonus - being able to check a URL for authenticity before
> > > hitting the database to search for it.
> > >
> > > So I think this might work ... thanks to everybody for their help and
> > > suggestions!
> > >
> > > DAZ
> > >
> > > ps - would I still store this as type UUID, or just a string?
> > >
> > > On Mar 13, 12:06 am, "Dan Kubb (dkubb)" <[email protected]> wrote:
> > > > DAZ,
> > > >
> > > > > I definitely need short strings - 6-8 characters for the url. It is
> > > > > for the url of e-cards that people send - they don't have to be secret
> > > > > urls, but it would be nice if people couldn't easily guess other urls
> > > > > and read other people's cards, so just using the auto-incrementing id
> > > > > isn't really an option :(
> > > >
> > > > You could use the record's id, and then add a checksum digit using the
> > > > Luhn or Verhoeff algorithm, and then convert the resulting number to a
> > > > base 36 string. There are libraries to handle the checksum generation
> > > > and testing so it would only take a couple of lines of code for both
> > > > operations.
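Dan's suggestion (append a checksum digit, then convert to base 36) can be written in a few lines of plain Ruby. The sketch below is an editorial illustration. It deliberately does not use the luhnacy or oklasoft-verhoeff gems, because their APIs aren't shown in the thread. It swaps in the Luhn algorithm because it is the shorter of the two to write out. The method names `luhn_check_digit`, `encode_id` and `decode_short_url` are hypothetical. Verhoeff catches slightly more typing errors: it detects every adjacent transposition, while Luhn misses swapping 09 and 90.

```ruby
# Luhn check digit for a non-negative integer payload.
def luhn_check_digit(payload)
  sum = 0
  payload.to_s.reverse.each_char.with_index do |ch, i|
    d = ch.to_i
    # The rightmost payload digit will sit next to the check digit, so it
    # and every second digit to its left are doubled (minus 9 if > 9).
    if i.even?
      d *= 2
      d -= 9 if d > 9
    end
    sum += d
  end
  (10 - sum % 10) % 10
end

# id -> id with a Luhn digit appended -> base 36 string.
def encode_id(id)
  (id * 10 + luhn_check_digit(id)).to_s(36)
end

# base 36 string -> id, or nil if the check digit doesn't match, so a
# mistyped url can be rejected without a database lookup.
def decode_short_url(str)
  return nil unless str.match?(/\A[0-9a-z]+\z/)
  id, check = str.to_i(36).divmod(10)
  luhn_check_digit(id) == check ? id : nil
end

p encode_id(12897)          # => "2rin"
p encode_id(12898)          # => "2riv"
p decode_short_url("2rin")  # => 12897
p decode_short_url("2rip")  # => nil (check digit doesn't match)
```

Consecutive ids still share a prefix ("2rin" and "2riv"), which is the point-3 weakness DAZ describes above. The checksum adds typo detection, not obscurity.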
> > > >
> > > > A determined hacker could just brute force things too, I don't see any
> > > > way for 100% protection in those cases. The best thing you can hope
> > > > for is to discourage casual exploration of the URL space.
> > > >
> > > > > How likely is rand(36**8).to_s(36) to have a collision compared to
> > > > > truncating UUIDTools::UUID.random_create?
> > > >
> > > > It's probably the same.
> > > >
> > > > > I realise that with smaller strings the chances of collision are
> > > > > larger. How do sites like disqus and bit.ly make their short urls?
> > > >
> > > > I don't know precisely. I'd guess they do something like above, I
> > > > don't see how they could do it any other way at the scales they are
> > > > working at.
> > > >
> > > > --
> > > >
> > > > Dan
> > > > (dkubb)

--
You received this message because you are subscribed to the Google Groups "DataMapper" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/datamapper?hl=en.
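The question quoted above ("How likely is rand(36**8).to_s(36) to have a collision compared to truncating UUIDTools::UUID.random_create?") can be estimated with the birthday approximation. If n values are drawn uniformly from a space of size N, the chance of at least one repeat is about 1 - exp(-n(n - 1) / 2N). The Ruby sketch below is an editorial addition, and `collision_probability` is a hypothetical name. The UUID comparison assumes the truncation keeps the first 8 hex characters of a random (version 4) UUID. Those characters are all random, so they give a space of 16**8 rather than 36**8.

```ruby
# Birthday approximation: probability that n values drawn uniformly at
# random from `space` possibilities contain at least one duplicate.
def collision_probability(n, space)
  1.0 - Math.exp(-n * (n - 1) / (2.0 * space))
end

SPACE_BASE36 = 36**8  # rand(36**8).to_s(36): up to 8 base-36 characters
SPACE_HEX8   = 16**8  # assumed: first 8 hex chars of a random UUID (32 bits)

[10_000, 100_000, 1_000_000].each do |n|
  printf("%9d cards: rand(36**8) %.4f, 8 hex chars %.4f\n",
         n,
         collision_probability(n, SPACE_BASE36),
         collision_probability(n, SPACE_HEX8))
end
```

In both cases the odds of a repeat climb quickly with the number of cards. A random scheme therefore still needs a uniqueness check or retry when saving, which is the collision handling that Dan's point 1 (the database handles it for you) avoids.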
