Hello Ralf, Ralf Wildenhues wrote: > BTW, I now found a newer paper that has some more interesting stuff, if > a bit overkill for msgmerge: <http://fastss.csg.uzh.ch/ifi-2007.02.pdf>. > Also, there is this good overview of algorithms: > <ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/survasm.ps.gz>.
Thanks for these pointers. The first one explains how Google's "did you mean" feature works, but is hardly applicable here, because for a string of length 100, msgmerge is looking for similar strings with possible 30 differences (not just 2 or 3 differences). The second one is very interesting, especially the filtering algorithms look fancy - but also hard to put in place here because they are good for ɑ < 0.1, whereas here ɑ can be 0.3. The algorithm that appears to be most useful for our range of parameters is BPM (bit- parallelized Myers). But I wonder whether it can combine with the dynamic optimizations that we have already put in place - namely, that the lower_bound is much higher for the later messages being compared than for the first ones in the list. Bruno