[ CC += bug-findutils, bug-gnulib; bug-coreutils moved to BCC ] On Thu, Apr 16, 2009 at 8:09 PM, Ondrej Bilka <nel...@seznam.cz> wrote: > Hello. I am writing partial fnmatch to speed up locate et al.
Please note that locate is part of GNU findutils, not GNU coreutils. I have CC'ed this email to the correct list. Regular expression matching with "locate -r" is also faster than the default, globbing, match. > Idea is save state state to match common prefix only once. > > fnmatch source is too complex so I wrote simplified version. > My code is 2-4 times faster than fnmatch. > > Here is list what provided speedup and can be applied to original source > > FOLD causes worst performance slowdown. > From what I tried is best convert in buffer to uppercase when needed. This seems like an attractive option but I'm a little concerned about whether this will produce the correct result with characters like ß or Ï and ï. > Other optimalizations are based on preprocessing pattern > > 1. For each * we compute minimal size of rest and we have smaller for cycles. > 2. We can replace *? by ?* > 3. If * is followed by letter we check it at for cycle of * > 4. If pattern contains four consecutive letters we compare them as int I'm, not certain I have fully understood your points, but if you can speed up gnulib's fnmatch implementation, that would be good news. I looked at preprocessing the glob pattern to convert it into a regular expression, but I think there were some problems around collating symbols and character equivalence classes (i.e. that it''s not possible for the application to extract information about these from the C library). Since I last looked at this issue though, Bruno has implemented a large amount of Unicode support in gnulib, and this may in fact help solve the problem too. I've CC'ed the message to bug-gnulib since that's where the folks who maintain the implementations of that library and the fnmatch replacement hang out. > > I also tried optimalizations which didnt worked. > use mask to match four letters and ? > strlen trick to find first letter after * faster. > Boyer-moore skiping didnt work either. > > -- > > 50% of the manual is in .pdf readme files > > > _______________________________________________ > Bug-coreutils mailing list > bug-coreut...@gnu.org > http://lists.gnu.org/mailman/listinfo/bug-coreutils >