Re: faster fnmatch

James Youngman Fri, 17 Apr 2009 02:52:09 -0700

[ CC += bug-findutils, bug-gnulib; bug-coreutils moved to BCC ]

On Thu, Apr 16, 2009 at 8:09 PM, Ondrej Bilka <nel...@seznam.cz> wrote:
> Hello. I am writing partial fnmatch to speed up locate et al.


Please note that locate is part of GNU findutils, not GNU coreutils.
I have CC'ed this email to the correct list.

Regular expression matching with "locate -r" is also faster than the
default, globbing, match.

> Idea is save state state to match common prefix only once.
>
> fnmatch source is too complex so I wrote simplified version.
> My code is 2-4 times faster than fnmatch.
>
> Here is list what provided speedup and can be applied to original source
>
> FOLD causes worst performance slowdown.
> From what I tried is best convert in buffer to uppercase when needed.

This seems like an attractive option but I'm a little concerned about
whether this will produce the correct result with characters like ß or
Ï and ï.

> Other optimalizations are based on preprocessing pattern
>
> 1. For each * we compute minimal size of rest and we have smaller for cycles.
> 2. We can replace *? by ?*
> 3. If * is followed by letter we check it at for cycle of *
> 4. If pattern contains four consecutive letters we compare them as int

I'm, not certain I have fully understood your points, but if you can
speed up gnulib's fnmatch implementation, that would be good news.   I
looked at preprocessing the glob pattern to convert it into a regular
expression, but I think there were some problems around collating
symbols and character equivalence classes (i.e. that it''s not
possible for the application to extract information about these from
the C library).

Since I last looked at this issue though, Bruno has implemented a
large amount of Unicode support in gnulib, and this may in fact help
solve the problem too.    I've CC'ed the message to bug-gnulib since
that's where the folks who maintain the implementations of that
library and the fnmatch replacement hang out.


>
> I also tried optimalizations which didnt worked.
> use mask to match four letters and ?
> strlen trick to find first letter after * faster.
> Boyer-moore skiping didnt work either.
>
> --
>
> 50% of the manual is in .pdf readme files
>
>
> _______________________________________________
> Bug-coreutils mailing list
> bug-coreut...@gnu.org
> http://lists.gnu.org/mailman/listinfo/bug-coreutils
>

Re: faster fnmatch

Reply via email to