Re: faster fnmatch

Bruno Haible Fri, 17 Apr 2009 04:03:47 -0700

Hello Ondrej,

> > Hello. I am writing partial fnmatch to speed up locate et al.


Cool! We know for some time already that this is a bottleneck [1].
I find it also interesting that you go for a two-step approach,
preprocess the pattern once and use it for matching often - the same
approach that we considered useful in [2].

> > FOLD causes worst performance slowdown.
> > From what I tried is best convert in buffer to uppercase when needed.
> 
> This seems like an attractive option but I'm a little concerned about
> whether this will produce the correct result with characters like ß or
> Ï and ï.
> ... Bruno has implemented a
> large amount of Unicode support in gnulib, and this may in fact help

Yes, implementing a case-insensitive matching by doing an uppercasing
pass on both sides is an old technique that ignores the properties of
a couple of languages. For good support of all languages, it's necessary
to use a "casefolding" pass, such as the one described by the Unicode
standard and implemented in gnulib (and soon, libunistring), in
"unicase.h". In particular the function ulc_u8_casefold from
lib/unicase/ulc-casecmp.c looks like what's needed as preprocessing pass
for an arbitrary file name in locale encoding.

Note that this Unicode aware casefolding provides correctness; it will,
however, likely decrease the speed of the matching in 'find' (whereas in
'locate', the file name list can be processed ahead of time).

Bruno


[1] http://lists.gnu.org/archive/html/bug-gnulib/2007-12/msg00040.html
[2] http://lists.gnu.org/archive/html/bug-gnulib/2007-12/msg00071.html

Re: faster fnmatch

Reply via email to