Hello Ondrej, > > Hello. I am writing partial fnmatch to speed up locate et al.
Cool! We know for some time already that this is a bottleneck [1]. I find it also interesting that you go for a two-step approach, preprocess the pattern once and use it for matching often - the same approach that we considered useful in [2]. > > FOLD causes worst performance slowdown. > > From what I tried is best convert in buffer to uppercase when needed. > > This seems like an attractive option but I'm a little concerned about > whether this will produce the correct result with characters like ß or > Ï and ï. > ... Bruno has implemented a > large amount of Unicode support in gnulib, and this may in fact help Yes, implementing a case-insensitive matching by doing an uppercasing pass on both sides is an old technique that ignores the properties of a couple of languages. For good support of all languages, it's necessary to use a "casefolding" pass, such as the one described by the Unicode standard and implemented in gnulib (and soon, libunistring), in "unicase.h". In particular the function ulc_u8_casefold from lib/unicase/ulc-casecmp.c looks like what's needed as preprocessing pass for an arbitrary file name in locale encoding. Note that this Unicode aware casefolding provides correctness; it will, however, likely decrease the speed of the matching in 'find' (whereas in 'locate', the file name list can be processed ahead of time). Bruno [1] http://lists.gnu.org/archive/html/bug-gnulib/2007-12/msg00040.html [2] http://lists.gnu.org/archive/html/bug-gnulib/2007-12/msg00071.html