Op Mon, 30 May 2005 09:47:34 +0100 schreef James Youngman in <[EMAIL PROTECTED]>: : On Mon, May 30, 2005 at 04:31:52AM +0200, Buzz wrote: : : > * I really tried to get this patch as small as possible. : > * The case-folding /can/ be done less often. : : Yes. The old code used to do the case-folding on only the new : characters read out of the database (that is, each filename character : read from the database would be case-folded at most once). However, : when I refactored the code to use the Visitor pattern, I sacrificed : that optimisation.
That is not what I meant. I meant currently there are multiple copies of the case-folded string kept. (one per pattern). I include a patch for this issue. : If you can actually measure the performance impact of the current : slightly stupid implementation of visit_substring_match_casefold(), : I'll certainly consider applying an optimisation patch. The following patch does not address this. Testing for it might be difficult and time-consuming, considering the --basename option. : > * These changes prepare a path towards a ``--and'' (``-a'') option. : : Are you envisaging this option as a single-use option (indicating that : all patterns must match) or as a conjunction as used in "find"? For : example are you suggesting... : : locate foo -a bar -o baz The former. Example: locate -iaE foo bar 'b?z' : I suppose you can already achieve something similar with : : ( locate 'foo.*bar' 'bar.*foo' ; locate baz ) | sort -u The above would be harder to code, and definitely slower... : What does the rest of the list think about "-a"? : : > * It is possible to allow reading a db from stdin. : > * Much of the ``main'' loop can be moved into ``visitor''s. : : If you're talking about refactoring locate(), then yes, it is a bit : too long. I'm not sure the current "Visitor" pattern is appropriate : for this though. I am. The "Visistor" pattern seems fine to me. (Except maybe the {munged,original}_filename might be entered into a structure, providing space for more parameters. (This should improve speed for fastcall-compiled code)) Now for the casefolding-patch: Suggested ChangeLog entry: 2005-05-30 Bas van Gompel <[EMAIL PROTECTED]> * locate/locate.c: Fold case once, only when needed. diff --exclude='*~' -Ndrup curr/findutils/locate/locate.c mine2/findutils/locate/locate.c --- findutils/locate/locate.c 2005-05-30 15:13:34.000000000 +0200 +++ findutils/locate/locate.c 2005-05-30 20:08:10.000000000 +0200 @@ -266,11 +266,20 @@ struct locate_stats static struct locate_stats statistics; -struct casefolder +struct stringbuf { - const char *pattern; char *buffer; size_t buffersize; + size_t *soffs; + size_t *preqlen; +}; +static struct stringbuf casebuf; + + +struct casefolder +{ + const char *pattern; + struct stringbuf *pbuf; }; struct regular_expression @@ -381,6 +390,24 @@ visit_justprint(const char *munged_filen return VISIT_CONTINUE; } +static int +visit_casefold(const char *munged_filename, + const char *original_filename, + void *context) +{ + struct stringbuf *b = context; + (void) original_filename; + + if (*b->preqlen+1 > b->buffersize) + { + b->buffer = xrealloc(b->buffer, *b->preqlen+1); /* XXX: consider using extendbuf(). */ + b->buffersize = *b->preqlen+1; + } + lc_strcpy(b->buffer, munged_filename); + + return VISIT_CONTINUE; +} + /* visit_existing_follow implements -L -e */ static int visit_existing_follow(const char *munged_filename, @@ -500,19 +527,12 @@ visit_substring_match_casefold(const cha const char *original_filename, void *context) { - struct casefolder * p = context; - size_t len = strlen(munged_filename); - + const struct casefolder * p = context; + const struct stringbuf * b = p->pbuf; + (void) munged_filename; (void) original_filename; - if (len+1 > p->buffersize) - { - p->buffer = xrealloc(p->buffer, len+1); /* XXX: consider using extendbuf(). */ - p->buffersize = len+1; - } - lc_strcpy(p->buffer, munged_filename); - - - if (NULL != strstr(p->buffer, p->pattern)) + + if (NULL != strstr(b->buffer, p->pattern)) return VISIT_ACCEPTED; else return VISIT_REJECTED; @@ -656,6 +676,7 @@ locate (int argc, { char *pathpart; /* A pattern to consider. */ int argn; /* Index to current pattern in argv. */ + int need_fold; /* Set when folding and any pattern is non-glob. */ FILE *fp; /* The pathname database. */ int c; /* An input byte. */ int nread; /* number of bytes read from an entry. */ @@ -681,6 +702,25 @@ locate (int argc, past_pat_inspector = NULL; + /* See if we need fold. */ + if (ignore_case && !regex) + for ( argn = 0; argn < argc; argn++ ) + { + pathpart = argv[argn]; + if (!contains_metacharacter(pathpart)) + { + need_fold = 1; + break; + } + } + + if (need_fold) + { + add_visitor(visit_casefold, &casebuf); + casebuf.preqlen = &pathsize; + casebuf.soffs = &count; + } + /* Add an inspector for each pattern we're looking for. */ for ( argn = 0; argn < argc; argn++ ) { @@ -719,8 +759,7 @@ locate (int argc, { struct casefolder * cf = xmalloc(sizeof(*cf)); cf->pattern = pathpart; - cf->buffer = NULL; - cf->buffersize = 0; + cf->pbuf = &casebuf; add_visitor(visit_substring_match_casefold, cf); /* If we ignore case, convert it to lower now so we don't have to * do it every time L8r, Buzz. -- ) | | ---/ ---/ Yes, this | This message consists of true | I do not -- | | / / really is | and false bits entirely. | mail for ) | | / / a 72 by 4 +-------------------------------+ any1 but -- \--| /--- /--- .sigfile. | |perl -pe "s.u(z)\1.as." | me. 4^re _______________________________________________ Bug-findutils mailing list Bug-findutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-findutils