Russ Allbery wrote: [...] > > Have you benchmarked this? My intuition says that if this makes any > difference at all, it will be a performance *degredation*. You're now > walking the entire text for every typo we know about instead of doing an > O(1) hash table lookup for each word. It's converting an O(n) check into > an O(n^2) check. >
There might indeed be a degradation on those texts where the number of characters is near the limit of the methods switcher. Other than those cases, when the number of words in the text is greater than the number of words in the hash there should be a performance improvement when searching for every known word in the text (there is indeed a performance penalty because of the regex and the text traversal, but it is not enough to make it slower than the other method). Without the patch: $ time xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb [...] real 0m8.380s user 0m5.596s sys 0m0.640s (this is normal as the file is being read directly from the disk, not in memory) $ time xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb [...] real 0m6.701s user 0m5.800s sys 0m0.684s $ time xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb [...] real 0m6.424s user 0m5.740s sys 0m0.520s This time with the patch: $ time xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb [...] real 0m6.613s user 0m5.728s sys 0m0.580s (Let's ignore this one, so that disk reads do not influence the results) $ time xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb [...] real 0m6.254s user 0m5.564s sys 0m0.540s $ time xlintian /var/cache/apt/archives/php5-cli_5.2.6.dfsg.1-1+lenny2_i386.deb [...] real 0m6.311s user 0m5.456s sys 0m0.600s (xlintian is an alias to lintian from my git repo). As you can see with the patch it is faster. Cheers, -- Raphael Geissert - Debian Maintainer www.debian.org - get.debian.net -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org