On Mon, 10 Oct 2016 17:41:03 +0200 Bruno Haible <br...@clisp.org> wrote:
> Hi Norihiro,
>
> I'm not the maintainer of the 'dfa' module, but nevertheless I'd like to know
> what is going on more precisely (because clearing a cache more often
> deteriorates running times rather than improving them). 3 questions:
>
> > 1)
> > This patch checks the number of states at the beginning of dfa execution,
> > and if there are a lot of states, releases them.
>
> Have you measured what the execution times are without/before and with/after
> your patch, for example in the "seq -f '%g bottles of beer on the wall' 2400"
> example that you showed? Can you show us the benchmark results?

Hi Bruno,

I tested on the following machine, and I saw a slight performance improvement.

Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
gcc version 4.4.7 (GCC)
Linux rhel6 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

- before

$ time -p env LC_ALL=C grep -vf 600 600
real 3.95
user 3.35
sys 0.59

- after

$ time -p env LC_ALL=C grep -vf 600 600
real 3.75
user 2.81
sys 0.70

However, the main point is saving memory, not speed. See also:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22357
grep -f huge memory usage

> 2) Your patch cuts down on three different caches:
> - d->states,
> - d->trans, d->fails,
> - d->mb_trans.
> Can you prepare a separate patch for each of the three changes, and
> present the benchmark results of each? It would be interesting to know
> which of the three has the most effect on the timings.

No, we cannot clear D->states without also clearing D->trans, D->fails and
D->mb_trans, as state numbers are referred to in them.

BTW, dfa already has a mechanism to clear the caches for D->trans, D->fails
and D->mb_trans. See build_state() and transit_state().

> 3) If you are right that reducing the size of a cache improves the timings,
> then it means the lookup in the cache is not O(1)?
> In other words, can the lookup in the cache be optimized (through more
> adapted data structures - I'm thinking of binary trees or hash tables),
> so that a larger cache size will NOT lead to a longer execution time?

To achieve that, we would need a lot of changes, so I am considering it as a
next step.

Thanks,
Norihiro