i've just done some simple tests and found the following:

1. on one of my systems (a laptop with 128MB RAM), dlocatedb takes up
   696KB of disk space. a plain text dump of it takes up 3.0MB

   text dump generated with 'dlocate / > dlocate.txt'

   i then made sure that both dlocatedb and dlocate.txt were not
   in the disk cache by catting approx 150MB of files to /dev/null.



# ls -lh dlocate.txt dlocatedb
-rw-r--r-- 1 root root 3.0M 2009-05-30 11:14 dlocate.txt
-rw-r--r-- 1 root root 696K 2009-05-30 06:29 dlocatedb

# wc -l dlocate.txt
62090 dlocate.txt


2. searching the dlocatedb with locate for a single file takes 1.091
   seconds. grepping for the same file in the text dump takes 0.584
   seconds.

the filename "usr/share/doc/apache2.2-bin/changelog.gz" was chosen because
it is the very last line in dlocate.txt

# time dlocate usr/share/doc/apache2.2-bin/changelog.gz
apache2.2-bin: /usr/share/doc/apache2.2-bin/changelog.gz

real    0m1.091s
user    0m0.484s
sys     0m0.044s

# time grep usr/share/doc/apache2.2-bin/changelog.gz dlocate.txt
apache2.2-bin: /usr/share/doc/apache2.2-bin/changelog.gz

real    0m0.584s
user    0m0.008s
sys     0m0.020s


3. repeating the test immediately with both files cached in RAM gives
   0.512 seconds (dlocate) and 0.034s (grep)

# time dlocate usr/share/doc/apache2.2-bin/changelog.gz
apache2.2-bin: /usr/share/doc/apache2.2-bin/changelog.gz

real    0m0.512s
user    0m0.476s
sys     0m0.032s

# time grep usr/share/doc/apache2.2-bin/changelog.gz dlocate.txt
apache2.2-bin: /usr/share/doc/apache2.2-bin/changelog.gz

real    0m0.034s
user    0m0.012s
sys     0m0.024s



on the first run, grep is almost twice as fast as dlocate (0.584s vs
1.091s). on subsequent runs, it is about 15 times faster (0.034s vs
0.512s).
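
for anyone who wants to check the arithmetic, here is a small sketch that works out the ratios from the 'real' times above. the speedup function is a name made up for this example, and it assumes an awk that prints decimals with a '.' (i.e. a C-like locale):

```shell
# hypothetical helper: how many times faster is the second time than the first?
speedup() {
    # $1 = slower time in seconds, $2 = faster time in seconds
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 1.091 0.584   # dlocate vs grep, cold cache -> 1.9
speedup 0.512 0.034   # dlocate vs grep, warm cache -> 15.1
```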

there appears to be no advantage whatsoever to using frcode any more (in
fact, locate is much slower than plain grep), and disk space is so cheap
that the difference between 700KB and 3MB is irrelevant.

accordingly the solution to this on-going dlocate/locate/mlocate
confusion will be the release of a new version of dlocate that doesn't
use or depend on frcode or locate, but instead just uses a plain text
file and grep.
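
as a rough sketch only (not the actual new dlocate code), the plain text + grep approach boils down to something like the following. DB, lookup and the sample lines are all made up for illustration; the real database would be the full 'package: /path' dump:

```shell
# throwaway sample file in the same "package: /path" format as the text dump
DB=$(mktemp)
printf '%s\n' \
    'apache2.2-bin: /usr/share/doc/apache2.2-bin/changelog.gz' \
    'grep: /bin/grep' \
    'grep: /usr/share/doc/grep/changelog.gz' > "$DB"

# hypothetical lookup function: plain grep (basic regexp) over the text file
lookup() {
    # '--' stops option parsing, so a pattern starting with '-' is safe
    grep -- "$1" "$DB"
}

lookup usr/share/doc/apache2.2-bin/changelog.gz
```

the last command prints the matching 'apache2.2-bin: ...' line from the sample file, the same shape of output as the grep test above.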

i have a few other things on my TODO list for dlocate.  I'll get them
done and release a new version. hopefully this weekend if real life
doesn't intrude.

i think i'll also add a few more options to dlocate to take advantage of
GNU grep's ability to use different Matchers - from grep(1):

   Matcher Selection
       -E, --extended-regexp
              Interpret PATTERN as an extended regular expression (ERE,
              see below).  (-E is specified by POSIX.)

       -F, --fixed-strings
              Interpret PATTERN as a list of fixed strings, separated by
              newlines, any of which is to be matched.  (-F is specified
              by POSIX.)

       -G, --basic-regexp
              Interpret PATTERN as a basic regular expression (BRE, see
              below).  This is the default.

       -P, --perl-regexp
              Interpret PATTERN as a Perl regular expression.  This is
              highly experimental and grep -P may warn of unimplemented
              features.

and i'll support -w too:

       -w, --word-regexp
              Select only those lines containing matches that form whole
              words.  The test is that the matching substring must
              either be at the beginning of the line, or preceded by a
              non-word constituent character.  Similarly, it must be
              either at the end of the line or followed by a non-word
              constituent character.  Word-constituent characters are
              letters, digits, and the underscore.
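
to illustrate how the matchers and -w differ on this kind of data, here is a small sketch. the sample lines and the count function are invented for the example (they are not real dlocate output), and -P is left out because not every grep build supports it:

```shell
# throwaway sample data in "package: /path" format
SAMPLE=$(mktemp)
printf '%s\n' \
    'grep: /bin/grep' \
    'grep: /bin/egrep' \
    'zgrep: /usr/bin/zgrep' \
    'foo: /usr/share/a.b' \
    'foo: /usr/share/axb' > "$SAMPLE"

# hypothetical helper: print the number of matching lines;
# all arguments are passed straight through to grep
count() {
    grep -c "$@" "$SAMPLE"
}

count -G 'a.b'           # 2: in a BRE, '.' matches any character
count -F 'a.b'           # 1: fixed string, '.' is a literal dot
count -E '/(e|z)grep$'   # 2: ERE alternation, no backslashes needed
count 'grep'             # 3: plain substring-style match
count -w 'grep'          # 2: 'zgrep' is excluded, 'z' is a word character
```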


this will change the way that dlocate works (in that it does a regexp search
rather than a plain text search) but, IMO, that's far more useful.  GNU locate
has an option to do a regexp search but the timing comparison gets even more
in favour of grep:

# time locate.findutils -d /var/lib/dlocate/dlocatedb -r usr/share/doc/apache2.2-bin/changelog.gz
apache2.2-bin: /usr/share/doc/apache2.2-bin/changelog.gz

real    0m1.796s
user    0m1.640s
sys     0m0.012s

1.796 seconds for the first run after flushing disk cache, compared to
0.584 seconds for grep. grep is over 3 times faster.

on subsequent runs, the regexp locate still takes over 1.6 seconds,
while grep takes 0.034 seconds: about 48 times faster. obviously, and not
at all surprisingly, doing a regexp search of an frcode database is not a
very efficient operation.

# time locate.findutils -d /var/lib/dlocate/dlocatedb -r usr/share/doc/apache2.2-bin/changelog.gz
apache2.2-bin: /usr/share/doc/apache2.2-bin/changelog.gz

real    0m1.640s
user    0m1.628s
sys     0m0.008s

craig

-- 
craig sanders <c...@taz.net.au>

BOFH excuse #319:

Your computer hasn't been returning all the bits it gets from the Internet.


