strcache scaling issue

Ralf Wildenhues Sun, 09 Jan 2011 00:04:35 -0800

Hello bug-make readers,

this originated in an interesting thread on the automake list:
http://thread.gmane.org/gmane.comp.sysutils.automake.general/12421/focus=12431
There are more things discussed there, and most actual problems are
solvable in Automake I guess, but there are interesting tidbits for
GNU make as well:


* Xan Lopez wrote on Tue, Jan 04, 2011 at 02:21:01PM CET:
> http://thread.gmane.org/gmane.comp.gnu.make.bugs/4219,

Hmpf.  I'd really like to see GNU make exploit the Linux kernel's
unlimited ARG_MAX.  Both lack of time on my part, and not seeing an easy
fix of the above patch to make it acceptable.  :-/
Can anybody help here?

I looked at sample makefiles and scaling things in there.  This stuff is
obviously made up, and I don't have an actual project to point to where
this is indeed a limiting factor, but IMVHO it is easy to see that it is
possible such projects exist.  Anyway, this shell script snippet shows a
scalability bug:

for max in 1000 2000 4000; do
  l=`seq $max`
  for f in $l; do
    echo "all: some-long-file-name-$f"
    echo "some-long-file-name-$f: some-long-file-header-$f"
    echo "some-long-file-header-$f:"
  done > Makefile
  echo $max
  time make
done

with quadratic behavior:

1000
make: Nothing to be done for `all'.
1.94user 0.02system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 50720maxresident)k
0inputs+0outputs (0major+4866minor)pagefaults 0swaps
2000
make: Nothing to be done for `all'.
6.89user 0.02system 0:06.90elapsed 100%CPU (0avgtext+0avgdata 97936maxresident)k
0inputs+0outputs (0major+9569minor)pagefaults 0swaps
4000
make: Nothing to be done for `all'.
25.45user 0.08system 0:25.53elapsed 99%CPU (0avgtext+0avgdata 
192336maxresident)k
0inputs+0outputs (0major+19068minor)pagefaults 0swaps

Let's see who is responsible: gprof for the 1000 and 4000 run:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 71.55      1.71     1.71   243365     0.00     0.00  add_string
  5.02      1.83     0.12     2002     0.00     0.00  pattern_search
  4.60      1.94     0.11   630714     0.00     0.00  str_hash_1
  2.93      2.01     0.07  2152933     0.00     0.00  hash_find_slot
  2.51      2.07     0.06     9235     0.00     0.00  strcache_iscached

 84.80     26.05    26.05   960367     0.00     0.00  add_string
  3.22     27.04     0.99  8603069     0.00     0.00  hash_find_slot
  2.77     27.89     0.85    36235     0.00     0.00  strcache_iscached
  2.21     28.57     0.68     8002     0.00     0.00  pattern_search
  1.50     29.03     0.46  2532000     0.00     0.00  str_hash_1

So add_string, while called only a linear number of times, has a
quadratic loop.  The hash seems to degrade too, with only fourfold more
calls it requires 8.25 times more time.  But our example is known-ugly
for a cheap hash function, so I wouldn't worry too much about that.

strcache_iscached is more worrisome with 14-fold increase.  Looking at
its implementation, I wonder why it doesn't do a hash lookup rather than
looping over all the cache entries.

gcov output is pretty clear:

        -:   63:static const char *
   243365:   64:add_string(const char *str, int len)
        -:   65:{
   243365:   66:  struct strcache *best = NULL;
        -:   67:  struct strcache *sp;
        -:   68:  const char *res;
        -:   69:
        -:   70:  /* If the string we want is too large to fit into a single 
buffer, then
        -:   71:     we're screwed; nothing will ever fit!  Change the maximum 
size of the
        -:   72:     cache to be big enough.  */
   243365:   73:  if (len > bufsize)
    #####:   74:    bufsize = len * 2;
        -:   75:
        -:   76:  /* First, find a cache with enough free space.  We always 
look through all
        -:   77:     the blocks and choose the one with the best fit (the one 
that leaves the
        -:   78:     least amount of space free).  */
110775987:   79:  for (sp = strcache; sp != NULL; sp = sp->next)
110532622:   80:    if (sp->bytesfree > len && (!best || best->bytesfree > 
sp->bytesfree))
   242651:   81:      best = sp;
        -:   82:
        -:   83:  /* If nothing is big enough, make a new cache.  */
   243365:   84:  if (!best)
      914:   85:    best = new_cache();
        -:   86:
   243365:   87:  assert (best->bytesfree > len);
        -:   88:
        -:   89:  /* Add the string to the best cache.  */
   243365:   90:  res = best->end;
   243365:   91:  memcpy (best->end, str, len);
   243365:   92:  best->end += len;
   243365:   93:  *(best->end++) = '\0';
   243365:   94:  best->bytesfree -= len + 1;
   243365:   95:  ++best->count;
        -:   96:
   243365:   97:  return res;
        -:   98:}


Just breaking out of the for loop in the if-true case avoids the problem
for me.  (The bulk of the patch is cleanup to remove the unneeded 'best'
variable.)  Of course that only delays quadratic behavior as with filled
caches, all of them are still walked.

If so, then I recommend implementing a priority queue for the
set of free bytes, or use a proper allocator scheme.  Generally, this is
all still wrong though.  make shouldn't spend 18s in its string handling
code for a <2MB Makefile specifying 16K files (8000 in above shell
script) on a modern system.  I'd expect 1-2s at most, and the rest in
stat (this is with the patch below applied):

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 32.14      8.46     8.46  3840369     0.00     0.00  add_string
 16.34     12.76     4.30 34547800     0.00     0.00  hash_find_slot
 11.15     15.70     2.94    32002     0.00     0.00  pattern_search
  6.78     17.48     1.79 10281361     0.00     0.00  str_hash_1
  5.13     18.83     1.35  8155410     0.00     0.00  dirfile_hash_1
  4.94     20.13     1.30    48000     0.00     0.00  record_files
  4.92     21.43     1.30 25410516     0.00     0.00  dirfile_hash_cmp
  3.53     22.36     0.93  6277494     0.00     0.00  str_hash_2
  3.21     23.20     0.85  5102123     0.00     0.00  file_hash_1
  2.26     23.80     0.60  5403326     0.00     0.00  dirfile_hash_2
  1.82     24.28     0.48  5238695     0.00     0.00  file_hash_cmp
  1.44     24.66     0.38  2517085     0.00     0.00  file_hash_2
  1.25     24.99     0.33 11008694     0.00     0.00  directory_hash_1
  0.46     25.11     0.12  5472342     0.00     0.00  file_impossible_p
  0.42     25.22     0.11  4928311     0.00     0.00  file_exists_p
  0.40     25.32     0.11 34208095     0.00     0.00  str_hash_cmp
  0.38     25.42     0.10   448046     0.00     0.00  find_char_unquote
  0.38     25.52     0.10       28     0.00     0.07  hash_rehash
  0.30     25.60     0.08 11666204     0.00     0.00  hash_find_item
  0.27     25.67     0.07  4484685     0.00     0.00  hash_insert_at
  0.23     25.73     0.06  4961544     0.00     0.00  lookup_file
  0.21     25.79     0.06 11008691     0.00     0.00  directory_hash_cmp
  0.19     25.84     0.05  6212628     0.00     0.00  add_hash

If the expected average string length is bounded away from
CACHE_BUFFER_SIZE (which I'm sure it is), then it would even make sense
to not ever look at any but the first strcache in the list at all; the
loss would still be bounded away from 50%.  Memory is cheap these days,
and I don't know of projects that parse GB worth of non-repeating
makefile text.  (I have not done this step in the patch below, feel free
to add it though.)

Cheers,
Ralf

2011-01-09  Ralf Wildenhues  <ralf.wildenh...@gmx.de>

        * strcache.c (add_string): Use first fitting cache, to
        avoid quadratic behavior searching for an optimal-size cache.
        (strcache_iscached): Use the hash instead of walking the string
        cache.
        Prompted by a report against Automake by Xan Lopez
        <x...@gnome.org>.

Index: strcache.c
===================================================================
RCS file: /cvsroot/make/make/strcache.c,v
retrieving revision 2.9
diff -u -p -r2.9 strcache.c
--- strcache.c  13 Jul 2010 01:20:43 -0000      2.9
+++ strcache.c  9 Jan 2011 07:41:44 -0000
@@ -63,7 +63,6 @@ new_cache()
 static const char *
 add_string(const char *str, int len)
 {
-  struct strcache *best = NULL;
   struct strcache *sp;
   const char *res;
 
@@ -73,26 +72,24 @@ add_string(const char *str, int len)
   if (len > bufsize)
     bufsize = len * 2;
 
-  /* First, find a cache with enough free space.  We always look through all
-     the blocks and choose the one with the best fit (the one that leaves the
-     least amount of space free).  */
+  /* First, find a cache with enough free space.  */
   for (sp = strcache; sp != NULL; sp = sp->next)
-    if (sp->bytesfree > len && (!best || best->bytesfree > sp->bytesfree))
-      best = sp;
+    if (sp->bytesfree > len)
+      break;
 
   /* If nothing is big enough, make a new cache.  */
-  if (!best)
-    best = new_cache();
+  if (!sp)
+    sp = new_cache();
 
-  assert (best->bytesfree > len);
+  assert (sp->bytesfree > len);
 
   /* Add the string to the best cache.  */
-  res = best->end;
-  memcpy (best->end, str, len);
-  best->end += len;
-  *(best->end++) = '\0';
-  best->bytesfree -= len + 1;
-  ++best->count;
+  res = sp->end;
+  memcpy (sp->end, str, len);
+  sp->end += len;
+  *(sp->end++) = '\0';
+  sp->bytesfree -= len + 1;
+  ++sp->count;
 
   return res;
 }
@@ -144,13 +141,7 @@ add_hash (const char *str, int len)
 int
 strcache_iscached (const char *str)
 {
-  struct strcache *sp;
-
-  for (sp = strcache; sp != 0; sp = sp->next)
-    if (str >= sp->buffer && str < sp->end)
-      return 1;
-
-  return 0;
+  return hash_find_item (&strings, str) != NULL;
 }
 
 /* If the string is already in the cache, return a pointer to the cached

_______________________________________________
Bug-make mailing list
Bug-make@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-make

strcache scaling issue

Reply via email to