Hello bug-make readers, this originated in an interesting thread on the automake list: http://thread.gmane.org/gmane.comp.sysutils.automake.general/12421/focus=12431 There are more things discussed there, and most actual problems are solvable in Automake I guess, but there are interesting tidbits for GNU make as well:
* Xan Lopez wrote on Tue, Jan 04, 2011 at 02:21:01PM CET: > http://thread.gmane.org/gmane.comp.gnu.make.bugs/4219, Hmpf. I'd really like to see GNU make exploit the Linux kernel's unlimited ARG_MAX. Both lack of time on my part, and not seeing an easy fix of the above patch to make it acceptable. :-/ Can anybody help here? I looked at sample makefiles and scaling things in there. This stuff is obviously made up, and I don't have an actual project to point to where this is indeed a limiting factor, but IMVHO it is easy to see that it is possible such projects exist. Anyway, this shell script snippet shows a scalability bug: for max in 1000 2000 4000; do l=`seq $max` for f in $l; do echo "all: some-long-file-name-$f" echo "some-long-file-name-$f: some-long-file-header-$f" echo "some-long-file-header-$f:" done > Makefile echo $max time make done with quadratic behavior: 1000 make: Nothing to be done for `all'. 1.94user 0.02system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 50720maxresident)k 0inputs+0outputs (0major+4866minor)pagefaults 0swaps 2000 make: Nothing to be done for `all'. 6.89user 0.02system 0:06.90elapsed 100%CPU (0avgtext+0avgdata 97936maxresident)k 0inputs+0outputs (0major+9569minor)pagefaults 0swaps 4000 make: Nothing to be done for `all'. 25.45user 0.08system 0:25.53elapsed 99%CPU (0avgtext+0avgdata 192336maxresident)k 0inputs+0outputs (0major+19068minor)pagefaults 0swaps Let's see who is responsible: gprof for the 1000 and 4000 run: % cumulative self self total time seconds seconds calls s/call s/call name 71.55 1.71 1.71 243365 0.00 0.00 add_string 5.02 1.83 0.12 2002 0.00 0.00 pattern_search 4.60 1.94 0.11 630714 0.00 0.00 str_hash_1 2.93 2.01 0.07 2152933 0.00 0.00 hash_find_slot 2.51 2.07 0.06 9235 0.00 0.00 strcache_iscached 84.80 26.05 26.05 960367 0.00 0.00 add_string 3.22 27.04 0.99 8603069 0.00 0.00 hash_find_slot 2.77 27.89 0.85 36235 0.00 0.00 strcache_iscached 2.21 28.57 0.68 8002 0.00 0.00 pattern_search 1.50 29.03 0.46 2532000 0.00 0.00 str_hash_1 So add_string, while called only a linear number of times, has a quadratic loop. The hash seems to degrade too, with only fourfold more calls it requires 8.25 times more time. But our example is known-ugly for a cheap hash function, so I wouldn't worry too much about that. strcache_iscached is more worrisome with 14-fold increase. Looking at its implementation, I wonder why it doesn't do a hash lookup rather than looping over all the cache entries. gcov output is pretty clear: -: 63:static const char * 243365: 64:add_string(const char *str, int len) -: 65:{ 243365: 66: struct strcache *best = NULL; -: 67: struct strcache *sp; -: 68: const char *res; -: 69: -: 70: /* If the string we want is too large to fit into a single buffer, then -: 71: we're screwed; nothing will ever fit! Change the maximum size of the -: 72: cache to be big enough. */ 243365: 73: if (len > bufsize) #####: 74: bufsize = len * 2; -: 75: -: 76: /* First, find a cache with enough free space. We always look through all -: 77: the blocks and choose the one with the best fit (the one that leaves the -: 78: least amount of space free). */ 110775987: 79: for (sp = strcache; sp != NULL; sp = sp->next) 110532622: 80: if (sp->bytesfree > len && (!best || best->bytesfree > sp->bytesfree)) 242651: 81: best = sp; -: 82: -: 83: /* If nothing is big enough, make a new cache. */ 243365: 84: if (!best) 914: 85: best = new_cache(); -: 86: 243365: 87: assert (best->bytesfree > len); -: 88: -: 89: /* Add the string to the best cache. */ 243365: 90: res = best->end; 243365: 91: memcpy (best->end, str, len); 243365: 92: best->end += len; 243365: 93: *(best->end++) = '\0'; 243365: 94: best->bytesfree -= len + 1; 243365: 95: ++best->count; -: 96: 243365: 97: return res; -: 98:} Just breaking out of the for loop in the if-true case avoids the problem for me. (The bulk of the patch is cleanup to remove the unneeded 'best' variable.) Of course that only delays quadratic behavior as with filled caches, all of them are still walked. If so, then I recommend implementing a priority queue for the set of free bytes, or use a proper allocator scheme. Generally, this is all still wrong though. make shouldn't spend 18s in its string handling code for a <2MB Makefile specifying 16K files (8000 in above shell script) on a modern system. I'd expect 1-2s at most, and the rest in stat (this is with the patch below applied): % cumulative self self total time seconds seconds calls s/call s/call name 32.14 8.46 8.46 3840369 0.00 0.00 add_string 16.34 12.76 4.30 34547800 0.00 0.00 hash_find_slot 11.15 15.70 2.94 32002 0.00 0.00 pattern_search 6.78 17.48 1.79 10281361 0.00 0.00 str_hash_1 5.13 18.83 1.35 8155410 0.00 0.00 dirfile_hash_1 4.94 20.13 1.30 48000 0.00 0.00 record_files 4.92 21.43 1.30 25410516 0.00 0.00 dirfile_hash_cmp 3.53 22.36 0.93 6277494 0.00 0.00 str_hash_2 3.21 23.20 0.85 5102123 0.00 0.00 file_hash_1 2.26 23.80 0.60 5403326 0.00 0.00 dirfile_hash_2 1.82 24.28 0.48 5238695 0.00 0.00 file_hash_cmp 1.44 24.66 0.38 2517085 0.00 0.00 file_hash_2 1.25 24.99 0.33 11008694 0.00 0.00 directory_hash_1 0.46 25.11 0.12 5472342 0.00 0.00 file_impossible_p 0.42 25.22 0.11 4928311 0.00 0.00 file_exists_p 0.40 25.32 0.11 34208095 0.00 0.00 str_hash_cmp 0.38 25.42 0.10 448046 0.00 0.00 find_char_unquote 0.38 25.52 0.10 28 0.00 0.07 hash_rehash 0.30 25.60 0.08 11666204 0.00 0.00 hash_find_item 0.27 25.67 0.07 4484685 0.00 0.00 hash_insert_at 0.23 25.73 0.06 4961544 0.00 0.00 lookup_file 0.21 25.79 0.06 11008691 0.00 0.00 directory_hash_cmp 0.19 25.84 0.05 6212628 0.00 0.00 add_hash If the expected average string length is bounded away from CACHE_BUFFER_SIZE (which I'm sure it is), then it would even make sense to not ever look at any but the first strcache in the list at all; the loss would still be bounded away from 50%. Memory is cheap these days, and I don't know of projects that parse GB worth of non-repeating makefile text. (I have not done this step in the patch below, feel free to add it though.) Cheers, Ralf 2011-01-09 Ralf Wildenhues <ralf.wildenh...@gmx.de> * strcache.c (add_string): Use first fitting cache, to avoid quadratic behavior searching for an optimal-size cache. (strcache_iscached): Use the hash instead of walking the string cache. Prompted by a report against Automake by Xan Lopez <x...@gnome.org>. Index: strcache.c =================================================================== RCS file: /cvsroot/make/make/strcache.c,v retrieving revision 2.9 diff -u -p -r2.9 strcache.c --- strcache.c 13 Jul 2010 01:20:43 -0000 2.9 +++ strcache.c 9 Jan 2011 07:41:44 -0000 @@ -63,7 +63,6 @@ new_cache() static const char * add_string(const char *str, int len) { - struct strcache *best = NULL; struct strcache *sp; const char *res; @@ -73,26 +72,24 @@ add_string(const char *str, int len) if (len > bufsize) bufsize = len * 2; - /* First, find a cache with enough free space. We always look through all - the blocks and choose the one with the best fit (the one that leaves the - least amount of space free). */ + /* First, find a cache with enough free space. */ for (sp = strcache; sp != NULL; sp = sp->next) - if (sp->bytesfree > len && (!best || best->bytesfree > sp->bytesfree)) - best = sp; + if (sp->bytesfree > len) + break; /* If nothing is big enough, make a new cache. */ - if (!best) - best = new_cache(); + if (!sp) + sp = new_cache(); - assert (best->bytesfree > len); + assert (sp->bytesfree > len); /* Add the string to the best cache. */ - res = best->end; - memcpy (best->end, str, len); - best->end += len; - *(best->end++) = '\0'; - best->bytesfree -= len + 1; - ++best->count; + res = sp->end; + memcpy (sp->end, str, len); + sp->end += len; + *(sp->end++) = '\0'; + sp->bytesfree -= len + 1; + ++sp->count; return res; } @@ -144,13 +141,7 @@ add_hash (const char *str, int len) int strcache_iscached (const char *str) { - struct strcache *sp; - - for (sp = strcache; sp != 0; sp = sp->next) - if (str >= sp->buffer && str < sp->end) - return 1; - - return 0; + return hash_find_item (&strings, str) != NULL; } /* If the string is already in the cache, return a pointer to the cached _______________________________________________ Bug-make mailing list Bug-make@gnu.org http://lists.gnu.org/mailman/listinfo/bug-make