Hello, and I hope you're all having or have had nice holidays. Here's a short patch series to speed up common gnulib-tool usage a bit:

1) cache module metainformation.

The first observation is that the bulk of the forks in a typical --update are spent on 'sed' parsing the module metainformation files. So let's cache them: contents are parsed into shell variables. The cache variable names consist of 'c_' plus the flattened module name. For Bash, the function to flatten the name uses ${var//subst/repl} to avoid forking for module names that contain non-alphanumeric characters (such as '/').

FWIW, the values of $lookedup_file and $lookedup_tmp are not cached; doing so would, if --local-dir were used and module files patched, require that the patched files be kept (and not overwritten) for the duration of the script. I have checked that no call site uses the lookedup_{file,tmp} values for the module metainformation files, so we don't have to worry about this.

By itself, this patch does not help; it even slows gnulib-tool down (see timings below), because a lot of the module file reading happens in subshells, which fail to populate the parent shell's cache.

2) avoid forks with func_get_* functions.

This patch turns (1) into a speed boost by eliminating lots of forks related to calls of the func_get_* functions, thus allowing the cache to be used a few times in a typical --update or --test operation. (Of course the additional fork elimination itself also helps. :-)

3) abort loops early where possible.

A couple of loops only test for the presence of some condition and have no other side effects; they can be aborted as soon as we have a definite answer.

4) faster string handling for Posix shells.

This introduces a shell function for splitting off literal prefixes and suffixes from strings, avoiding 'sed' when the shell is Posixy enough (idea copied from Libtool).

I have tested the changes with M4, using 'gnulib-tool --test', and on a couple of other packages using gnulib, and ensured that the only changes they cause are some harmless removed empty lines in generated files.
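
For readers who want a feel for points (1), (2) and (4) without reading the patches, here is a rough sketch. It is not the code from the series: every sketch_* name, the temporary directory and the 'demo/...' module names are made up for illustration, and it only handles a one-line Description section. It shows a 'c_' cache keyed on the flattened module name, results returned in a variable instead of on stdout (so callers need no subshell), and prefix/suffix removal with ${var#...} and ${var%...}.

```shell
#!/bin/sh
# Sketch only: all sketch_* names are hypothetical, not taken from gnulib-tool.

# Flatten a module name for use in a variable name; result in $sketch_flat.
if test -n "${BASH_VERSION-}"; then
  # Bash: ${var//pattern/repl} replaces without forking.  Wrapped in eval
  # so that shells lacking this syntax never have to parse it.
  eval 'sketch_flatten () { sketch_flat=${1//[!a-zA-Z0-9_]/_}; }'
else
  sketch_flatten ()
  {
    sketch_flat=`printf '%s\n' "$1" | LC_ALL=C sed -e 's/[^a-zA-Z0-9_]/_/g'`
  }
fi

sketch_reads=0   # how many times a module file was actually parsed

# Put the line following "Description:" of module $1 into $sketch_result.
# The parsed text is cached in the variable c_<flattened module name>.
sketch_get_description ()
{
  sketch_flatten "$1"
  eval "sketch_hit=\${c_$sketch_flat+yes}"
  if test -z "$sketch_hit"; then
    sketch_value=`sed -n -e '/^Description:$/{n;p;}' "$sketch_dir/$1"`
    eval "c_$sketch_flat=\$sketch_value"
    sketch_reads=$((sketch_reads + 1))
  fi
  eval "sketch_result=\$c_$sketch_flat"
}

# Strip a literal prefix or suffix $2 from $1 without forking.
# Quoting "$2" makes the shell treat it literally, not as a glob pattern.
sketch_remove_prefix () { sketch_result=${1#"$2"}; }
sketch_remove_suffix () { sketch_result=${1%"$2"}; }

# --- small demonstration with made-up module files ---
sketch_dir=`mktemp -d` || exit 1
mkdir "$sketch_dir/demo"
printf 'Description:\nA made-up module.\n\nFiles:\nlib/demo.c\n' \
  > "$sketch_dir/demo/mod"
printf 'Description:\nAnother one.\n\nFiles:\nlib/other.c\n' \
  > "$sketch_dir/demo/other"

sketch_get_description demo/mod    # parses the file
sketch_get_description demo/mod    # served from c_demo_mod
echo "$sketch_result (reads: $sketch_reads)"

# A subshell gets its own copy of the variables, so what it caches is lost:
( sketch_get_description demo/other )
sketch_get_description demo/other  # has to parse the file again
echo "$sketch_result (reads: $sketch_reads)"

sketch_remove_prefix lib/demo.c lib/
echo "$sketch_result"

rm -rf "$sketch_dir"
```

The subshell call in the middle is the effect described under (1): work done there never reaches the parent's cache, which is why caching only pays off once the callers stop using command substitution.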

Testing was done on GNU/Linux using bash and pdksh, and Solaris ksh. The patches are posted using 'git format-patch' so they can be fed directly into 'git am', for those so inclined.

OK to apply?

The whole series gives me about 50% improvement for gnulib-tool --update on the git M4 tree:

  before:   21.63 s
  after 1:  27.46 s
  after 2:  16.46 s
  after 3:  12.94 s
  after 4:  10.83 s

With gnulib-tool --with-tests --test there is about 20% improvement (a couple of minutes), but note that this also runs the other autotools, configure and make.

(1) and (2) slightly slow down things like
  gnulib-tool --extract-description ...

but since these modes are typically faster than the other modes, I consider that an acceptable trade-off. Otherwise, one could also reorganize gnulib-tool a bit so that it can use one script on all modules in question, like
  sed "$sed_extract_license_only" $modules

Doing that throughout the code (i.e., also for --update) would need more intrusive changes, though.

Thanks,
Ralf