Hello Ralf,

Thank you for your speedups to gnulib-tool. At first I was, of course, excited about the 2x speedup. But when looking at the maintainability of the code that you propose, I'm not fine with all of it any more.
My four objections are:

1) You observe that forking programs in a shell script is slow, and therefore propose to use more shell built-ins. The problem with this is that I chose to implement gnulib-tool in shell (for the control structure) and sed (for the text processing). The shell also has some inferior commands for string processing, but sed is vastly superior for this purpose. I want to stick with 'sed' for the text processing; otherwise we write some parts of the code to use shell built-ins, and when we notice that a little more text processing is required, we have to rewrite the code to use 'sed'. So the use of shell built-ins for text processing turns out to be a "premature optimization" (in the sense of Knuth) and hampers maintainability.

   If you want to achieve good speedups for scripts that use 'sed': can you work towards making 'sed' a bash built-in? This is challenging, but if you are after performance, that would be promising.

2) Your patches change the generation of code so that it goes through intermediate shell variables. The problem with this is that the transformation from string to standard output is not simple:
     echo $string
   outputs the string plus a newline, and 'echo -n' is not portable.

3) The sed expression sed_cache_module in part 1 of your patch is not maintainable. The sed_extract_prog was already complex, but what you made of it is beyond what is acceptable in code that should be maintained 5 and 10 years from now.

4) There is too much 'eval' in the code. As you have seen in an earlier patch today, every use of 'eval' can bring a security problem. The only uses of 'eval' that are always safe are variable assignments
     eval "$var=\$value"
   when you can guarantee that 'var' is a simple identifier.

And, last but not least, more comments would have been better.

So, globally, when you try to cache multiline strings, read from files, in variables with computed identifiers, you are going far beyond what shell as a language is suitable for.
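A minimal sketch of points 2) and 4) above, assuming a POSIX shell. The helper names set_var and emit_string are hypothetical; they are not taken from gnulib-tool or from the patches. Using printf '%s' in place of 'echo -n' is a choice made for this sketch only.

```shell
# Hypothetical helpers for illustration; not part of gnulib-tool.

# set_var NAME VALUE
# Assign VALUE to the shell variable called NAME.
# The eval is safe only because NAME is first checked to be a
# simple identifier (letters, digits, underscore, no leading digit).
set_var () {
  case $1 in
    '' | [0-9]* | *[!A-Za-z0-9_]*)
      echo "set_var: invalid variable name: $1" >&2
      return 1 ;;
  esac
  # \$2 is expanded by eval itself, so VALUE is never re-parsed as code.
  eval "$1=\$2"
}

# emit_string STRING
# Write STRING to standard output exactly as given, without appending
# a newline and without interpreting options or backslashes.
emit_string () {
  printf '%s' "$1"
}

# Usage: cache a multiline string under a computed name, then emit it.
module=foo
set_var "cached_$module" 'line 1
line 2'
eval "emit_string \"\$cached_$module\""
echo
```

Even in this small form, reading the value back needs a second eval with careful quoting, which is the kind of maintenance burden described in point 4).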
Unfortunately, I don't see a better choice as an implementation language of gnulib-tool:

  - Python is good for text processing but makes incompatible changes in the language definition every couple of years.
  - Perl is excluded because of the misdesigned syntax, and it also has incompatible changes, e.g. between perl 5.6 and 5.8.
  - Java is not good because, although it is standardized and fast and GNU has a free implementation of it, its text processing is not expressive enough (too verbose).
  - m4 is maybe powerful, but too few people know how to program it.

Bruno