Re: [PATCH 0/4] faster gnulib-tool

Bruno Haible Thu, 01 Jan 2009 17:17:44 -0800

Hello Ralf,

Thank you for your speedups to gnulib-tool. At first I was, of course,
excited about the 2x speedup. But when I looked at the maintainability
of the code that you propose, I was no longer happy with all of it.


My four objections are:

1) You observe that forking programs in a shell script is slow, and
   therefore propose to use more shell built-ins. The problem is that
   I chose to implement gnulib-tool in shell (for the control
   structure) and sed (for the text processing). The shell also has
   some commands for string processing, but they are inferior; sed is
   vastly superior for this purpose. I want to stick with 'sed' for the
   text processing. Otherwise we write some parts of the code with
   shell built-ins, and when we notice that a little more text
   processing is required, we have to rewrite that code to use 'sed'.
   So using shell built-ins for text processing turns out to be a
   "premature optimization" (in Knuth's sense) and hampers
   maintainability.

   If you want to achieve good speedups for scripts that use 'sed',
   can you work towards making 'sed' a bash built-in? This is
   challenging, but if you are after performance, it would be promising.

2) Your patches change the generation of code so that it goes through
   intermediate shell variables. The problem with this is that writing
   a string to standard output is not simple:
     echo "$string"
   outputs the string plus a newline, and 'echo -n' is not portable.

3) The sed expression sed_cache_module in part 1 of your patch is not
   maintainable. The sed_extract_prog was already complex, but what you
   made of it is beyond what is acceptable in code that should still be
   maintained 5 or 10 years from now.

4) There is too much 'eval' in the code. As you have seen in an earlier
   patch today, every use of 'eval' can introduce a security problem.
   The only uses of 'eval' that are always safe are variable assignments
      eval "$var=\$value"
   when you can guarantee that 'var' is a simple identifier.
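As an illustration of the safe pattern in point 4, here is a minimal POSIX sh sketch. The helper name set_var and the cached_*_files variable are hypothetical, taken neither from gnulib-tool nor from the patches. The idea is that the variable name is validated before eval sees it, and the value is referenced as \$2 so that eval never parses it as code.

```shell
#!/bin/sh
# Hypothetical helper, not part of gnulib-tool: set_var NAME VALUE
# assigns VALUE to the variable called NAME, refusing any NAME that
# is not a simple shell identifier.
set_var ()
{
  case "$1" in
    '' | [0-9]* | *[!A-Za-z0-9_]*)
      echo "set_var: invalid variable name: $1" 1>&2
      return 1 ;;
  esac
  # eval sees the text 'NAME=$2'; the value is expanded only during
  # the assignment itself, so it is never parsed as shell code.
  eval "$1=\$2"
}

# A computed name built from a module name (example module: stdbool).
module=stdbool
set_var "cached_${module}_files" 'lib/stdbool.in.h; echo injected'
echo "$cached_stdbool_files"    # prints: lib/stdbool.in.h; echo injected

# A name that would smuggle a command into eval is rejected.
set_var 'x; echo injected' foo || echo "rejected"
```

Note that a check like this rejects gnulib module names containing '-' (such as c-ctype), so computed names would first have to be mapped to identifiers, which is one more place where such code can go wrong.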

And, last but not least, more comments would have helped.

So, overall: when you try to cache multiline strings read from files
in variables with computed names, you are going far beyond what shell
is suitable for as a language.
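The difficulty with multiline strings can be seen in a small POSIX sh sketch (the temporary file names and the sentinel trick are illustrative assumptions, not taken from gnulib-tool or the patches). Command substitution strips all trailing newlines, so a file cached in a variable cannot be written back unchanged with either echo or printf '%s'.

```shell
#!/bin/sh
# Sketch: caching a file's contents in a shell variable and writing it
# back out is not an exact round trip.
tmp=${TMPDIR:-/tmp}/roundtrip.$$
printf 'line 1\nline 2\n\n' > "$tmp"     # file ends with an empty line

# Command substitution strips *all* trailing newlines.
cached=$(cat "$tmp")

# printf '%s' writes the value exactly, without adding a newline...
printf '%s' "$cached" > "$tmp.out1"
# ...while echo appends exactly one newline.
echo "$cached" > "$tmp.out2"

# Neither copy matches the original.
cmp -s "$tmp" "$tmp.out1" || echo "printf copy differs"   # prints this
cmp -s "$tmp" "$tmp.out2" || echo "echo copy differs"     # prints this

# Workaround: append a sentinel character, then strip it again.
cached=$(cat "$tmp"; echo x)
cached=${cached%x}
printf '%s' "$cached" > "$tmp.out3"
cmp -s "$tmp" "$tmp.out3" && echo "sentinel copy matches" # prints this

rm -f "$tmp" "$tmp.out1" "$tmp.out2" "$tmp.out3"
```

The sentinel variant does reproduce the file, but only by appending a dummy character and removing it again with the POSIX ${var%pattern} expansion, which older Bourne shells lack. This is the kind of subtlety that makes such caching code hard to maintain.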

Unfortunately, I don't see a better choice of implementation language
for gnulib-tool:
  - Python is good for text processing but makes incompatible changes
    to the language definition every couple of years.
  - Perl is excluded because of its misdesigned syntax, and it also
    has had incompatible changes, e.g. between perl 5.6 and 5.8.
  - Java is not a good fit: although it is standardized and fast, and
    GNU has a free implementation of it, its text processing is not
    expressive enough (too verbose).
  - m4 may be powerful, but too few people know how to program in it.

Bruno
