NOTE: This change does not affect the current gnulib-tool.py, only the `python` branch. It will, however, be integrated into gnulib-tool.py later.
I've been testing the new command-line parsing along with the parsing of the cached configuration (configure.ac, gnulib-cache.m4 and gnulib-comp.m4 processing). I've noticed that we spend a lot of time processing the contents of the AC_PREREQ and AC_CONFIG_AUX_DIR macros. The regular expressions have the following form (I've removed some junk):

    ".*AC_PREREQ\\(\\[(.*?)\\]\\)"
    ".*AC_CONFIG_AUX_DIR\\(\\[(.*?)\\]\\)"

In Python, however, it seems to be enough to use the following form:

    "AC_PREREQ\\(\\[(.*?)\\]\\)"
    "AC_CONFIG_AUX_DIR\\(\\[(.*?)\\]\\)"

Once I started using the latter form, the time required to process each of these regular expressions decreased by about half a second. The regex still matches in the following cases:

    "hello([AC_PREREQ([2.67])])"
    " AC_PREREQ([2.67])"
    "helloAC_PREREQ([2.67])world"

I suspect that the original form was simply copied from the original gnulib-tool, where it may have been needed because sed is used to parse the contents of the configure.ac file.

So the questions are:

1. Is the new behavior correct?
2. Shall I push this small optimization?

I'd like to do it, because right now everything else I've rewritten works almost instantly, but I still have some doubts. What do you think?

BTW, the version in pygnulib already differs a bit from the gnulib-tool shell script; I've attached the patch. I've also decided to use raw string literals to make the regexes less verbose.

-- 
With best regards,
Dmitry Selyutin
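To make question 1 concrete, here is a minimal sketch that compares the two AC_PREREQ patterns on the three sample inputs from the message. It assumes the patterns are applied with `search()`, which scans the whole string, so the leading `.*` is not needed to skip over preceding text. The helper `captured` and the two-call input `TWO_CALLS` are hypothetical and exist only for illustration. The last part shows one case where the two forms can differ: with `re.S`, the greedy `.*` runs to the end of the input and backtracks, so it selects the last occurrence rather than the first.

```python
import re

# Patterns as quoted in the message: with and without the leading ".*".
WITH_PREFIX = re.compile(r".*AC_PREREQ\(\[(.*?)\]\)", re.S | re.M)
WITHOUT_PREFIX = re.compile(r"AC_PREREQ\(\[(.*?)\]\)", re.S | re.M)

# The three sample inputs from the message.
SAMPLES = [
    "hello([AC_PREREQ([2.67])])",
    " AC_PREREQ([2.67])",
    "helloAC_PREREQ([2.67])world",
]


def captured(pattern, text):
    """Return group 1 of the search() hit, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


for sample in SAMPLES:
    # Both forms capture the same version string on these inputs.
    assert captured(WITH_PREFIX, sample) == captured(WITHOUT_PREFIX, sample) == "2.67"

# Hypothetical input with two AC_PREREQ calls (not from the message).
TWO_CALLS = "AC_PREREQ([2.59])\nAC_PREREQ([2.67])\n"
first = captured(WITHOUT_PREFIX, TWO_CALLS)  # first occurrence
last = captured(WITH_PREFIX, TWO_CALLS)      # last occurrence, due to greedy ".*"
```

If a configure.ac can contain more than one AC_PREREQ call, the choice between the two forms is therefore not purely a performance question.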
From 71a8d4a82caf17350cd3fad4ba6feb7b7fdb3e94 Mon Sep 17 00:00:00 2001
From: Dmitry Selyutin <ghostma...@gmail.com>
Date: Tue, 12 Sep 2017 18:47:55 +0300
Subject: [PATCH] config: simplify cache regular expressions

---
 pygnulib/config.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pygnulib/config.py b/pygnulib/config.py
index d08181db6..b174b166a 100644
--- a/pygnulib/config.py
+++ b/pygnulib/config.py
@@ -430,9 +430,9 @@ class Base:
 class Cache(Base):
     """gnulib cached configuration"""
     _AUTOCONF_ = {
-        "autoconf" : _re_.compile(".*AC_PREREQ\\(\\[(.*?)\\]\\)", _re_.S | _re_.M),
-        "auxdir" : _re_.compile("^AC_CONFIG_AUX_DIR\\(\\[(.*?)\\]\\)$", _re_.S | _re_.M),
-        "libtool" : _re_.compile("A[CM]_PROG_LIBTOOL", _re_.S | _re_.M)
+        "autoconf" : _re_.compile(r"AC_PREREQ\(\[(.*?)\]\)", _re_.S | _re_.M),
+        "auxdir" : _re_.compile(r"AC_CONFIG_AUX_DIR\(\[(.*?)\]\)$", _re_.S | _re_.M),
+        "libtool" : _re_.compile(r"A[CM]_PROG_LIBTOOL", _re_.S | _re_.M)
     }
     _GNULIB_CACHE_ = {
         "local" : (str, "gl_LOCAL_DIR"),
@@ -470,7 +470,7 @@ class Cache(Base):
             _GNULIB_CACHE_STR_ += [_key_]
         else:
             _GNULIB_CACHE_LIST_ += [_key_]
-    _GNULIB_CACHE_PATTERN_ = _re_.compile("^(gl_.*?)\\(\\[(.*?)\\]\\)$", _re_.S | _re_.M)
+    _GNULIB_CACHE_PATTERN_ = _re_.compile(r"^(gl_.*?)\(\[(.*?)\]\)$", _re_.S | _re_.M)
 
 
     def __init__(self, root, m4_base, autoconf=None, **kwargs):
-- 
2.13.4
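As a quick check on the raw-string part of the patch, the sketch below confirms that the escaped and raw spellings of the gl_ cache pattern are the same string, so that conversion alone does not change behavior. The two-line `sample` is a hypothetical stand-in for gnulib-cache.m4 contents, not taken from a real file.

```python
import re

# Old and new spellings of the cache pattern from the patch.
escaped = "^(gl_.*?)\\(\\[(.*?)\\]\\)$"
raw = r"^(gl_.*?)\(\[(.*?)\]\)$"

# A raw literal only changes how the source is written, not the resulting string.
assert escaped == raw

pattern = re.compile(raw, re.S | re.M)

# Hypothetical cache contents, for illustration only.
sample = "gl_LOCAL_DIR([gl])\ngl_M4_BASE([m4])\n"

# With re.M, "^" and "$" anchor at each line, so every macro call is found.
pairs = pattern.findall(sample)
```

Here `pairs` holds one `(macro, argument)` tuple per line of the sample.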