Package: python3-ply
Version: 3.11-7
Severity: important
Tags: patch
X-Debbugs-Cc: stu...@debian.org

Dear Maintainer,

ply uses the __doc__ of each method to calculate a signature for its
parsers (this is all within yacc.py). Between Python 3.12 and 3.13, the
way __doc__ is extracted has changed, with common leading whitespace
being removed from the docstring.

https://docs.python.org/3/whatsnew/3.13.html#other-language-changes

The result is that all packaged parsers (such as the one in
python3-phply) are considered invalid when run with Python 3.13, and the
parser is then regenerated. This takes a substantial amount of time -
enough to slow down the test suite of translate-toolkit from a few
seconds to a few minutes. The time is all spent repeatedly regenerating
the parser rather than just reading it off the disk.

Additionally, any packaged parsers will differ between Python 3.12 and
3.13, and so dh_python3 can't correctly collapse them into
dist-packages.

(See #1095792 for a bit more - this was from an initial look at just
python3-phply while this really is a bigger issue with the python3-ply
package.)

The attached patch undertakes whitespace normalisation on the signature:

 - the signature inside a cached model is normalised
 - the signature calculated from the source module is normalised

With this normalisation:

 - both Python 3.12 and 3.13 generate the same signatures on package
   rebuilds, so our own packaged parsers will be OK
 - the normalised signature from old parsers generated by Python 3.12
   actually matches the normalised new signature when read in, meaning
   that it's not a cache miss

I've tested this patch by:

a) all new packages
 - first building the python3-ply package with this patch
 - then building a python3-phply package with this patch
 - then testing with the translate-toolkit test suite to check for
   performance issues

b) existing packages
 - first building the python3-ply package with this patch
 - keeping the current python3-phply from sid
 - then testing with the translate-toolkit test suite to check for
   performance issues

Regards
Stuart

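As a rough illustration of the mismatch described above and of what the
normalisation does to it, here is a minimal sketch. The normalize()
function repeats the two substitutions from the patch's _normalize()
helper. The two docstring values are hypothetical examples, written to
resemble a grammar rule as it might look before and after the 3.13
change; they are not output captured from either interpreter.

```python
import re


def normalize(s):
    # Same two substitutions as _normalize() in the attached patch.
    if s:
        s = re.sub(" +", " ", s)    # collapse each run of spaces to one
        s = re.sub("\n ", "\n", s)  # drop the one space left at a line start
    return s


# Hypothetical docstrings for one grammar rule (assumed values).
# Older style: continuation lines keep their source indentation.
DOC_PY312 = (
    "expression : expression PLUS term\n"
    "                  | expression MINUS term\n"
    "                  | term"
)
# 3.13 style: the common leading whitespace has been removed.
DOC_PY313 = (
    "expression : expression PLUS term\n"
    "| expression MINUS term\n"
    "| term"
)

raw_match = DOC_PY312 == DOC_PY313
normalized_match = normalize(DOC_PY312) == normalize(DOC_PY313)
print("raw docstrings equal:       ", raw_match)
print("normalised docstrings equal:", normalized_match)
```

Under these assumed inputs the raw strings differ while the normalised
ones compare equal, which is the property the signature check relies on
to avoid regenerating the parser.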
From d93be36fddd970aedb1c0da345d255cef1028e1e Mon Sep 17 00:00:00 2001
From: Stuart Prescott <stu...@debian.org>
Date: Sat, 15 Feb 2025 15:42:55 +1100
Subject: [PATCH 1/2] Add patch to normalise whitespace in signature

Addresses performance issues seen with phply and translate-toolkit.
---
 debian/patches/series                         |  1 +
 .../signature-whitespace-normalisation.patch  | 73 +++++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 debian/patches/signature-whitespace-normalisation.patch

diff --git a/debian/patches/series b/debian/patches/series
index c008b1c..9ddb106 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,2 +1,3 @@
 replace-removed-assert_-with-assertTrue.patch
 relax-lex-tabversion-check.patch
+signature-whitespace-normalisation.patch
diff --git a/debian/patches/signature-whitespace-normalisation.patch b/debian/patches/signature-whitespace-normalisation.patch
new file mode 100644
index 0000000..b204fe1
--- /dev/null
+++ b/debian/patches/signature-whitespace-normalisation.patch
@@ -0,0 +1,73 @@
+Description: Normalise the whitespace in the docstring for signature
+ The docstring is used in the calculation of the signature of a parser, but
+ the whitespace in the docstring can change between Python interpreter
+ versions, most notably with Python 3.13 that strips common whitespace from
+ the front of the docstring.
+ .
+ Without normalisation of the docstring, loading the parser is a cache miss
+ every time, which is observed as a significant performance overhead. (See the
+ translate-toolkit test performance and #1095792 for example)
+ .
+ With this normalisation patch, every parsetab.py needs to be rebuilt; it is
+ impossible to make a patch that turns the Python 3.13 __doc__ back into the
+ Python 3.12 __doc__ for backwards compatibility.
+Author: Stuart Prescott <stu...@debian.org>
+--- a/ply/yacc.py
++++ b/ply/yacc.py
+@@ -1995,7 +1995,7 @@
+             self.lr_productions.append(MiniProduction(*p))
+ 
+         self.lr_method = parsetab._lr_method
+-        return parsetab._lr_signature
++        return _normalize(parsetab._lr_signature)
+ 
+     def read_pickle(self, filename):
+         try:
+@@ -2022,14 +2022,13 @@
+             self.lr_productions.append(MiniProduction(*p))
+ 
+         in_f.close()
+-        return signature
++        return _normalize(signature)
+ 
+     # Bind all production function names to callable objects in pdict
+     def bind_callables(self, pdict):
+         for p in self.lr_productions:
+             p.bind(pdict)
+ 
+-
+ # -----------------------------------------------------------------------------
+ #                           === LR Generator ===
+ #
+@@ -2983,7 +2982,7 @@
+                     parts.append(f[3])
+         except (TypeError, ValueError):
+             pass
+-        return ''.join(parts)
++        return _normalize(''.join(parts))
+ 
+     # -----------------------------------------------------------------------------
+     # validate_modules()
+@@ -3134,7 +3133,7 @@
+             if isinstance(item, (types.FunctionType, types.MethodType)):
+                 line = getattr(item, 'co_firstlineno', item.__code__.co_firstlineno)
+                 module = inspect.getmodule(item)
+-                p_functions.append((line, module, name, item.__doc__))
++                p_functions.append((line, module, name, _normalize(item.__doc__)))
+ 
+         # Sort all of the actions by line number; make sure to stringify
+         # modules to make them sortable, since `line` may not uniquely sort all
+@@ -3500,3 +3499,13 @@
+ 
+     parse = parser.parse
+     return parser
++
++
++def _normalize(s):
++    # Normalize the whitespace in the docstring - this can vary between
++    # Python versions, with changes in Python 3.13
++    # https://docs.python.org/3/whatsnew/3.13.html#other-language-changes
++    if s:
++        s = re.sub(" +", " ", s)
++        s = re.sub("\n ", "\n", s)
++    return s
-- 
2.39.5

From afed2a2b953d715a89918b7616a5372a52a5193f Mon Sep 17 00:00:00 2001
From: Stuart Prescott <stu...@debian.org>
Date: Sat, 15 Feb 2025 15:43:03 +1100
Subject: [PATCH 2/2] Add WIP changelog

---
 debian/changelog | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/debian/changelog b/debian/changelog
index 71b226e..d606d8b 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,10 @@
+ply (3.11-7.1) UNRELEASED; urgency=medium
+
+  * Add patch to normalise signature across whitespace changes (and therefore
+    across Python interpreter versions).
+
+ -- Stuart Prescott <stu...@debian.org>  Sat, 15 Feb 2025 15:41:25 +1100
+
 ply (3.11-7) unstable; urgency=medium
 
   * Control: remove team from uploaders.
-- 
2.39.5