Package: python3-ply
Version: 3.11-7
Severity: important
Tags: patch
X-Debbugs-Cc: stu...@debian.org

Dear Maintainer,

ply uses the __doc__ of each method to calculate a signature for its
parsers (this is all within yacc.py). Between Python 3.12 and 3.13, the
way __doc__ is generated changed: the compiler now strips common leading
whitespace from every line of a docstring.

https://docs.python.org/3/whatsnew/3.13.html#other-language-changes
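
As a rough illustration of why this matters for the signature, here is a
small sketch. The two docstrings are hand-written approximations of what
__doc__ would hold for the same grammar rule under each interpreter (they
are not captured output), and raw_signature() is a hypothetical stand-in
for the unpatched calculation in yacc.py, which joins the docstrings into
one string.

```python
# Hand-written approximations of __doc__ for the same grammar rule.
# Python 3.12 keeps the source indentation of continuation lines;
# Python 3.13 strips the common leading whitespace at compile time.
doc_312 = "expression : expression PLUS term\n                  | term"
doc_313 = "expression : expression PLUS term\n| term"


def raw_signature(start, tokens, docs):
    # Hypothetical stand-in for the unpatched signature: concatenate the
    # start symbol, the token list and every non-empty docstring.
    parts = [start, " ".join(tokens)] + [d for d in docs if d]
    return "".join(parts)


sig_312 = raw_signature("expression", ["PLUS", "NUMBER"], [doc_312])
sig_313 = raw_signature("expression", ["PLUS", "NUMBER"], [doc_313])

# The grammar is identical, but the signatures differ in whitespace only.
print(sig_312 == sig_313)  # False
```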

The result is that all packaged parsers (such as the one in
python3-phply) are invalid when run with Python 3.13, and the parser is
then regenerated. This takes a substantial amount of time - enough to
slow down the test suite of translate-toolkit from a few seconds to a
few minutes. The time is all spent repeatedly regenerating the parser
rather than just reading it off the disk.

Additionally, any packaged parsers will differ between Python 3.12 and
3.13, and so dh_python3 can't correctly collapse them into
dist-packages.

(See #1095792 for a bit more - this was from an initial look at just
python3-phply while this really is a bigger issue with the python3-ply
package.)

The attached patch undertakes whitespace normalisation on the signature:
- the signature inside a cached model is normalised
- the signature calculated from the source module is normalised

With this normalisation:
- both Python 3.12 and 3.13 generate the same signatures on package
  rebuilds, so our own packaged parsers will be OK
- the normalised signature from old parsers generated by Python 3.12
  actually matches the normalised new signature when read in, meaning
  that it's not a cache miss

I've tested this patch by:
a) all new packages
- first building the python3-ply package with this patch
- then building a python3-phply package with this patch
- then testing with the translate-toolkit test suite to check for
  performance issues
b) existing packages
- first building the python3-ply package with this patch
- keeping the current python3-phply from sid
- then testing with the translate-toolkit test suite to check for
  performance issues

Regards
Stuart
From d93be36fddd970aedb1c0da345d255cef1028e1e Mon Sep 17 00:00:00 2001
From: Stuart Prescott <stu...@debian.org>
Date: Sat, 15 Feb 2025 15:42:55 +1100
Subject: [PATCH 1/2] Add patch to normalise whitespace in signature

Addresses performance issues seen with phply and translate-toolkit.
---
 debian/patches/series                         |  1 +
 .../signature-whitespace-normalisation.patch  | 73 +++++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 debian/patches/signature-whitespace-normalisation.patch

diff --git a/debian/patches/series b/debian/patches/series
index c008b1c..9ddb106 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,2 +1,3 @@
 replace-removed-assert_-with-assertTrue.patch
 relax-lex-tabversion-check.patch
+signature-whitespace-normalisation.patch
diff --git a/debian/patches/signature-whitespace-normalisation.patch b/debian/patches/signature-whitespace-normalisation.patch
new file mode 100644
index 0000000..b204fe1
--- /dev/null
+++ b/debian/patches/signature-whitespace-normalisation.patch
@@ -0,0 +1,73 @@
+Description: Normalise the whitespace in the docstring for signature
+ The docstring is used in the calculation of the signature of a parser, but
+ the whitespace in the docstring can change between Python interpreter
+ versions, most notably with Python 3.13 that strips common whitespace from
+ the front of the docstring.
+ .
+ Without normalisation of the docstring, loading the parser is a cache miss
+ every time, which is observed as a significant performance overhead. (See the
+ translate-toolkit test performance and #1095792 for example)
+ .
+ With this normalisation patch, every parsetab.py needs to be rebuilt; it is
+ impossible to make a patch that turns the Python 3.13 __doc__ back into the
+ Python 3.12 __doc__ for backwards compatibility.
+Author: Stuart Prescott <stu...@debian.org>
+--- a/ply/yacc.py
++++ b/ply/yacc.py
+@@ -1995,7 +1995,7 @@
+             self.lr_productions.append(MiniProduction(*p))
+ 
+         self.lr_method = parsetab._lr_method
+-        return parsetab._lr_signature
++        return _normalize(parsetab._lr_signature)
+ 
+     def read_pickle(self, filename):
+         try:
+@@ -2022,14 +2022,13 @@
+             self.lr_productions.append(MiniProduction(*p))
+ 
+         in_f.close()
+-        return signature
++        return _normalize(signature)
+ 
+     # Bind all production function names to callable objects in pdict
+     def bind_callables(self, pdict):
+         for p in self.lr_productions:
+             p.bind(pdict)
+ 
+-
+ # -----------------------------------------------------------------------------
+ #                           === LR Generator ===
+ #
+@@ -2983,7 +2982,7 @@
+                     parts.append(f[3])
+         except (TypeError, ValueError):
+             pass
+-        return ''.join(parts)
++        return _normalize(''.join(parts))
+ 
+     # -----------------------------------------------------------------------------
+     # validate_modules()
+@@ -3134,7 +3133,7 @@
+             if isinstance(item, (types.FunctionType, types.MethodType)):
+                 line = getattr(item, 'co_firstlineno', item.__code__.co_firstlineno)
+                 module = inspect.getmodule(item)
+-                p_functions.append((line, module, name, item.__doc__))
++                p_functions.append((line, module, name, _normalize(item.__doc__)))
+ 
+         # Sort all of the actions by line number; make sure to stringify
+         # modules to make them sortable, since `line` may not uniquely sort all
+@@ -3500,3 +3499,13 @@
+ 
+     parse = parser.parse
+     return parser
++
++
++def _normalize(s):
++    # Normalize the whitespace in the docstring - this can vary between
++    # Python versions, with changes in Python 3.13
++    # https://docs.python.org/3/whatsnew/3.13.html#other-language-changes
++    if s:
++        s = re.sub(" +", " ", s)
++        s = re.sub("\n ", "\n", s)
++    return s
-- 
2.39.5

From afed2a2b953d715a89918b7616a5372a52a5193f Mon Sep 17 00:00:00 2001
From: Stuart Prescott <stu...@debian.org>
Date: Sat, 15 Feb 2025 15:43:03 +1100
Subject: [PATCH 2/2] Add WIP changelog

---
 debian/changelog | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/debian/changelog b/debian/changelog
index 71b226e..d606d8b 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,10 @@
+ply (3.11-7.1) UNRELEASED; urgency=medium
+
+  * Add patch to normalise signature across whitespace changes (and therefore
+    across Python interpreter versions).
+
+ -- Stuart Prescott <stu...@debian.org>  Sat, 15 Feb 2025 15:41:25 +1100
+
 ply (3.11-7) unstable; urgency=medium
 
   * Control: remove team from uploaders.
-- 
2.39.5
