reassign 1084235 python3-nltk 3.9.1-1
reassign 1084236 python3-nltk 3.9.1-1
reassign 1084237 python3-nltk 3.9.1-1
reassign 1084242 python3-nltk 3.9.1-1
reassign 1084249 python3-nltk 3.9.1-1
reassign 1084250 python3-nltk 3.9.1-1
reassign 1084291 python3-nltk 3.9.1-1
reassign 1084292 python3-nltk 3.9.1-1
reassign 1084294 python3-nltk 3.9.1-1
reassign 1084299 python3-nltk 3.9.1-1
reassign 1084392 python3-nltk 3.9.1-1
reassign 1084300 python3-nltk 3.9.1-1
reassign 1084306 python3-nltk 3.9.1-1
reassign 1084323 python3-nltk 3.9.1-1
reassign 1084332 python3-nltk 3.9.1-1
reassign 1084333 python3-nltk 3.9.1-1
reassign 1084334 python3-nltk 3.9.1-1
reassign 1084337 python3-nltk 3.9.1-1
reassign 1084338 python3-nltk 3.9.1-1
reassign 1084339 python3-nltk 3.9.1-1
reassign 1084341 python3-nltk 3.9.1-1
reassign 1084342 python3-nltk 3.9.1-1
reassign 1084344 python3-nltk 3.9.1-1
reassign 1084345 python3-nltk 3.9.1-1
reassign 1084346 python3-nltk 3.9.1-1
reassign 1084349 python3-nltk 3.9.1-1
reassign 1084385 python3-nltk 3.9.1-1
reassign 1084386 python3-nltk 3.9.1-1
forcemerge 1084235 1084236 1084237 1084242 1084249 1084250 1084291 1084292 
1084294 1084299 1084392 1084300 1084306 1084323 1084332 1084333 1084334 1084337 
1084338 1084339 1084341 1084342 1084344 1084345 1084346 1084349 1084385 1084386
affects 1084235 src:a2d src:abydos src:aiodogstatsd src:blag 
src:djangorestframework-api-key src:djangorestframework src:libspng 
src:mailmanclient src:markdown-callouts src:mintpy src:mkdocs-literate-nav 
src:mkdocs-section-index src:nlopt src:pydoctor src:python-django-pgtrigger 
src:python-djangorestframework-yaml src:python-djantic src:python-igraph 
src:python-inline-snapshot src:python-jellyfish src:python-markdown 
src:python-mkdocs src:python-opt-einsum src:python-pipx src:python-respx 
src:python-uvicorn src:twisted src:typer
thanks

On Mon, Oct 07, 2024 at 11:19:03AM +0100, Colin Watson wrote:
> On Mon, Oct 07, 2024 at 10:38:13AM +0200, Santiago Vila wrote:
> > During a rebuild of all packages in unstable, your package failed to build:
> [...]
> > /usr/lib/python3/dist-packages/nltk/stem/__init__.py:34: in <module>
> >     from nltk.stem.wordnet import WordNetLemmatizer
> > /usr/lib/python3/dist-packages/nltk/stem/wordnet.py:13: in <module>
> >     class WordNetLemmatizer:
> > /usr/lib/python3/dist-packages/nltk/stem/wordnet.py:48: in WordNetLemmatizer
> >     morphy = wn.morphy
> > /usr/lib/python3/dist-packages/nltk/corpus/util.py:120: in __getattr__
> >     self.__load()
> > /usr/lib/python3/dist-packages/nltk/corpus/util.py:86: in __load
> >     raise e
> > /usr/lib/python3/dist-packages/nltk/corpus/util.py:81: in __load
> >     root = nltk.data.find(f"{self.subdir}/{self.__name}")
> > /usr/lib/python3/dist-packages/nltk/data.py:579: in find
> >     raise LookupError(resource_not_found)
> > E   LookupError:
> > E   **********************************************************************
> > E     Resource wordnet not found.
> > E     Please use the NLTK Downloader to obtain the resource:
> > E
> > E     >>> import nltk
> > E     >>> nltk.download('wordnet')
> > E     
> > E     For more information see: https://www.nltk.org/data.html
> > E
> > E     Attempted to load corpora/wordnet
> > E
> > E     Searched in:
> > E       - '/<<PKGBUILDDIR>>/.pybuild/cpython3_3.12_pydoctor/nltk_data'
> > E       - '/usr/nltk_data'
> > E       - '/usr/share/nltk_data'
> > E       - '/usr/lib/nltk_data'
> > E       - '/usr/share/nltk_data'
> > E       - '/usr/local/share/nltk_data'
> > E       - '/usr/lib/nltk_data'
> > E       - '/usr/local/lib/nltk_data'
> > E   **********************************************************************
> 
> I assume this is because some downloadable data went away, though I'm
> not certain.  Still, we obviously shouldn't have an implicit dependency
> on downloaded data during package builds.
> 
> Carsten, what would you think of this patch to python-lunr, which fixes
> both pydoctor and twisted (and I suspect probably a bunch of other
> packages, since mkdocs also depends on python3-lunr)?

Cancel this - we don't need to change python-lunr.  Sorry to bother you,
Carsten.

I tracked this down to a regression in nltk instead.  This is
https://github.com/nltk/nltk/issues/3308, fixed in
https://github.com/nltk/nltk/pull/3309.  

Mo, could we please apply the attached patch to nltk?  I've test-built
all the affected packages against this.  python-igraph has uninstallable
build-dependencies (indirectly due to https://bugs.debian.org/1084781, I
think), while python-uvicorn fails in an unrelated way (it looks as
though it may be fixed by the changes to ProxyHeadersMiddleware in
0.31.0); but everything else from the list of affected packages above
builds cleanly again after applying this patch.

Thanks,

-- 
Colin Watson (he/him)                              [cjwat...@debian.org]
From 0afcdd6143f1dc3d76965cf03b5c856d7ae4e2b8 Mon Sep 17 00:00:00 2001
From: Colin Watson <cjwat...@debian.org>
Date: Tue, 8 Oct 2024 21:58:49 +0100
Subject: [PATCH] Don't read the WordNet corpus before it is needed

Closes: #1084323
---
 debian/changelog                              |   6 +
 .../import-wordnet-corpus-lazily.patch        | 122 ++++++++++++++++++
 debian/patches/series                         |   1 +
 3 files changed, 129 insertions(+)
 create mode 100644 debian/patches/import-wordnet-corpus-lazily.patch
 create mode 100644 debian/patches/series

diff --git a/debian/changelog b/debian/changelog
index cf26f5f..8d148ed 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+nltk (3.9.1-1.1) UNRELEASED; urgency=medium
+
+  * Don't read the WordNet corpus before it is needed (closes: #1084323).
+
+ -- Colin Watson <cjwat...@debian.org>  Tue, 08 Oct 2024 21:12:41 +0100
+
 nltk (3.9.1-1) unstable; urgency=medium
 
   * New upstream version 3.9.1 (Closes: #1074423)
diff --git a/debian/patches/import-wordnet-corpus-lazily.patch b/debian/patches/import-wordnet-corpus-lazily.patch
new file mode 100644
index 0000000..7c93a11
--- /dev/null
+++ b/debian/patches/import-wordnet-corpus-lazily.patch
@@ -0,0 +1,122 @@
+From: Eric Kafe <kafe.e...@gmail.com>
+Date: Sun, 18 Aug 2024 16:09:01 +0200
+Subject: Fix bug in WordNetLemmatizer
+
+Fix #3308 by not importing WordNet's _morphy and morphy before they are needed.
+
+Origin: upstream, https://github.com/nltk/nltk/pull/3309
+Bug: https://github.com/nltk/nltk/issues/3308
+Bug-Debian: https://bugs.debian.org/1084323
+Last-Update: 2024-10-08
+---
+ nltk/stem/wordnet.py | 71 +++++++++++++++++++++++++++++-----------------------
+ 1 file changed, 39 insertions(+), 32 deletions(-)
+
+diff --git a/nltk/stem/wordnet.py b/nltk/stem/wordnet.py
+index 76caf1b..87d08c7 100644
+--- a/nltk/stem/wordnet.py
++++ b/nltk/stem/wordnet.py
+@@ -7,64 +7,71 @@
+ # URL: <https://www.nltk.org/>
+ # For license information, see LICENSE.TXT
+ 
+-from nltk.corpus import wordnet as wn
+-
+ 
+ class WordNetLemmatizer:
+     """
+     WordNet Lemmatizer
+ 
+-    Provides 3 lemmatizer modes:
+-
+-    1. _morphy() is an alias to WordNet's _morphy lemmatizer.
+-    It returns a list of all lemmas found in WordNet.
+-
+-    >>> wnl = WordNetLemmatizer()
+-    >>> print(wnl._morphy('us', 'n'))
+-    ['us', 'u']
+-
+-    2. morphy() is a restrictive wrapper around _morphy().
+-    It returns the first lemma found in WordNet,
+-    or None if no lemma is found.
++    Provides 3 lemmatizer modes: _morphy(), morphy() and lemmatize().
+ 
+-    >>> print(wnl.morphy('us', 'n'))
+-    us
+-
+-    >>> print(wnl.morphy('catss'))
+-    None
+-
+-    3. lemmatize() is a permissive wrapper around _morphy().
++    lemmatize() is a permissive wrapper around _morphy().
+     It returns the shortest lemma found in WordNet,
+     or the input string unchanged if nothing is found.
+ 
+-    >>> print(wnl.lemmatize('us', 'n'))
++    >>> from nltk.stem import WordNetLemmatizer as wnl
++    >>> print(wnl().lemmatize('us', 'n'))
+     u
+ 
+-    >>> print(wnl.lemmatize('Anythinggoeszxcv'))
++    >>> print(wnl().lemmatize('Anythinggoeszxcv'))
+     Anythinggoeszxcv
+ 
+     """
+ 
+-    morphy = wn.morphy
++    def _morphy(self, form, pos, check_exceptions=True):
++        """
++        _morphy() is WordNet's _morphy lemmatizer.
++        It returns a list of all lemmas found in WordNet.
++
++        >>> from nltk.stem import WordNetLemmatizer as wnl
++        >>> print(wnl()._morphy('us', 'n'))
++        ['us', 'u']
++        """
++        from nltk.corpus import wordnet as wn
++
++        return wn._morphy(form, pos, check_exceptions)
++
++    def morphy(self, form, pos=None, check_exceptions=True):
++        """
++        morphy() is a restrictive wrapper around _morphy().
++        It returns the first lemma found in WordNet,
++        or None if no lemma is found.
++
++        >>> from nltk.stem import WordNetLemmatizer as wnl
++        >>> print(wnl().morphy('us', 'n'))
++        us
++
++        >>> print(wnl().morphy('catss'))
++        None
++        """
++        from nltk.corpus import wordnet as wn
+ 
+-    _morphy = wn._morphy
++        return wn.morphy(form, pos, check_exceptions)
+ 
+     def lemmatize(self, word: str, pos: str = "n") -> str:
+         """Lemmatize `word` by picking the shortest of the possible lemmas,
+         using the wordnet corpus reader's built-in _morphy function.
+         Returns the input word unchanged if it cannot be found in WordNet.
+ 
+-        >>> from nltk.stem import WordNetLemmatizer
+-        >>> wnl = WordNetLemmatizer()
+-        >>> print(wnl.lemmatize('dogs'))
++        >>> from nltk.stem import WordNetLemmatizer as wnl
++        >>> print(wnl().lemmatize('dogs'))
+         dog
+-        >>> print(wnl.lemmatize('churches'))
++        >>> print(wnl().lemmatize('churches'))
+         church
+-        >>> print(wnl.lemmatize('aardwolves'))
++        >>> print(wnl().lemmatize('aardwolves'))
+         aardwolf
+-        >>> print(wnl.lemmatize('abaci'))
++        >>> print(wnl().lemmatize('abaci'))
+         abacus
+-        >>> print(wnl.lemmatize('hardrock'))
++        >>> print(wnl().lemmatize('hardrock'))
+         hardrock
+ 
+         :param word: The input word to lemmatize.
diff --git a/debian/patches/series b/debian/patches/series
new file mode 100644
index 0000000..7d56625
--- /dev/null
+++ b/debian/patches/series
@@ -0,0 +1 @@
+import-wordnet-corpus-lazily.patch
-- 
2.45.2
