[GitHub] [lucene] magibney commented on pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue

GitBox Tue, 16 Nov 2021 10:54:32 -0800


magibney commented on pull request #380:
URL: https://github.com/apache/lucene/pull/380#issuecomment-970573111



   I'm definitely not the ideal reviewer for this either, but fwiw: the only 
potential issue I can see here is the sharing of this component (and ultimately 
the underlying opennlp `DictionaryLemmatizer`) across multiple threads. IIUC, 
the initial approach (repeatedly recreating a new `DictionaryLemmatizer` from 
the raw dictionary string for each call to `FilterFactory.create()`) dodges 
that issue, because TokenStreams only support single-threaded access.
   
   That said, nothing in the underlying implementation and expected use of 
`DictionaryLemmatizer` looks like it would be an issue in practice for 
multi-threaded access. The only thing that seems a little weird to me, even for 
a single-threaded case, is that `DictionaryLemmatizer` [directly 
exposes](https://github.com/apache/opennlp/blob/c88f57814c0af0dccf471b895a35981ecdac2e7a/opennlp-tools/src/main/java/opennlp/tools/lemmatizer/DictionaryLemmatizer.java#L84-L86)
 its internal backing HashMap via a public getter, so it could in principle be 
modified by any caller. But it doesn't look like DictionaryLemmatizer leaks 
from `NLPLemmatizerOp`, so I guess that's ok?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] magibney commented on pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue

Reply via email to