magibney commented on pull request #380: URL: https://github.com/apache/lucene/pull/380#issuecomment-970573111
I'm definitely not the ideal reviewer for this either, but fwiw: the only potential issue I can see here is the sharing of this component (and ultimately the underlying opennlp `DictionaryLemmatizer`) across multiple threads. IIUC, the initial approach (repeatedly recreating a new `DictionaryLemmatizer` from the raw dictionary string for each call to `FilterFactory.create()`) dodges that issue, because TokenStreams only support single-threaded access. That said, nothing in the underlying implementation and expected use of `DictionaryLemmatizer` looks like it would be an issue in practice for multi-threaded access. The only thing that seems a little weird to me, even for a single-threaded case, is that `DictionaryLemmatizer` [directly exposes](https://github.com/apache/opennlp/blob/c88f57814c0af0dccf471b895a35981ecdac2e7a/opennlp-tools/src/main/java/opennlp/tools/lemmatizer/DictionaryLemmatizer.java#L84-L86) its internal backing HashMap via a public getter, so it could in principle be modified by any caller. But it doesn't look like DictionaryLemmatizer leaks from `NLPLemmatizerOp`, so I guess that's ok? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org