[GitHub] [lucene] magibney commented on pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue

GitBox Wed, 08 Dec 2021 07:43:31 -0800


magibney commented on pull request #380:
URL: https://github.com/apache/lucene/pull/380#issuecomment-988928174



   Thanks for the nudge, @fmmoret.
   
   I think if we introduce this change, we should really avoid [needlessly 
building and throwing 
away](https://github.com/apache/lucene/pull/380#discussion_r750515187) the 
stringified dictionary. @spyk, is this something you'd be interested in pursuing 
(i.e., pushing a new commit to your PR branch)? Let me know if not, and I'll try 
to move it along (or Alessandro might, per his earlier comment?).
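   A minimal sketch of the idea, assuming a tab-separated `word<TAB>posTag<TAB>lemma` dictionary: parse each dictionary once per resource name, directly from a `Reader`, and cache the parsed result instead of a `String` copy that is rebuilt and then discarded. `LemmaDictionaryCache` and its methods are hypothetical illustrations, not Lucene's or OpenNLP's API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical cache: parses each lemma dictionary once (keyed by resource name)
 * and hands out the parsed, immutable map rather than a stringified dictionary.
 */
final class LemmaDictionaryCache {
  private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

  /** Returns the parsed dictionary for {@code name}, opening and parsing it only on first use. */
  Map<String, String> get(String name, Supplier<Reader> opener) {
    // computeIfAbsent applies the loader at most once per key on a ConcurrentHashMap
    return cache.computeIfAbsent(name, n -> parse(opener.get()));
  }

  /** Parses "word\tposTag\tlemma" lines straight from the Reader; no intermediate String. */
  static Map<String, String> parse(Reader reader) {
    Map<String, String> lemmas = new HashMap<>();
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        String[] cols = line.split("\t");
        if (cols.length == 3) {
          // key combines word and POS tag; value is the lemma
          lemmas.put(cols[0] + "\t" + cols[1], cols[2]);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Collections.unmodifiableMap(lemmas);
  }
}
```

   Since the loader runs inside `computeIfAbsent`, repeated lookups for the same resource return the same parsed instance, and the dictionary is read only once.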
   
   >Ideally, opennlp would have a DictionaryLemmatizer ctor that accepts a 
Reader directly -- I can't imagine that would be a controversial upstream PR?
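   For illustration only, a `Reader`-accepting constructor might look roughly like the following, leaving the byte-to-character decoding to the caller. `ReaderDictionaryLemmatizer` is a hypothetical stand-in, not OpenNLP's `DictionaryLemmatizer`; the tab-separated format and the `"O"` placeholder for unknown word/tag pairs are assumptions of this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical dictionary lemmatizer whose constructor takes a Reader,
 * so the caller (not the library) decides how bytes are decoded.
 */
final class ReaderDictionaryLemmatizer {
  private final Map<String, String> lemmas = new HashMap<>();

  ReaderDictionaryLemmatizer(Reader dictionary) {
    try (BufferedReader br = new BufferedReader(dictionary)) {
      String line;
      while ((line = br.readLine()) != null) {
        // assumed format: word<TAB>posTag<TAB>lemma
        String[] cols = line.split("\t");
        if (cols.length == 3) {
          lemmas.put(key(cols[0], cols[1]), cols[2]);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the lemma for (word, posTag), or the placeholder "O" if the pair is unknown. */
  String lemmatize(String word, String posTag) {
    return lemmas.getOrDefault(key(word, posTag), "O");
  }

  private static String key(String word, String posTag) {
    return word + "\t" + posTag;
  }
}
```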
   
   I don't think concerns about the default character encoding should hold 
things up. We're not making anything worse wrt the default encoding assumption. 
A simple `TODO` comment should suffice. I think we should circle back (I should 
be able to find the time for this if nobody else steps forward) and actually 
address that `TODO` as a separate issue/PR, following something like the 
`InputStreamReader` approach I mentioned above (trusting someone will 
contradict me if they disagree with this proposed approach!).
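   A sketch of the `InputStreamReader` approach, assuming the caller knows (or is configured with) the dictionary's encoding: decode the resource stream with an explicit `Charset` rather than the JVM default. `DictionaryReaders` is a hypothetical helper name, and the `TODO` marks where a configurable charset could eventually go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: read a dictionary resource with an explicit charset. */
final class DictionaryReaders {
  private DictionaryReaders() {}

  /** Wraps the raw stream in a Reader that decodes with {@code charset}, not the JVM default. */
  static BufferedReader open(InputStream in, Charset charset) {
    // TODO: expose the charset as a (hypothetical) factory parameter instead of choosing it at call sites
    return new BufferedReader(new InputStreamReader(in, charset));
  }

  /** Reads all lines, decoding with the given charset; I/O errors become unchecked. */
  static List<String> readLines(InputStream in, Charset charset) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = open(in, charset)) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }
}
```

   Decoding the same UTF-8 bytes as ISO-8859-1 yields different characters for any non-ASCII entry, which is the kind of silent mismatch a default-charset assumption can cause.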


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
