magibney commented on pull request #380:
URL: https://github.com/apache/lucene/pull/380#issuecomment-988928174

Thanks for the nudge, @fmmoret. I think that if we introduce this change, we should really avoid [needlessly building and throwing away](https://github.com/apache/lucene/pull/380#discussion_r750515187) the stringified dictionary. @spyk, is this something you'd be interested in pursuing (i.e., pushing a new commit to your PR branch)? Let me know if not, and I'll try (or Alessandro, per his earlier comment?) to move it along.

> Ideally, opennlp would have a DictionaryLemmatizer ctor that accepts a Reader directly -- I can't imagine that would be a controversial upstream PR?

I don't think concerns over the default character encoding issue should hold things up. We're not making anything worse wrt the default encoding assumption. A simple `TODO` comment should suffice. I think we should circle back (I should be able to find the time for this if nobody else steps forward) to actually address such a `TODO` as a separate issue/PR, following something like the `InputStreamReader` approach I mentioned above (trusting someone will contradict me if they disagree with this proposed approach!).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org