[GitHub] [lucene] magibney commented on a change in pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue

GitBox Thu, 18 Nov 2021 09:04:23 -0800


magibney commented on a change in pull request #380:
URL: https://github.com/apache/lucene/pull/380#discussion_r752448693




##########
File path: lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/tools/OpenNLPOpsFactory.java
##########
@@ -169,11 +169,14 @@ public static String getLemmatizerDictionary(String dictionaryFile, ResourceLoad
             builder.append(chars, 0, numRead);
           }
         } while (numRead > 0);
-        dictionary = builder.toString();
-        lemmaDictionaries.put(dictionaryFile, dictionary);
+        String dictionary = builder.toString();
+        InputStream dictionaryInputStream =
+            new ByteArrayInputStream(dictionary.getBytes(StandardCharsets.UTF_8));
+        dictionaryLemmatizer = new DictionaryLemmatizer(dictionaryInputStream);

Review comment:
       Yes, sorry -- thanks for the clarification. What I mentioned was 
tangential, and narrowly focused on/reading between the lines of one minor 
aspect of the current implementation. I didn't mean to imply the 
String/re-parsing _per se_ was the main issue. The hashmaps are surely more 
important, as you say.
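
       To illustrate the distinction being discussed, here is a minimal sketch, 
not the code in this PR, of caching the parsed dictionary object rather than 
the raw String, so the lookup hashmaps are built once per dictionary file. 
`LemmaCacheSketch`, `ParsedDictionary`, `getDictionary`, and `parseCount` are 
hypothetical names, `ParsedDictionary` is only a stand-in for OpenNLP's 
`DictionaryLemmatizer`, and the tab-separated word/tag/lemma line format is 
an assumption made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LemmaCacheSketch {

  /** Hypothetical stand-in for a parsed dictionary; not the OpenNLP class. */
  static final class ParsedDictionary {
    private final Map<String, String> lemmas = new HashMap<>();

    ParsedDictionary(String raw) {
      // Assumed format for this sketch: word<TAB>postag<TAB>lemma, one entry per line.
      try (BufferedReader reader = new BufferedReader(new StringReader(raw))) {
        String line;
        while ((line = reader.readLine()) != null) {
          String[] parts = line.split("\t");
          if (parts.length == 3) {
            lemmas.put(parts[0] + "\t" + parts[1], parts[2]);
          }
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    /** Returns the lemma for (word, tag), or the word itself if there is no entry. */
    String lemmatize(String word, String tag) {
      return lemmas.getOrDefault(word + "\t" + tag, word);
    }
  }

  /** Counts how many times a dictionary was actually parsed. */
  static final AtomicInteger parseCount = new AtomicInteger();

  /** Cache keyed by dictionary name; values are parsed objects, not raw Strings. */
  private static final Map<String, ParsedDictionary> CACHE = new ConcurrentHashMap<>();

  static ParsedDictionary getDictionary(String name, String rawContents) {
    // computeIfAbsent parses only on the first request for a given name.
    return CACHE.computeIfAbsent(
        name,
        key -> {
          parseCount.incrementAndGet();
          return new ParsedDictionary(rawContents);
        });
  }

  public static void main(String[] args) {
    String raw = "running\tVBG\trun\nmice\tNNS\tmouse\n";
    ParsedDictionary first = getDictionary("example.dict", raw);
    ParsedDictionary second = getDictionary("example.dict", raw);
    System.out.println("same instance: " + (first == second));
    System.out.println("lemma of mice/NNS: " + first.lemmatize("mice", "NNS"));
    System.out.println("parse count: " + parseCount.get());
  }
}
```

       Sharing one parsed instance like this assumes the object is safe to 
read from multiple threads after construction; whether that holds for the 
real lemmatizer would need to be checked separately.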




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



