magibney commented on a change in pull request #380: URL: https://github.com/apache/lucene/pull/380#discussion_r752448693
##########
File path: lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/tools/OpenNLPOpsFactory.java
##########
@@ -169,11 +169,14 @@ public static String getLemmatizerDictionary(String dictionaryFile, ResourceLoad
           builder.append(chars, 0, numRead);
         }
       } while (numRead > 0);
-      dictionary = builder.toString();
-      lemmaDictionaries.put(dictionaryFile, dictionary);
+      String dictionary = builder.toString();
+      InputStream dictionaryInputStream =
+          new ByteArrayInputStream(dictionary.getBytes(StandardCharsets.UTF_8));
+      dictionaryLemmatizer = new DictionaryLemmatizer(dictionaryInputStream);

Review comment:
Yes, sorry -- thanks for the clarification. What I mentioned was tangential, and narrowly focused on/reading between the lines of one minor aspect of the current implementation. I didn't mean to imply the String/re-parsing _per se_ was the main issue. The hashmaps are surely more important, as you say.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org