alessandrobenedetti commented on code in PR #12246: URL: https://github.com/apache/lucene/pull/12246#discussion_r1179445835
########## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java: ########## @@ -64,7 +65,15 @@ public Word2VecSynonymProvider(Word2VecModel model) throws IOException { this.hnswGraph = builder.build(word2VecModel.copy()); } - public List<TermAndBoost> getSynonyms( + /** + * Returns the list of synonyms of a provided term. This method is synchronized because it uses + * the {@link org.apache.lucene.util.hnsw.OnHeapHnswGraph} that is not thread-safe. + * + * @param term term to search to find synonyms + * @param maxSynonymsPerTerm limit of synonyms returned + * @param minAcceptedSimilarity lower similarity threshold to consider another term as synonym + */ + public synchronized List<TermAndBoost> getSynonyms( Review Comment: Hi @jimczi, as someone used to say "Perfect is the enemy of good" :) I agree with you, there's definitely space for improvement here, but I believe it is a nice feature that someone may consider useful. If it is too taxing in terms of resources, we can leave users to decide, mark it as experimental, and let people try and gather more insights. Not all users may be interested in million-sized dictionaries. In this way, we can gather users, insights, and potential new contributions on the matter. We have various benchmarks we collected back in the days, @dantuzi can provide them and we'll post them, as far as I remember they looked decent. We spent a considerable amount of time(=money) in my company to just donate this feature to the community, and keep maintaining it on our own with all the main branch divergence is not sustainable anymore. With this mitigation I believe is a good enough first step that may get attention, new contributors and potentially new fundings to address all your concern. On the other hand, if you believe this can harm the project in any way, I am happy to revert, but I can't guarantee we'll have any time soon to change it and contribute it back :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org