alessandrobenedetti commented on code in PR #12246: URL: https://github.com/apache/lucene/pull/12246#discussion_r1180201182
########## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java: ########## @@ -64,7 +65,15 @@ public Word2VecSynonymProvider(Word2VecModel model) throws IOException { this.hnswGraph = builder.build(word2VecModel.copy()); } - public List<TermAndBoost> getSynonyms( + /** + * Returns the list of synonyms of a provided term. This method is synchronized because it uses + * the {@link org.apache.lucene.util.hnsw.OnHeapHnswGraph} that is not thread-safe. + * + * @param term term to search to find synonyms + * @param maxSynonymsPerTerm limit of synonyms returned + * @param minAcceptedSimilarity lower similarity threshold to consider another term as synonym + */ + public synchronized List<TermAndBoost> getSynonyms( Review Comment: So, I spent some time this morning trying to think about a more elegant way to communicate that currently, the OnHeap graph is not thread-safe. The latest commit is just tentative, to discuss with you, if it's more complicated than expected I can just revert it and leave the mitigation in the Word2Vec stuff. My main intent is to automatically get a benefit for the Word2Vec token filter if anyone in the future contributes a thread-safe version of the graph. If we put the synchronized in the word2vec and someone fixes the thread-safety problem, we may never realize it, and word2bec remain unnecessarily slower than it should be. Maybe adding this synchronized block on the graph, may do the trick? So if someone implements thread safety, then he/she changes that block? I know synchronized may slow down a bit the things, so to be honest I am looking for ideas :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org