alessandrobenedetti commented on code in PR #12246:
URL: https://github.com/apache/lucene/pull/12246#discussion_r1180201182


##########
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java:
##########
@@ -64,7 +65,15 @@ public Word2VecSynonymProvider(Word2VecModel model) throws 
IOException {
     this.hnswGraph = builder.build(word2VecModel.copy());
   }
 
-  public List<TermAndBoost> getSynonyms(
+  /**
+   * Returns the list of synonyms of a provided term. This method is 
synchronized because it uses
+   * the {@link org.apache.lucene.util.hnsw.OnHeapHnswGraph} that is not 
thread-safe.
+   *
+   * @param term term to search to find synonyms
+   * @param maxSynonymsPerTerm limit of synonyms returned
+   * @param minAcceptedSimilarity lower similarity threshold to consider 
another term as synonym
+   */
+  public synchronized List<TermAndBoost> getSynonyms(

Review Comment:
   So, I spent some time this morning trying to think about a more elegant way 
to communicate that currently, the OnHeap graph is not thread-safe.
   The latest commit is just tentative, to discuss with you, if it's more 
complicated than expected I can just revert it and leave the mitigation in the 
Word2Vec stuff.
   
   My main intent is to automatically get a benefit for the Word2Vec token 
filter if anyone in the future contributes a thread-safe version of the graph.
   If we put the synchronized in the word2vec and someone fixes the 
thread-safety problem, we may never realize it, and word2bec remain 
unnecessarily slower than it should be.
   
   Maybe adding this synchronized block on the graph, may do the trick?
   So if someone implements thread safety, then he/she changes that block?
   I know synchronized may slow down a bit the things, so to be honest I am 
looking for ideas :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to