alessandrobenedetti commented on code in PR #12246:
URL: https://github.com/apache/lucene/pull/12246#discussion_r1179445835


##########
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java:
##########
@@ -64,7 +65,15 @@ public Word2VecSynonymProvider(Word2VecModel model) throws 
IOException {
     this.hnswGraph = builder.build(word2VecModel.copy());
   }
 
-  public List<TermAndBoost> getSynonyms(
+  /**
+   * Returns the list of synonyms of a provided term. This method is 
synchronized because it uses
+   * the {@link org.apache.lucene.util.hnsw.OnHeapHnswGraph} that is not 
thread-safe.
+   *
+   * @param term term to search to find synonyms
+   * @param maxSynonymsPerTerm limit of synonyms returned
+   * @param minAcceptedSimilarity lower similarity threshold to consider 
another term as synonym
+   */
+  public synchronized List<TermAndBoost> getSynonyms(

Review Comment:
   Hi @jimczi, as someone used to say "Perfect is the enemy of good" :)
   I agree with you, there's definitely space for improvement here, but I 
believe it is a nice feature that someone may consider useful.
   
   If it is too taxing in terms of resources, we can leave users to decide, 
mark it as experimental, and let people try and gather more insights.
   Not all users may be interested in million-sized dictionaries.
   In this way, we can gather users, insights, and potential new contributions 
on the matter.
   We have various benchmarks we collected back in the days, @dantuzi can 
provide them and we'll post them, as far as I remember they looked decent.
   
   We spent a considerable amount of time(=money) in my company to just donate 
this feature to the community, and keep maintaining it on our own with all the 
main branch divergence is not sustainable anymore.
   
   With this mitigation I believe is a good enough first step that may get 
attention, new contributors and potentially new fundings to address all your 
concern.
   
   On the other hand, if you believe this can harm the project in any way, I am 
happy to revert, but I can't guarantee we'll have any time soon to change it 
and contribute it back :)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to