[GitHub] [lucene] jimczi commented on pull request #12169: Introduced the Word2VecSynonymFilter

via GitHub Tue, 25 Apr 2023 02:17:47 -0700


jimczi commented on PR #12169:
URL: https://github.com/apache/lucene/pull/12169#issuecomment-1521454510


   > Adding a clone() / copy() method to HnswGraph seems like a fairly 
unobtrusive way to support this. Javadoc definitely needs to be clearer.
   
   I looked at this and I am a bit reluctant to add this method since the 
`OnHeapHnswGraph` is mutable. Copying a mutable graph would be confusing and 
could lead to the same issue we have in this PR. I'd say that in its current 
form the `HnswGraph` is an internal class that shouldn't be used outside of 
the indexing primitives. 
   
   I am also curious to hear what the expectation is for this filter. I checked 
some resources created with word2vec or GloVe, and they output a large 
vocabulary. With a single thread, building a graph of 1M vectors with 
hundreds of dimensions would take several minutes (about 16 minutes at 1000 
docs/s). That seems like overkill for an application to rebuild this graph on 
every restart. Could we instead build the resource (the HNSW graph) offline 
and just load it when the filter starts? 
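   
   For reference, the 16-minute figure is just vocabulary size divided by 
indexing rate. A minimal sketch of that arithmetic follows; the class and 
method names are hypothetical, and the constant 1000 docs/s rate is the rough 
single-thread assumption from above, not a measured benchmark.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: back-of-the-envelope build time.
public class BuildTimeEstimate {

    // Seconds needed to index vectorCount vectors at a constant rate
    // (vectors per second). Assumes the rate does not degrade as the graph grows.
    static double estimateSeconds(long vectorCount, double vectorsPerSecond) {
        if (vectorsPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return vectorCount / vectorsPerSecond;
    }

    public static void main(String[] args) {
        double seconds = estimateSeconds(1_000_000L, 1000.0);
        // Prints: 1000 s (~16.7 min)
        System.out.println(String.format(Locale.ROOT, "%.0f s (~%.1f min)", seconds, seconds / 60.0));
    }
}
```

In practice the insertion rate tends to drop as an HNSW graph grows, so this is 
likely an optimistic lower bound for a full rebuild at startup.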
   
   > is it worth reverting or we can simply fix it/commit a workaround as soon 
as we find it in the next few days?
   
   I don't have a strong feeling here, but I am not sure we should add this 
filter to the common analysis module. As it stands, it feels more experimental, 
so the sandbox might be more appropriate?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

