mbrette commented on issue #12440: URL: https://github.com/apache/lucene/issues/12440#issuecomment-1682522530
What is your take on the existing merge optimization, #12050? The approach seems very effective (*), but it does not work when there are deleted documents, which is likely in most usage scenarios.

(*) To assess how effective it is, I ran a small test: a unit test using HnswGraphBuilder and OnHeapHnswGraph, simulating a big segment of 1M vectors (128 dims) merging with a smaller segment of 10k vectors. Initializing the HNSW graph from the big segment was 100 times faster than rebuilding the graph from scratch, even though initialization recomputes neighbor scores (see the sketch at the end of this comment). A given vector/document goes through many segment merges over its lifetime, so the benefit of this optimization accrues significantly. Caveat: I used random vectors.

Otherwise, a naive question: have we considered integrating other, native, libraries (faiss, raft, nmslib...), like what is done in OpenSearch (at a higher abstraction level, though)?
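Roughly, the comparison looked like the sketch below. It is simplified, not my actual unit test: it assumes the Lucene 9.7-era `HnswGraphBuilder` API (where #12050 added the initializer-graph `create` overload), and the `HnswMergeInitBench` / `ArrayVectors` helpers are hypothetical; class locations and signatures have shifted across 9.x releases.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import org.apache.lucene.index.VectorEncoding;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.util.hnsw.HnswGraphBuilder;
import org.apache.lucene.util.hnsw.OnHeapHnswGraph;
import org.apache.lucene.util.hnsw.RandomAccessVectorValues;

public class HnswMergeInitBench {

  // Simple in-memory vector source over a float[][] (hypothetical helper).
  static class ArrayVectors implements RandomAccessVectorValues<float[]> {
    final float[][] vectors;
    ArrayVectors(float[][] vectors) { this.vectors = vectors; }
    @Override public int size() { return vectors.length; }
    @Override public int dimension() { return vectors[0].length; }
    @Override public float[] vectorValue(int ord) { return vectors[ord]; }
    // build() requires a copy independent of the instance passed to create()
    @Override public ArrayVectors copy() { return new ArrayVectors(vectors); }
  }

  static float[][] randomVectors(int n, int dim, Random rnd) {
    float[][] v = new float[n][dim];
    for (float[] row : v) {
      for (int j = 0; j < dim; j++) row[j] = rnd.nextFloat();
    }
    return v;
  }

  public static void main(String[] args) throws IOException {
    Random rnd = new Random(42);
    int dim = 128, M = 16, beamWidth = 100;
    float[][] big = randomVectors(1_000_000, dim, rnd); // "big segment"
    float[][] small = randomVectors(10_000, dim, rnd);  // "small segment"

    // Merged ordering: big segment's vectors first, then the small one's.
    float[][] mergedArr = new float[big.length + small.length][];
    System.arraycopy(big, 0, mergedArr, 0, big.length);
    System.arraycopy(small, 0, mergedArr, big.length, small.length);
    ArrayVectors merged = new ArrayVectors(mergedArr);

    // Baseline: rebuild the merged graph from scratch (all 1.01M inserts).
    long t0 = System.nanoTime();
    OnHeapHnswGraph rebuilt =
        HnswGraphBuilder.create(merged, VectorEncoding.FLOAT32,
                VectorSimilarityFunction.DOT_PRODUCT, M, beamWidth, 42L)
            .build(merged.copy());
    System.out.printf("rebuild: %d ms%n", (System.nanoTime() - t0) / 1_000_000);

    // #12050 path: build the big segment's graph once (simulating the
    // pre-existing segment), then seed the merged build with it so only
    // the 10k small-segment vectors are actually inserted.
    ArrayVectors bigVectors = new ArrayVectors(big);
    OnHeapHnswGraph bigGraph =
        HnswGraphBuilder.create(bigVectors, VectorEncoding.FLOAT32,
                VectorSimilarityFunction.DOT_PRODUCT, M, beamWidth, 42L)
            .build(bigVectors.copy());

    // No deleted documents: old ordinals map to themselves in the merged
    // segment. Deletes would break this identity mapping, which is why the
    // optimization currently bails out when deletions are present.
    Map<Integer, Integer> oldToNew = new HashMap<>();
    for (int i = 0; i < big.length; i++) oldToNew.put(i, i);

    long t1 = System.nanoTime();
    OnHeapHnswGraph initialized =
        HnswGraphBuilder.create(merged, VectorEncoding.FLOAT32,
                VectorSimilarityFunction.DOT_PRODUCT, M, beamWidth, 42L,
                bigGraph, oldToNew)
            .build(merged.copy());
    System.out.printf("initialized: %d ms%n",
        (System.nanoTime() - t1) / 1_000_000);
  }
}
```

The timing of the second `build` only covers copying the big graph's neighbor lists (with scores recomputed) plus inserting the 10k new vectors, which is where the ~100x difference came from in my run.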