mbrette commented on issue #12440: URL: https://github.com/apache/lucene/issues/12440#issuecomment-1682522530
What is your take on the existing merge optimization, #12050? The approach seems very effective (*), but it does not work when there are deleted documents, which is likely in most usage scenarios.

(*) To assess how effective it is, I ran a small test: a unit test using HnswGraphBuilder and OnHeapHnswGraph, simulating a big segment of 1M vectors (128 dims) merging with a smaller segment of 10k vectors. Initializing the HNSW graph from the big segment was 100 times faster than rebuilding the graph from scratch, even though initialization recomputes neighbor scores (see the sketch at the end of this comment). A given vector/document goes through many segment merges over its lifetime, so the benefit of this optimization accrues significantly. Caveat: I used random vectors.

Otherwise, a naive question: have we considered integrating other, native, libraries (faiss, raft, nmslib...), like what is done in OpenSearch (at a higher abstraction level, though)?
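Roughly, the comparison looked like the sketch below. It is simplified, not my actual unit test: it assumes the Lucene 9.7-era `HnswGraphBuilder` API (where #12050 added the initializer-graph `create` overload), and the `HnswMergeInitBench` / `ArrayVectors` helpers are hypothetical; class locations and signatures have shifted across 9.x releases.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import org.apache.lucene.index.VectorEncoding;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.util.hnsw.HnswGraphBuilder;
import org.apache.lucene.util.hnsw.OnHeapHnswGraph;
import org.apache.lucene.util.hnsw.RandomAccessVectorValues;

public class HnswMergeInitBench {

  // Simple in-memory vector source over a float[][] (hypothetical helper).
  static class ArrayVectors implements RandomAccessVectorValues<float[]> {
    final float[][] vectors;
    ArrayVectors(float[][] vectors) { this.vectors = vectors; }
    @Override public int size() { return vectors.length; }
    @Override public int dimension() { return vectors[0].length; }
    @Override public float[] vectorValue(int ord) { return vectors[ord]; }
    // build() requires a copy independent of the instance passed to create()
    @Override public ArrayVectors copy() { return new ArrayVectors(vectors); }
  }

  static float[][] randomVectors(int n, int dim, Random rnd) {
    float[][] v = new float[n][dim];
    for (float[] row : v) {
      for (int j = 0; j < dim; j++) row[j] = rnd.nextFloat();
    }
    return v;
  }

  public static void main(String[] args) throws IOException {
    Random rnd = new Random(42);
    int dim = 128, M = 16, beamWidth = 100;
    float[][] big = randomVectors(1_000_000, dim, rnd); // "big segment"
    float[][] small = randomVectors(10_000, dim, rnd);  // "small segment"

    // Merged ordering: big segment's vectors first, then the small one's.
    float[][] mergedArr = new float[big.length + small.length][];
    System.arraycopy(big, 0, mergedArr, 0, big.length);
    System.arraycopy(small, 0, mergedArr, big.length, small.length);
    ArrayVectors merged = new ArrayVectors(mergedArr);

    // Baseline: rebuild the merged graph from scratch (all 1.01M inserts).
    long t0 = System.nanoTime();
    OnHeapHnswGraph rebuilt =
        HnswGraphBuilder.create(merged, VectorEncoding.FLOAT32,
                VectorSimilarityFunction.DOT_PRODUCT, M, beamWidth, 42L)
            .build(merged.copy());
    System.out.printf("rebuild: %d ms%n", (System.nanoTime() - t0) / 1_000_000);

    // #12050 path: build the big segment's graph once (simulating the
    // pre-existing segment), then seed the merged build with it so only
    // the 10k small-segment vectors are actually inserted.
    ArrayVectors bigVectors = new ArrayVectors(big);
    OnHeapHnswGraph bigGraph =
        HnswGraphBuilder.create(bigVectors, VectorEncoding.FLOAT32,
                VectorSimilarityFunction.DOT_PRODUCT, M, beamWidth, 42L)
            .build(bigVectors.copy());

    // No deleted documents: old ordinals map to themselves in the merged
    // segment. Deletes would break this identity mapping, which is why the
    // optimization currently bails out when deletions are present.
    Map<Integer, Integer> oldToNew = new HashMap<>();
    for (int i = 0; i < big.length; i++) oldToNew.put(i, i);

    long t1 = System.nanoTime();
    OnHeapHnswGraph initialized =
        HnswGraphBuilder.create(merged, VectorEncoding.FLOAT32,
                VectorSimilarityFunction.DOT_PRODUCT, M, beamWidth, 42L,
                bigGraph, oldToNew)
            .build(merged.copy());
    System.out.printf("initialized: %d ms%n",
        (System.nanoTime() - t1) / 1_000_000);
  }
}
```

The timing of the second `build` only covers copying the big graph's neighbor lists (with scores recomputed) plus inserting the 10k new vectors, which is where the ~100x difference came from in my run.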