kaivalnp opened a new pull request, #15429:
URL: https://github.com/apache/lucene/pull/15429

   ### Description
   
   While merging vectors across multiple segments, we can [re-use information 
from earlier HNSW 
graphs](https://github.com/apache/lucene/blob/d3bd7c7947387881145d1c20e8c1c5b74f9a55a8/lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java#L68-L103)
 -- either an entire graph as a starting point, or previous connections as 
seeds for insertion!
   
   These nice optimizations are used only if there are [no deleted 
documents](https://github.com/apache/lucene/blob/d3bd7c7947387881145d1c20e8c1c5b74f9a55a8/lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java#L195-L206)
 (#15003 is a nice PR that allows re-using some information even _with_ 
deletes, which is how I got to looking at this class).
   
   However, should the gating flag be "no deleted vectors" instead of "no 
deleted documents"?
   
   Gating on deleted vectors instead would have two benefits:
   - Optimizations kick in more frequently (when a segment has deleted 
documents, but none of those had vectors)
   - Checking _whether_ the optimizations can be applied is sped up too (only 
need to check for "liveness" of documents with vectors, instead of "liveness" 
of all documents)
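
A minimal sketch of the difference between the two gates, written against plain Java types rather than Lucene's own classes. Everything here is hypothetical and for illustration only: `VectorLivenessCheck`, both method names, the `BitSet` standing in for a segment's live docs (with `null` meaning "no deletions"), and the `int[]` standing in for the doc IDs that carry a vector. It is not the code in `IncrementalHnswGraphMerger`.

```java
import java.util.BitSet;

/** Hypothetical illustration; not Lucene's actual merge code. */
final class VectorLivenessCheck {

  /**
   * Document-level gate: the segment qualifies only if no document at all
   * has been deleted. Scans liveness for every doc ID in [0, maxDoc).
   */
  static boolean hasNoDeletedDocs(BitSet liveDocs, int maxDoc) {
    // Assumption: a null liveDocs models a segment without deletions.
    if (liveDocs == null) {
      return true;
    }
    // The first clear bit lies at or past maxDoc only if every doc is live.
    return liveDocs.nextClearBit(0) >= maxDoc;
  }

  /**
   * Vector-level gate: the segment qualifies if every document that has a
   * vector is still live. Deleted documents without vectors are ignored,
   * and only docsWithVectors.length liveness lookups are needed.
   */
  static boolean hasNoDeletedVectors(BitSet liveDocs, int[] docsWithVectors) {
    if (liveDocs == null) {
      return true;
    }
    for (int doc : docsWithVectors) {
      if (!liveDocs.get(doc)) {
        return false; // a vector-bearing document was deleted
      }
    }
    return true;
  }
}
```

For a five-document segment where only doc 3 is deleted and vectors exist on docs 0, 2 and 4, `hasNoDeletedDocs` rejects the segment while `hasNoDeletedVectors` accepts it, which is the case described in the first bullet.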


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

