kaivalnp opened a new pull request, #15429: URL: https://github.com/apache/lucene/pull/15429
### Description

While merging vectors across multiple segments, we can [re-use information from earlier HNSW graphs](https://github.com/apache/lucene/blob/d3bd7c7947387881145d1c20e8c1c5b74f9a55a8/lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java#L68-L103): either an entire graph as a starting point, or previous connections as seeds for insertion!

These nice optimizations are used only if there are [no deleted documents](https://github.com/apache/lucene/blob/d3bd7c7947387881145d1c20e8c1c5b74f9a55a8/lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java#L195-L206) (#15003 is a nice PR that allows re-using some information even _with_ deletes, which is how I got to looking at this class).

However, should the gating flag be "no deleted vectors" instead of "no deleted documents"? This has two benefits:

- Optimizations kick in more frequently (when a segment has deleted documents, but none of those had vectors)
- Checking _whether_ the optimizations can be applied is sped up too (only need to check for "liveness" of documents with vectors, instead of "liveness" of all documents)
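To make the difference between the two gating conditions concrete, here is a minimal standalone sketch. It is not the code in this PR and does not use Lucene's classes: `MergeGateSketch`, `noDeletedDocs`, and `noDeletedVectors` are hypothetical names, a `java.util.BitSet` stands in for a segment's live docs (with `null` meaning "no deletions"), and an `int[]` stands in for the doc IDs that have a vector.

```java
import java.util.BitSet;

// Hypothetical illustration only; none of these names exist in Lucene.
class MergeGateSketch {

    // Gate on "no deleted documents": true only if every doc in the segment is live.
    // liveDocs == null is assumed to mean the segment has no deletions at all.
    static boolean noDeletedDocs(BitSet liveDocs, int maxDoc) {
        return liveDocs == null || liveDocs.cardinality() == maxDoc;
    }

    // Gate on "no deleted vectors": only docs that actually have a vector are checked,
    // so deletions of vector-less docs do not disable the optimization.
    static boolean noDeletedVectors(BitSet liveDocs, int[] docsWithVectors) {
        if (liveDocs == null) {
            return true;
        }
        for (int doc : docsWithVectors) {
            if (!liveDocs.get(doc)) {
                return false; // a doc with a vector was deleted
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int maxDoc = 5;
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc); // all five docs start live
        liveDocs.clear(1);       // delete doc 1, which has no vector

        int[] docsWithVectors = {0, 2, 4};

        System.out.println(noDeletedDocs(liveDocs, maxDoc));             // false
        System.out.println(noDeletedVectors(liveDocs, docsWithVectors)); // true
    }
}
```

In this toy segment the "no deleted documents" gate rejects the graph, while the "no deleted vectors" gate accepts it, and the second check only visits the three docs that have vectors rather than all five.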
