Vikasht34 commented on issue #14208: URL: https://github.com/apache/lucene/issues/14208#issuecomment-2649836148
@benwtrent here are my thoughts on the questions asked.

**Entry point can be updated at any time (we need to think about this)**

1. Two-Pass Merging to Handle Entry Point Changes
   - Pass 1: Merge all layers without setting the entry point.
   - Pass 2: Re-evaluate the best entry point after merging.

**That the merging needs to be able to move vector values up to a higher layer and/or create a new layer**

Use probabilistic layer assignment to decide, after merging each layer, whether vectors should be promoted.

**but the bulk of the cost is still the bottom layer (as it has all vectors and its all eagerly allocated).**

**1. Batch Processing for Bottom Layer Instead of Eager Allocation**
- Instead of eagerly allocating all vectors, we process them in batches to reduce peak memory usage.
- Each batch of vectors is merged and committed incrementally, preventing a large spike in memory consumption.

**2. On-the-Fly Streaming Instead of Full Materialization**
- Instead of fully storing neighbor lists in RAM, we use lazy loading (`getNeighborsLazy()`).
- Neighbors are retrieved only when needed, which avoids unnecessary memory overhead.

**3. Multi-Threaded Processing for the Bottom Layer**
- We distribute the bottom-layer merge across multiple CPU cores.
- This replaces a single-threaded bottleneck with parallel merging.

**4. Graph Sparsification (Reducing Redundant Connections)**
- Instead of blindly keeping all connections, we prune redundant edges.
- This relies on HNSW's neighbor-diversity property to keep only the most useful connections.

**5. Union-Find for Efficient Component Merging**
- Avoids redundant merging of connected components.
- Union-Find ensures that each vector is joined to a component only once, preventing wasted CPU cycles.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
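The comment does not say how Union-Find would be wired into the merge, so the following is only a minimal sketch of the data structure from point 5. It assumes graph nodes are identified by dense integer ordinals; the class name `UnionFind` and its methods are hypothetical and are not part of Lucene's API.

```java
// Hypothetical helper for tracking connected components during a graph merge.
// Not part of Lucene; node ids are assumed to be dense ints in [0, n).
final class UnionFind {
    private final int[] parent; // parent[i] == i means i is a component root
    private final int[] size;   // component size, valid only at roots
    private int components;     // current number of disjoint components

    UnionFind(int n) {
        parent = new int[n];
        size = new int[n];
        components = n;
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    // Returns the root of x's component, shortening the path as it walks.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    // Joins the components of a and b.
    // Returns false when they were already connected, so the caller can skip the work.
    boolean union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) {
            return false;
        }
        if (size[ra] < size[rb]) { // attach the smaller tree under the larger one
            int tmp = ra;
            ra = rb;
            rb = tmp;
        }
        parent[rb] = ra;
        size[ra] += size[rb];
        components--;
        return true;
    }

    int componentCount() {
        return components;
    }
}
```

Under these assumptions, a merge step could call `union(a, b)` for every edge it adds and treat a `false` return as "already connected", while `componentCount() > 1` after the merge would signal that some nodes still need to be linked.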
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org