benwtrent opened a new issue, #12440: URL: https://github.com/apache/lucene/issues/12440
### Description

It is well known that when merging segments, HNSW is not merged in linear time. We lose a fair bit of useful information, as all the per-segment NSGs are effectively thrown away (except for a single graph). This means we have useful distributed work from each segment that is ignored.

Do we think it's even possible to improve HNSW merging? There is existing research around merging hierarchical graphs, though not specifically HNSW:

- https://arxiv.org/abs/1908.00814
- https://arxiv.org/abs/1912.01059 (how they merge the parallel-built graphs, not the GPU-focused parts)

One other option that comes to my mind is keeping the NSG scores and using them on merge, taking advantage of a [FINGER](https://www.amazon.science/publications/finger-fast-inference-for-graph-based-approximate-nearest-neighbor-search)-type idea. I am not 100% sold on using this during search, as it increases memory usage. But if we store these scores in a separate file (not loaded during search, but loaded during merge), we could reduce the number of distance calculations required.

Of course, all these types of changes require due diligence and do NOT obviate the need for a new algorithm in Lucene that is more page-fault friendly (IVF, SPANN, DiskANN, or something similar).
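The score-cache idea above could look roughly like the following sketch. None of these names (`ScoredNeighbor`, `scoreCache`, `dotScore`) are Lucene APIs; they are purely illustrative. The point is the data flow: at flush time a segment's graph edges are written alongside their already-computed similarity scores (as if to a side file), and at merge time those scores are read back, so edges carried over from a segment graph never trigger a fresh distance computation.

```java
import java.util.*;

// Hedged sketch of caching per-segment neighbor scores for reuse at merge.
// Not Lucene code; only illustrates how cached edge scores could avoid
// recomputing distances when a segment graph is merged.
public class MergeScoreCacheSketch {

  // An edge in a segment's graph, stored with its already-computed score.
  record ScoredNeighbor(int node, float score) {}

  static int distanceComputations = 0;

  // Stand-in for the vector similarity function; we count invocations.
  static float dotScore(float[] a, float[] b) {
    distanceComputations++;
    float dot = 0;
    for (int i = 0; i < a.length; i++) dot += a[i] * b[i];
    return dot;
  }

  public static void main(String[] args) {
    float[][] vectors = { {1f, 0f}, {0.9f, 0.1f}, {0f, 1f} };

    // Flush time: build a (toy, fully-connected) segment graph and cache
    // each edge's score, as if serializing it to a side file.
    Map<Integer, List<ScoredNeighbor>> scoreCache = new HashMap<>();
    for (int i = 0; i < vectors.length; i++) {
      List<ScoredNeighbor> neighbors = new ArrayList<>();
      for (int j = 0; j < vectors.length; j++) {
        if (j != i) {
          neighbors.add(new ScoredNeighbor(j, dotScore(vectors[i], vectors[j])));
        }
      }
      scoreCache.put(i, neighbors);
    }
    int flushComputations = distanceComputations; // 6 for this toy graph

    // Merge time: edges carried over from the segment graph reuse their
    // cached scores instead of calling dotScore() again.
    float best = Float.NEGATIVE_INFINITY;
    for (List<ScoredNeighbor> neighbors : scoreCache.values()) {
      for (ScoredNeighbor n : neighbors) best = Math.max(best, n.score);
    }

    System.out.println("flush computations: " + flushComputations);
    System.out.println("merge computations: " + (distanceComputations - flushComputations));
    System.out.println("best cached score: " + best);
  }
}
```

In a real implementation the cache would live in its own codec file so it costs nothing at search time, and the savings would apply only to edges that survive into the merged graph; newly-created cross-segment edges still need fresh computations.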