benwtrent opened a new issue, #12440:
URL: https://github.com/apache/lucene/issues/12440

   ### Description
   
   It is well known that HNSW graphs are not merged in linear time when 
segments are merged. We lose a fair bit of useful information, as all the 
per-segment graphs are effectively thrown away (except for a single graph). 
This means the useful distributed work done by each segment is ignored. 
   
   Do we think it's even possible to improve HNSW merging? 
   
   There is existing research around merging hierarchical graphs, but not 
specifically HNSW: 
    - https://arxiv.org/abs/1908.00814
    - https://arxiv.org/abs/1912.01059 (specifically how they merge the 
parallel-built graphs, not the GPU-focused parts)
   
   One other option that comes to mind is keeping the graph's neighbor scores 
and using them on merge, taking advantage of a 
[FINGER](https://www.amazon.science/publications/finger-fast-inference-for-graph-based-approximate-nearest-neighbor-search)
 type of idea. I am not 100% sold on using this during search, as it increases 
memory usage. But if we store these scores in a separate file that is not 
loaded during search but is loaded during merge, we could reduce the number of 
distance calculations required. 
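   A minimal sketch of that score-caching idea, with entirely hypothetical names (this is not a Lucene API): each edge's similarity is written at index time, and the merger consults the cache before falling back to a fresh distance computation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: cache per-edge similarity scores computed at index
// time so a merge can reuse them instead of recomputing vector distances.
class EdgeScoreCache {

    // key packs (node, neighbor) into one long; value is the cached score
    private final Map<Long, Float> scores = new HashMap<>();

    void put(int node, int neighbor, float score) {
        scores.put(((long) node << 32) | (neighbor & 0xFFFFFFFFL), score);
    }

    // Returns the cached score, or recomputes it on a cache miss;
    // recomputes[0] counts how many full distance computations we pay.
    float score(int node, int neighbor, float[][] vectors, int[] recomputes) {
        long key = ((long) node << 32) | (neighbor & 0xFFFFFFFFL);
        Float cached = scores.get(key);
        if (cached != null) {
            return cached;
        }
        recomputes[0]++;
        return dotProduct(vectors[node], vectors[neighbor]);
    }

    static float dotProduct(float[] a, float[] b) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

In a real implementation the map would be an off-heap, merge-only file keyed by ordinal, but the sketch shows the trade: a lookup replaces a full vector distance computation wherever the edge was already scored at index time.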
   
   Of course, all these types of changes require due diligence and do NOT 
obviate the need for a new algorithm in Lucene that is more page-fault friendly 
(IVF, SPANN, DiskANN, or something similar).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

