jmazanec15 commented on issue #12440:
URL: https://github.com/apache/lucene/issues/12440#issuecomment-1703143608

   > It's worth exploring some variation of this in my opinion.
   
   @mbrette this is interesting and definitely worth looking at. It is also 
worth noting that the DiskANN paper 
(https://suhasjs.github.io/files/diskann_neurips19.pdf) builds its final 
structure from N graphs, where each node is inserted into x graphs based on 
proximity to each graph's representative vector (or centroid). They found 
empirically that merging the graphs by taking a union of their edges worked well.
   
   "Empirically, it turns out that the overlapping nature of the different 
clusters provides sufficient connectivity for the GreedySearch algorithm to 
succeed even if the query’s nearest neighbors are actually split between 
multiple shards. We would like to remark that there have been earlier works 
[9, 22] which construct indices for large datasets by merging several smaller, 
overlapping indices. However, their ideas for constructing the overlapping 
clusters are different, and a more detailed comparison of these different 
techniques needs to be done."
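
   To make the "union of edges" idea concrete, here is a rough sketch of how I 
read it. It is not code from DiskANN or Lucene: the function name 
`merge_by_edge_union` and the dict-of-lists graph representation are 
assumptions made for illustration only, and it skips any degree cap or pruning 
a real index would likely need.

```python
from collections import defaultdict


def merge_by_edge_union(shard_graphs):
    """Merge overlapping shard graphs by taking the union of their edges.

    Each shard graph is a dict mapping a node id to an iterable of neighbor
    ids. A node that was inserted into several shards ends up with the
    neighbors it had in every one of them. (Hypothetical helper.)
    """
    merged = defaultdict(set)
    for graph in shard_graphs:
        for node, neighbors in graph.items():
            merged[node]  # make sure isolated nodes are kept
            for neighbor in neighbors:
                if neighbor != node:  # drop self-loops
                    merged[node].add(neighbor)
                    merged[neighbor]  # keep nodes that only appear as targets
    # Sorted lists give a deterministic adjacency representation.
    return {node: sorted(nbrs) for node, nbrs in merged.items()}


# Node 3 was inserted into both shards, so it is the overlap that
# connects the two otherwise separate graphs after the merge.
shard_a = {1: [2], 2: [1, 3], 3: [2]}
shard_b = {3: [4], 4: [3, 5], 5: [4]}
merged = merge_by_edge_union([shard_a, shard_b])
```

   In this toy case node 3 keeps its neighbor from each shard, which is the 
overlap the quote above credits for keeping greedy search connected across 
shards.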
   
   With these approaches, I wonder how well the overall graph structure is 
preserved over time, i.e. whether there would be any quality drift.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

