jmazanec15 commented on issue #12440: URL: https://github.com/apache/lucene/issues/12440#issuecomment-1703143608
> It's worth exploring some variation of this in my opinion.

@mbrette this is interesting and definitely worth looking at.

It is also worth noting a result from the DiskANN paper (https://suhasjs.github.io/files/diskann_neurips19.pdf). There, the final structure is constructed from N graphs: each node is inserted into the x graphs whose representative vector (or centroid) is closest to it, and the graphs are then merged. The authors found empirically that simply taking the union of edges worked well as the merge:

> "Empirically, it turns out that the overlapping nature of the different clusters provides sufficient connectivity for the GreedySearch algorithm to succeed even if the query’s nearest neighbors are actually split between multiple shards. We would like to remark that there have been earlier works [9, 22] which construct indices for large datasets by merging several smaller, overlapping indices. However, their ideas for constructing the overlapping clusters are different, and a more detailed comparison of these different techniques needs to be done."

With these approaches, I wonder how well the overall graph structure is preserved over time, i.e., whether search quality would drift.
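To make the DiskANN-style merge concrete, here is a minimal Python sketch under stated assumptions. All function names are hypothetical. A brute-force k-nearest-neighbour graph stands in for the real per-shard build (DiskANN uses Vamana, Lucene uses HNSW), and the centroids are given up front rather than computed by clustering. Each point goes to the shards of its `x` nearest centroids, each shard gets its own graph, and the shard graphs are merged by taking the union of every node's out-edges. Reachability from a start node is only a crude proxy for GreedySearch succeeding, not a substitute for it.

```python
import math


def nearest_centroids(point, centroids, x):
    """Return the indices of the x centroids closest to `point` (Euclidean)."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
    return order[:x]


def partition(points, centroids, x):
    """Overlapping shards: each point id is placed in the shard of each of its x nearest centroids."""
    shards = [[] for _ in centroids]
    for pid, p in enumerate(points):
        for c in nearest_centroids(p, centroids, x):
            shards[c].append(pid)
    return shards


def build_shard_graph(points, ids, degree):
    """Hypothetical stand-in for a real per-shard build (Vamana, HNSW, ...):
    link each node to its `degree` nearest neighbours inside the shard only."""
    graph = {}
    for u in ids:
        others = sorted((v for v in ids if v != u), key=lambda v: math.dist(points[u], points[v]))
        graph[u] = set(others[:degree])
    return graph


def merge_by_union(shard_graphs):
    """Merge graphs keyed by global point id: a node's out-edges become the union
    of its out-edges in every shard it appears in (this sketch does no re-pruning)."""
    merged = {}
    for g in shard_graphs:
        for u, nbrs in g.items():
            merged.setdefault(u, set()).update(nbrs)
    return merged


def build_merged(points, centroids, x, degree):
    """Partition, build one graph per shard, then union-merge them."""
    shards = partition(points, centroids, x)
    return merge_by_union([build_shard_graph(points, s, degree) for s in shards])


def reachable(graph, start):
    """Nodes reachable from `start` along directed edges (a crude connectivity check)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


if __name__ == "__main__":
    pts = [(float(i), 0.0) for i in range(12)]  # 12 evenly spaced points on a line
    cents = [(1.0, 0.0), (4.0, 0.0), (7.0, 0.0), (10.0, 0.0)]
    print(len(reachable(build_merged(pts, cents, x=1, degree=2), 0)))  # 3: disjoint shards stay islands
    print(len(reachable(build_merged(pts, cents, x=2, degree=2), 0)))  # 12: overlap bridges the shards
```

On this toy input, `x=1` produces disjoint shards, so the merged graph stays split into islands. With `x=2`, nodes that sit in two shards carry edges from both, and the union connects everything, which mirrors the connectivity argument quoted from the paper. One consequence the sketch makes visible: after the union, a node in x shards can have up to `x * degree` out-edges, so a real implementation would likely need to decide whether to prune back to a degree bound.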