mayya-sharipova opened a new pull request, #14331: URL: https://github.com/apache/lucene/pull/14331
Currently when doing merging of HNSW graphs incrementally, we first initialize a graph from the biggest segment, and for other segments, we rebuild the graphs completely by going through a segment's vector values one by one, searching for it in the new graph to find best neighbours to connect with. This PR proposes more efficient merging based on the idea if we know where we want to insert a node, we have a good idea of where we want to insert its neighbours. Similarly to the current approach, we initialize a new graph from a graph from the biggest segments. For all other segments, we find a smaller set of nodes that "covers" their graph, and we insert that set as usual. For other nodes, outside of J set, we do lighter searches with calculated eps. This allows substantial speedups. The algorithm is based on the following steps: 1. Get all graphs that don't have deletions and sort them by size (descending). 2. Copy the largest graph to the new graph (`gL`). 3. For each remaining small graph (`gS`): - Find the nodes that best cover `gS` (join set `j`). These nodes will be inserted into `gL` as usual: by searching `gL` to find the best candidates (`w`) to which connect the nodes. - For each remaining node in `gS`: - We provide `eps` to search in `gL`. We form `eps` by the union of the node's neighbors in `gS` and the node's neighbors' neighbors in `gL`. We also limit `beamWidth` (`efConstruction` to `M * 2`). Algorithm designed by Thomas Veasey -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org