mayya-sharipova opened a new pull request, #14331:
URL: https://github.com/apache/lucene/pull/14331

   Currently when doing merging of HNSW graphs incrementally, 
   we first initialize a graph from the biggest segment, and for other segments,
   we rebuild the graphs completely by going through a segment's vector values
   one by one, searching for it in the new graph to find best neighbours to 
connect with.
   
   This PR proposes more efficient merging based on the idea if we know where 
   we want to insert a node, we have a good idea of where we want to insert its 
neighbours.
   Similarly to the current approach, we initialize a new graph from a graph 
from the
   biggest segments. For all other segments, we find a smaller set of nodes
   that "covers" their graph, and we insert that set as usual. For other nodes,
   outside of J set, we do lighter searches with calculated eps.
   
   This allows substantial speedups.
   
   
   The algorithm is based on the following steps:
   
   1. Get all graphs that don't have deletions and sort them by size 
(descending).
   2. Copy the largest graph to the new graph (`gL`).
   3. For each remaining small graph (`gS`):
      - Find the nodes that best cover `gS` (join set `j`). These nodes will be 
inserted into `gL`  
        as usual: by searching `gL` to find the best candidates (`w`) to which 
connect the nodes.
      - For each remaining node in `gS`:
        - We provide `eps` to search in `gL`. We form `eps` by the union of the 
node's  
          neighbors in `gS` and the node's neighbors' neighbors in `gL`. We 
also limit  
          `beamWidth` (`efConstruction` to `M * 2`).
   
   
   
   
   Algorithm designed by Thomas Veasey


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to