jmazanec15 commented on issue #11354:
URL: https://github.com/apache/lucene/issues/11354#issuecomment-1224742978

   > Removing nodes and repairing the graph could be a nice direction. But for 
now we can keep things simple and assume there's a segment without deletes. If 
that's looking good and shows a nice improvement in index/ merge benchmarks, 
then we can handle deletes in a follow-up.
   
   Thanks @mayya-sharipova @jtibshirani. I started working on this this week. 
Hopefully will have some experimental results by the end of this week or early 
next week. One potential issue I see is that the scores of the neighbors are 
not available through the KnnVectorReaders graphs at merge time. In the version 
Im working on now, I just recompute the scores 
   for the graph that will be retained in the merge process. However, this will 
add some overhead. Short of serializing the neighbor scores with the graph, do 
you have any other ideas for avoiding score re-computation? 
   
   > Another idea I played with at one point was to preserve all the graphs 
from the existing segments (remapping their ordinals) and linking them together 
with additional links. But a lot of links needed to be created in order to get 
close to the recall for a new "from scratch" graph, and I struggled to get any 
improvement. At the time I wasn't even concerning myself about deletions.
   
   @msokolov That's interesting. I can't think of a way to guarantee that the 
graphs merge fully, what approach did you take? I guess one method for this 
might be for semi-inserting nodes (node retains some of its neighbors but also 
gets new ones) from the smaller graph into the larger graph. I think that a 
node in general should have a proportional number of neighbors from each graph 
based on the size of each graph. So if 2/3rds of all nodes are in the larger 
graph, 2/3rds of the neighbors from a node in the smaller graph could be 
updated to nodes in the larger graph. This approach I think would still require 
the procedure of looking up new neighbors for all of the nodes in the smaller 
graph - however search space should be smaller and M would also be smaller, so 
there might be some potential for improvement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to