msokolov commented on issue #14127:
URL: https://github.com/apache/lucene/issues/14127#issuecomment-2603066263

   I spent some time trying to understand how this arose and working on a fix. 
I believe the BP reordering exposed a pre-existing behavior in the 
component-merging code that can create these duplicates. Nothing really 
prevents it, but it didn't happen before. I'm not completely sure why; my 
best theory is that, because of the way graphs are created, we always add docs 
in docid order, and this is no longer true when reordering.  I looked into how 
to prevent the duplicates, and one thing we could do is remove them when 
writing the graph in the codec (in `Lucene99HnswVectorsWriter.writeGraph`).  
This is a good place to do it because we sort the nodes there.  Doing this in 
HnswGraphBuilder is also possible, but I think it would be less efficient 
because the neighbor nodes aren't sorted there and any given node's neighbors 
might need to be checked multiple times. 
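
To illustrate why deduplicating is cheap once the neighbors are sorted, here is 
a minimal sketch. It is not the actual Lucene code: the class 
`SortedNeighborDedup` and the method `sortAndDedup` are hypothetical names, and 
I am assuming the neighbors are available as a plain `int[]` plus a count. After 
sorting, duplicates are adjacent, so a single pass removes them.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class SortedNeighborDedup {

  private SortedNeighborDedup() {}

  /**
   * Sorts the first {@code size} entries of {@code neighbors} and removes
   * duplicates in place. Entries past the returned count are left as-is
   * and should be ignored by the caller.
   *
   * @return the number of unique neighbor ids now at the front of the array
   */
  static int sortAndDedup(int[] neighbors, int size) {
    if (size <= 1) {
      return size;
    }
    // Sorting puts equal node ids next to each other.
    Arrays.sort(neighbors, 0, size);
    int unique = 1;
    for (int i = 1; i < size; i++) {
      // Keep an entry only if it differs from the last one we kept.
      if (neighbors[i] != neighbors[unique - 1]) {
        neighbors[unique++] = neighbors[i];
      }
    }
    return unique;
  }
}
```

For example, calling `sortAndDedup` on `{7, 3, 7, 1, 3, 9}` with size 6 returns 
4 and leaves `1, 3, 7, 9` in the first four slots. Without the sort, each 
neighbor would have to be compared against all the others, or tracked in a set.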


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

