Pulkitg64 commented on PR #15003:
URL: https://github.com/apache/lucene/pull/15003#issuecomment-3412537982

   Hi @benwtrent 
   
   Sorry, I was on a long vacation in September, so I couldn't spend time on 
this. But it is my main priority right now.
   
   >  Is our concern that we get a disconnected graph?
   
   Yes, that's right. If I keep dropping nodes from the graph without 
reconnecting them, I see a big drop in recall. In one of the knnPerfTest runs, 
I repeatedly deleted 10% of the nodes from a segment/graph, and after 5 
iterations recall for that segment/graph had dropped by roughly 10 percentage 
points (from 81% to 71.6%). 
   
   For a single-segment graph with 1MM docs/vectors:
   
   | Iteration | Number of Nodes After Deletion | Recall % |
   |-----------|--------------------------------|----------|
   | 0         | 1000000                        | 81       |
   | 1         | 900027                         | 79.4     |
   | 2         | 810296                         | 77.9     |
   | 3         | 729025                         | 75.8     |
   | 4         | 656285                         | 73.9     |
   | 5         | 590681                         | 71.6     |
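
A quick arithmetic check of the table above, as a minimal sketch: it compares the reported node counts with what repeatedly removing 10% would give (`1_000_000 * 0.9**k`), and computes the recall drop in both absolute and relative terms. The numbers are copied from the table; the function names `expected_nodes` and `recall_drop` are hypothetical helpers for illustration and are not part of knnPerfTest or Lucene.

```python
# Rows copied from the table above: (iteration, nodes after deletion, recall %).
observed = [
    (0, 1_000_000, 81.0),
    (1, 900_027, 79.4),
    (2, 810_296, 77.9),
    (3, 729_025, 75.8),
    (4, 656_285, 73.9),
    (5, 590_681, 71.6),
]

DELETE_FRACTION = 0.10  # 10% of the remaining nodes removed per iteration


def expected_nodes(start, iteration, fraction=DELETE_FRACTION):
    """Node count expected if exactly `fraction` is removed each iteration."""
    return start * (1 - fraction) ** iteration


def recall_drop(rows):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    first, last = rows[0][2], rows[-1][2]
    absolute = first - last
    relative = absolute / first * 100
    return absolute, relative


if __name__ == "__main__":
    start = observed[0][1]
    for iteration, nodes, recall in observed:
        ideal = expected_nodes(start, iteration)
        print(f"iter {iteration}: observed {nodes}, ideal {ideal:.0f}, recall {recall}")
    absolute, relative = recall_drop(observed)
    print(f"recall drop: {absolute:.1f} points ({relative:.1f}% relative)")
```

The observed counts stay within a few hundred nodes of the ideal geometric sequence, so the deletions behave like a steady 10% per iteration.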
   
   
   So, as a next step, I am currently trying to reconnect nodes whose 
outdegree is less than some threshold, as pointed out by @msokolov. I will post 
the numbers as soon as I have some success.
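
To make the reconnection idea concrete, here is a toy sketch in Python. It is not the code in this PR and not Lucene's HNSW implementation. It assumes a graph stored as a plain adjacency dict, a hypothetical `min_degree` threshold, and one possible repair heuristic: a surviving node that falls below the threshold borrows the surviving neighbors of its deleted neighbors, closest first by Euclidean distance. All names (`delete_and_repair`, `min_degree`) are made up for illustration.

```python
import math


def delete_and_repair(graph, vectors, deleted, min_degree):
    """Drop `deleted` nodes, then top up survivors whose out-degree is too low.

    graph:   dict mapping node id -> list of out-neighbor ids
    vectors: dict mapping node id -> tuple of floats
    Returns a new adjacency dict containing only surviving nodes.
    """
    deleted = set(deleted)
    repaired = {}
    for node, neighbors in graph.items():
        if node in deleted:
            continue
        # Keep only edges that still point at live nodes.
        kept = [n for n in neighbors if n not in deleted]
        if len(kept) < min_degree:
            # Candidate pool: live neighbors of this node's deleted neighbors.
            pool = set()
            for gone in neighbors:
                if gone in deleted:
                    pool.update(n for n in graph[gone] if n not in deleted)
            pool.discard(node)
            pool.difference_update(kept)
            # Add the closest candidates until the threshold is met
            # (ties broken by node id so the result is deterministic).
            ranked = sorted(
                pool, key=lambda c: (math.dist(vectors[node], vectors[c]), c)
            )
            kept.extend(ranked[: min_degree - len(kept)])
        repaired[node] = kept
    return repaired


if __name__ == "__main__":
    # Five points on a line, linked as a chain; deleting node 2 cuts the chain.
    vectors = {i: (float(i), 0.0) for i in range(5)}
    graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(delete_and_repair(graph, vectors, deleted={2}, min_degree=2))
```

In the chain example, deleting the middle node would otherwise split the graph in two; the repair step links nodes 1 and 3 directly. Nodes that were already below the threshold before the deletion and lost no neighbors are left alone, which is a simplification of this sketch.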


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

