Pulkitg64 commented on PR #15003: URL: https://github.com/apache/lucene/pull/15003#issuecomment-3412537982
Hi @benwtrent,

Sorry, I was on a long vacation in September, so I couldn't spend time on this. It is my main priority right now.

> Is our concern that we get a disconnected graph?

Yes, that's right. If I keep dropping nodes from the graph without reconnecting them, I see a big drop in recall. In one knnPerfTest run, I repeatedly deleted 10% of the nodes from a segment/graph, and after 5 iterations recall for that segment/graph had dropped by about 9.4 percentage points (from 81% to 71.6%).

For a single-segment graph with 1M docs/vectors:

| Iteration | Number of Nodes After Deletion | Recall % |
|-----------|--------------------------------|----------|
| 0 | 1000000 | 81 |
| 1 | 900027 | 79.4 |
| 2 | 810296 | 77.9 |
| 3 | 729025 | 75.8 |
| 4 | 656285 | 73.9 |
| 5 | 590681 | 71.6 |

So, as a next step, I am trying to reconnect the nodes whose out-degree is below some threshold, as pointed out by @msokolov. I will post the numbers as soon as I have some success.
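For illustration only, here is a rough sketch of the "out-degree below a threshold" check described above. It is not the code from this PR and does not use Lucene's HNSW classes: the graph is a plain list of neighbor sets, and the class name `DegreeRepairSketch`, the methods `dropDeleted` and `underConnected`, and the `minDegree` parameter are all hypothetical names chosen for the example. It only finds the nodes that would need reconnecting; how to pick new neighbors for them is left open.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a plain adjacency-list graph, not Lucene's HNSW data structures. */
public class DegreeRepairSketch {

  /** Removes edges pointing at deleted nodes and clears the deleted nodes' own neighbor sets. */
  public static void dropDeleted(List<Set<Integer>> neighbors, Set<Integer> deleted) {
    for (int node = 0; node < neighbors.size(); node++) {
      if (deleted.contains(node)) {
        neighbors.get(node).clear();
      } else {
        neighbors.get(node).removeAll(deleted);
      }
    }
  }

  /** Returns the live nodes whose out-degree is below minDegree (a hypothetical threshold). */
  public static List<Integer> underConnected(
      List<Set<Integer>> neighbors, Set<Integer> deleted, int minDegree) {
    List<Integer> result = new ArrayList<>();
    for (int node = 0; node < neighbors.size(); node++) {
      if (!deleted.contains(node) && neighbors.get(node).size() < minDegree) {
        result.add(node);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Tiny 5-node graph: 0 -> {1,2}, 1 -> {0,2}, 2 -> {1,3}, 3 -> {2,4}, 4 -> {2,3}
    List<Set<Integer>> graph = new ArrayList<>();
    graph.add(new HashSet<>(List.of(1, 2)));
    graph.add(new HashSet<>(List.of(0, 2)));
    graph.add(new HashSet<>(List.of(1, 3)));
    graph.add(new HashSet<>(List.of(2, 4)));
    graph.add(new HashSet<>(List.of(2, 3)));

    Set<Integer> deleted = Set.of(4);
    dropDeleted(graph, deleted);

    // Node 3 lost its edge to node 4 and now has out-degree 1.
    System.out.println(underConnected(graph, deleted, 2)); // prints [3]
  }
}
```

In this toy graph, deleting node 4 leaves node 3 with a single outgoing edge, so it is the only candidate for reconnection at a threshold of 2.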
