Pulkitg64 commented on PR #15003: URL: https://github.com/apache/lucene/pull/15003#issuecomment-3213582208
Based on @msokolov's suggestion, I ran the benchmarks while simulating singleton merging. For this I indexed 1M docs, force-merged the segments, deleted documents, and then force-merged the segment again. I am seeing a consistent improvement in force merge time after deletes (roughly 50x to 59x speedup), but also a degradation in recall (2% to 14%, depending on the delete percentage). This is probably caused by the disconnectedness issue. (Let me try to find the connectedness numbers for these graphs as well.)

| Delete Pct | Baseline Recall | Baseline Force Merge Time (s) | Candidate Recall | Candidate Force Merge Time (s) | Recall Change | Force Merge Speedup |
|------------|-----------------|-------------------------------|------------------|--------------------------------|---------------|---------------------|
| 50% delete | 0.892 | 417.52 | 0.763 | 8.43 | -14% | 50x |
| 40% delete | 0.887 | 505.74 | 0.799 | 9.91 | -10% | 50x |
| 30% delete | 0.88 | 585 | 0.822 | 10.98 | -7% | 53x |
| 20% delete | 0.878 | 677 | 0.802 | 12.4 | -9% | 54x |
| 10% delete | 0.874 | 772.42 | 0.856 | 13.5 | -2% | 59x |
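
One way to put a number on the suspected disconnectedness is to measure what fraction of the live (non-deleted) nodes can still be reached from the entry point. The sketch below is only an illustration on a toy single-layer adjacency map: `reachable_fraction`, `neighbors`, and `deleted` are hypothetical names, and this is neither Lucene's `HnswGraph` API nor the connectedness check used for the benchmark.

```python
from collections import deque


def reachable_fraction(neighbors, entry, deleted=frozenset()):
    """Return the fraction of live nodes reachable from `entry` by BFS.

    `neighbors` maps node id -> list of neighbor ids (a toy stand-in for one
    graph layer; an assumption, not Lucene's data structure). Deleted nodes
    are never visited, so any path that ran through them is lost.
    """
    live = set(neighbors) - set(deleted)
    if entry not in live:
        return 0.0
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt in live and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) / len(live)


# Hypothetical toy graph: a chain 0 - 1 - 2 - 3 - 4.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

full = reachable_fraction(chain, entry=0)
after_delete = reachable_fraction(chain, entry=0, deleted={2})
print(full, after_delete)
```

In this toy chain, dropping node 2 without repairing its neighbors' links leaves nodes 3 and 4 unreachable from node 0, which is the kind of effect that would lower recall if deleted nodes are removed without reconnecting the graph.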