krickert commented on PR #15676: URL: https://github.com/apache/lucene/pull/15676#issuecomment-4010691268
Thanks for the suggestions, @vigyasharma. You're right: I used the current 100-visit warm-up as a static safeguard against the "entry point trap" at local bridge nodes. My next round of tests will retain the 100-visit warm-up to establish a baseline, then introduce the variance trigger. This lets me isolate the recall safety of the core logic before adding more variables.

The current test results show that collaborative search produces results identical to a standard distributed search, achieving recall parity with the current Lucene HNSW implementation, while not regressing on high-performance local hardware. I've seen significant success on resource-constrained clusters (Raspberry Pis), where the pruning yielded a ~50% reduction in CPU cycles and latency with no recall loss. On high-end localhost setups with small shards, the gains are understandably masked by the raw speed of the traversal, but the recall floor remains solid.

Regarding the heuristics:

1. I agree that a static visit count is a blunt instrument. I'm currently preparing a 250GB index benchmark (court law cases), which will provide much more realistic graph depth than my initial tests.
2. Once that larger-scale data is ready, I plan to use it to test your suggestion of a variance-based trigger, which would make the pruning topology-aware rather than reliant on a static visit counter.

I'll share the Pareto frontier results once the large-scale runs are complete; they should make the benefits clear even on high-performance hardware.
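For concreteness, the two heuristics could be combined into a single gate along the lines of the sketch below. This is a hedged illustration only: `PruningGate`, its fields, and its thresholds are hypothetical names chosen for this comment, not part of the PR or of Lucene's API. It uses Welford's online algorithm to track the running variance of visited-node scores, and allows pruning only after the static warm-up window has passed *and* the frontier variance has dropped below a floor.

```java
// Hypothetical sketch of a topology-aware pruning gate for HNSW traversal.
// Names and thresholds are illustrative, not taken from the PR or Lucene.
final class PruningGate {
  private final int warmupVisits;      // e.g. 100: static safeguard against the "entry point trap"
  private final double varianceFloor;  // assumed heuristic threshold for the variance trigger

  private int visits = 0;
  private double mean = 0.0;
  private double m2 = 0.0;             // Welford's running sum of squared deviations

  PruningGate(int warmupVisits, double varianceFloor) {
    this.warmupVisits = warmupVisits;
    this.varianceFloor = varianceFloor;
  }

  /** Records one visited node's score; returns true once pruning may begin. */
  boolean observe(double score) {
    visits++;
    double delta = score - mean;
    mean += delta / visits;
    m2 += delta * (score - mean);
    if (visits < warmupVisits) {
      return false; // still inside the static warm-up window
    }
    double variance = m2 / (visits - 1);
    // Low variance suggests the frontier has stabilized: remaining candidates
    // score alike, so further exploration is unlikely to improve recall.
    return variance < varianceFloor;
  }
}
```

The design intent is that the static warm-up preserves the recall floor (the safeguard in the current patch), while the variance check adapts the actual prune point to graph topology rather than a fixed visit count.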
