aruggero commented on PR #16034: URL: https://github.com/apache/lucene/pull/16034#issuecomment-4421659206
Here are the new benchmark results, thanks to the scratch space reuse:

| children | k | correlation | main (ms/op) | sibling (ms/op) | overhead |
| -------- | --- | --------- | ------------- | --------------- | ---------- |
| 4 | 10 | best | 0.070 ± 0.003 | 0.072 ± 0.003 | +2.9% |
| 4 | 10 | standard | 0.053 ± 0.003 | 0.058 ± 0.004 | +9.4% |
| 4 | 10 | worst | 0.050 ± 0.005 | 0.057 ± 0.003 | +14.0% |
| 4 | 100 | best | 0.400 ± 0.013 | 0.452 ± 0.016 | +13.0% |
| 4 | 100 | standard | 0.251 ± 0.012 | 0.309 ± 0.012 | +23.1% |
| 4 | 100 | worst | 0.270 ± 0.026 | 0.305 ± 0.009 | +13.0% |
| 8 | 10 | best | 0.101 ± 0.005 | 0.109 ± 0.006 | +7.9% |
| 8 | 10 | standard | 0.065 ± 0.003 | 0.078 ± 0.003 | +20.0% |
| 8 | 10 | worst | 0.064 ± 0.003 | 0.080 ± 0.003 | +25.0% |
| 8 | 100 | best | 0.642 ± 0.019 | 0.716 ± 0.017 | +11.5% |
| 8 | 100 | standard | 0.330 ± 0.027 | 0.486 ± 0.027 | +47.3% |
| 8 | 100 | worst | 0.307 ± 0.016 | 0.488 ± 0.028 | +59.0% |
| 16 | 10 | best | 0.147 ± 0.004 | 0.151 ± 0.008 | +2.7% |
| 16 | 10 | standard | 0.080 ± 0.004 | 0.109 ± 0.007 | +36.3% |
| 16 | 10 | worst | 0.075 ± 0.005 | 0.107 ± 0.002 | +42.7% |
| 16 | 100 | best | 0.985 ± 0.053 | 1.144 ± 0.040 | +16.1% |
| 16 | 100 | standard | 0.568 ± 0.022 | 0.858 ± 0.071 | +51.1% |
| 16 | 100 | worst | 0.496 ± 0.021 | 0.880 ± 0.052 | +77.4% |

| Scenario | Siblings score | Threshold rises | HNSW early exit | Previous overhead | Current overhead |
|----------|----------------|-----------------|-----------------|-------------------|------------------|
| best | High (nearly identical) | Fast | Yes | ~5–12% | ~3–16% |
| standard | Moderate | Moderate | Partial | ~13–60% | ~9–51% |
| worst | Random/low | Barely | No | ~12–74% | ~13–77% |

The main changes worth calling out:

- standard improved meaningfully at the top end (60% → 51%) thanks to scratch space reuse; that's the case most representative of real-world data.
- The best lower bound dropped to 3% (nearly free for well-correlated siblings with small k).
- The worst upper bound nudged up slightly (74% → 77%), but that's within benchmark noise at children=16, k=100.

We still have a significant overhead in general.
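
The overhead column and the noise remark for the worst case can be checked with a few lines of arithmetic. This is only a sketch: it assumes the overhead is computed as `(sibling - main) / main`, which matches the reported figures, and it treats the ± values as symmetric error margins (an assumption about how the benchmark reports them). The function names are hypothetical and not part of the PR.

```python
def overhead_pct(main_ms: float, sibling_ms: float) -> float:
    """Relative overhead of the sibling run over main, in percent."""
    return (sibling_ms - main_ms) / main_ms * 100.0


def overhead_bounds_pct(main_ms: float, main_err: float,
                        sibling_ms: float, sibling_err: float) -> tuple[float, float]:
    """Conservative overhead range: both scores pushed to opposite ends
    of their error margins (assumes the margins are symmetric)."""
    low = overhead_pct(main_ms + main_err, sibling_ms - sibling_err)
    high = overhead_pct(main_ms - main_err, sibling_ms + sibling_err)
    return low, high


# children=16, k=100, worst row from the table above
point = overhead_pct(0.496, 0.880)
low, high = overhead_bounds_pct(0.496, 0.021, 0.880, 0.052)
print(f"{point:.1f}% (range {low:.1f}% to {high:.1f}%)")
```

Under these assumptions the worst row at children=16, k=100 comes out at about 77.4%, with a conservative range of roughly 60% to 96%. The previous 74% figure sits inside that range, which is consistent with reading the 74% → 77% change as noise.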
