shbhar commented on PR #15903: URL: https://github.com/apache/lucene/pull/15903#issuecomment-4237744339
@mccullocht I had to re-run some benchmarks, but I have now tested all four combinations of centering × rotation on both datasets. To make the comparison fair, I disabled OSQ's internal per-segment centering (forcing the centroid to zero) so that both methods are fully data-blind at the segment level.

All benchmarks: aarch64 r7g.8xlarge, single segment (forceMerge), M=32, topK=10, fanout=50, 1-bit search, recall@10 (R@10).

### ASIN 1M × 4096d

| | Unrotated | Rotated |
|---|-----------|---------|
| **Uncentered** | OSQ=0.740, TQ=0.743 | OSQ=0.754, TQ=0.737* |
| **Centered** | OSQ=0.792, TQ=0.791 | OSQ=0.805, TQ=0.786* |

### Cohere 5M × 1024d

| | Unrotated | Rotated |
|---|-----------|---------|
| **Uncentered** | OSQ=0.596, TQ=0.607 | OSQ=0.622, TQ=0.604* |
| **Centered** | OSQ=0.630, TQ=0.648 | OSQ=0.659, TQ=0.643* |

\*Double-rotated: the external rotation is applied on top of TQ's own rotation. This may not be a no-op, and it seems to hurt TQ (more floating-point error?).

**OSQ (no centering)**: the centroid is forced to zero to disable per-segment centering, but the per-vector `optimizeIntervals()` and the 14-byte corrections still run.

**Key observations:**

- **On centered, unrotated data, TQ ≈ OSQ on ASIN (0.791 vs 0.792), and TQ wins on Cohere (0.648 vs 0.630).** This is the most practical comparison, since centering is a common preprocessing step and neither method has an external rotation applied.
- **Rotation does help OSQ.** On centered data, OSQ gains +0.013 (ASIN) and +0.029 (Cohere) from rotation.
- **On raw data, TQ has a slight edge** (0.743 vs 0.740 on ASIN, 0.607 vs 0.596 on Cohere); TQ's own rotation partially compensates for the lack of centering. However, this is close to the cross-run variance of ~0.005, so the edge appears slight on these datasets.
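The deltas cited in the observations can be recomputed directly from the two tables. The sketch below is a hypothetical helper (the class `RecallDeltas` and its methods are illustrative only, not part of Lucene or luceneutil) that reports OSQ's rotation gain and the unrotated TQ−OSQ gap per dataset, flagging gaps that fall within the ~0.005 cross-run variance quoted in the comment. The TQ rotated column is deliberately left out because it is the double-rotated case.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes R@10 deltas from the benchmark tables in this comment. */
public class RecallDeltas {
    // Layout: [centered ? 1 : 0][rotated ? 1 : 0]; values copied from the tables above.
    static final double[][] ASIN_OSQ = {{0.740, 0.754}, {0.792, 0.805}};
    static final double[][] ASIN_TQ = {{0.743, 0.737}, {0.791, 0.786}};
    static final double[][] COHERE_OSQ = {{0.596, 0.622}, {0.630, 0.659}};
    static final double[][] COHERE_TQ = {{0.607, 0.604}, {0.648, 0.643}};

    // Cross-run variance quoted in the comment.
    static final double NOISE = 0.005;

    /** Recall gained by applying an external rotation, for the given centering row. */
    static double rotationGain(double[][] recall, int centered) {
        return recall[centered][1] - recall[centered][0];
    }

    /** TQ recall minus OSQ recall for one cell of the 2x2 grid. */
    static double tqMinusOsq(double[][] tq, double[][] osq, int centered, int rotated) {
        return tq[centered][rotated] - osq[centered][rotated];
    }

    static void report(String dataset, double[][] osq, double[][] tq) {
        String[] rows = {"uncentered", "centered"};
        for (int c = 0; c < 2; c++) {
            double gain = rotationGain(osq, c);
            double gap = tqMinusOsq(tq, osq, c, 0); // unrotated column only
            String flag = Math.abs(gap) > NOISE ? "" : " (within noise)";
            System.out.printf(Locale.ROOT,
                "%s %-10s OSQ rotation gain %+.3f | unrotated TQ-OSQ %+.3f%s%n",
                dataset, rows[c], gain, gap, flag);
        }
    }

    public static void main(String[] args) {
        report("ASIN  ", ASIN_OSQ, ASIN_TQ);
        report("Cohere", COHERE_OSQ, COHERE_TQ);
    }
}
```

Running `main` prints OSQ rotation gains of +0.014/+0.013 on ASIN and +0.026/+0.029 on Cohere (uncentered/centered); the +0.013/+0.029 figures in the observations correspond to the centered rows.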
