shbhar commented on PR #15903:
URL: https://github.com/apache/lucene/pull/15903#issuecomment-4237744339

   @mccullocht I had to re-run some benchmarks, but I have now tested all four
combinations of centering × rotation on both datasets. To make the comparison
fair, I disabled OSQ's internal per-segment centering (forcing the centroid to
zero) so that both methods are fully data-blind at the segment level. All
benchmarks: aarch64 r7g.8xlarge, single segment (forceMerge), M=32, topK=10,
fanout=50, 1-bit search R@10.
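For readers less familiar with the metric, here is a minimal sketch of how R@10 is usually computed. It assumes the standard definition: the fraction of the exact top-10 neighbors (typically from brute-force search over the float vectors) that appear among the 10 returned hits, averaged over queries. The class `RecallAtK` is hypothetical and is not the benchmark harness used for these numbers.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating recall@k; not the actual benchmark code. */
public class RecallAtK {

    /** Fraction of the exact top-k ids that also appear in the approximate top-k. */
    static double recallAtK(int[] exactTopK, int[] approxTopK, int k) {
        Set<Integer> truth = new HashSet<>();
        for (int i = 0; i < k; i++) {
            truth.add(exactTopK[i]);
        }
        int hits = 0;
        for (int i = 0; i < k; i++) {
            if (truth.contains(approxTopK[i])) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    /** Mean recall@k over a batch of queries (assumed aggregation). */
    static double meanRecallAtK(List<int[]> exact, List<int[]> approx, int k) {
        double sum = 0;
        for (int q = 0; q < exact.size(); q++) {
            sum += recallAtK(exact.get(q), approx.get(q), k);
        }
        return sum / exact.size();
    }

    public static void main(String[] args) {
        int[] exact = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] approx = {0, 1, 2, 3, 4, 5, 6, 42, 43, 44};
        System.out.println(recallAtK(exact, approx, 10)); // prints 0.7
    }
}
```

So a table entry such as 0.792 would mean that, on average, about 7.9 of the true 10 nearest neighbors were returned per query.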
   
   ### ASIN 1M × 4096d
   
   | | Unrotated | Rotated |
   |---|-----------|---------|
   | **Uncentered** | OSQ=0.740, TQ=0.743 | OSQ=0.754, TQ=0.737* |
   | **Centered** | OSQ=0.792, TQ=0.791 | OSQ=0.805, TQ=0.786* |
   
   ### Cohere 5M × 1024d
   
   | | Unrotated | Rotated |
   |---|-----------|---------|
   | **Uncentered** | OSQ=0.596, TQ=0.607 | OSQ=0.622, TQ=0.604* |
   | **Centered** | OSQ=0.630, TQ=0.648 | OSQ=0.659, TQ=0.643* |
   
   *Double rotated: possibly not a no-op, and it seems to hurt TQ (more
floating-point error?)
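In exact arithmetic, composing two orthogonal rotations yields another orthogonal rotation, so true inner products are unchanged by the second rotation (the quantization error can still differ, since the quantizers are not rotation-invariant). The sketch below probes only the floating-point part of the hypothesis above. It is an illustration under assumptions: it does not reproduce TQ's actual rotation, and instead builds a random orthogonal matrix by modified Gram-Schmidt on a Gaussian matrix, stores it in float32, and compares dot-product drift after one versus two rotations. `DoubleRotationDrift` and its methods are hypothetical names.

```java
import java.util.Random;

/** Illustration only: float32 dot-product drift after one vs. two random rotations. */
public class DoubleRotationDrift {

    /** Random orthogonal matrix via modified Gram-Schmidt (in double), rounded to float32. */
    static float[][] randomOrthogonal(int dim, Random rnd) {
        double[][] q = new double[dim][dim];
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) {
                q[i][j] = rnd.nextGaussian();
            }
            // Remove the components along all previously orthonormalized rows.
            for (int k = 0; k < i; k++) {
                double dot = 0;
                for (int j = 0; j < dim; j++) dot += q[i][j] * q[k][j];
                for (int j = 0; j < dim; j++) q[i][j] -= dot * q[k][j];
            }
            double norm = 0;
            for (int j = 0; j < dim; j++) norm += q[i][j] * q[i][j];
            norm = Math.sqrt(norm);
            for (int j = 0; j < dim; j++) q[i][j] /= norm;
        }
        float[][] f = new float[dim][dim];
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) f[i][j] = (float) q[i][j];
        }
        return f;
    }

    /** y = R x, accumulated in float32 like a float vector pipeline would. */
    static float[] rotate(float[][] r, float[] x) {
        float[] y = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            float s = 0f;
            for (int j = 0; j < x.length; j++) s += r[i][j] * x[j];
            y[i] = s;
        }
        return y;
    }

    /** Dot product accumulated in double so the measurement itself adds little error. */
    static double dot(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (double) a[i] * b[i];
        return s;
    }

    public static void main(String[] args) {
        int dim = 256;
        Random rnd = new Random(42);
        float[][] r1 = randomOrthogonal(dim, rnd);
        float[][] r2 = randomOrthogonal(dim, rnd);
        float[] a = new float[dim], b = new float[dim];
        for (int i = 0; i < dim; i++) {
            a[i] = (float) rnd.nextGaussian();
            b[i] = (float) rnd.nextGaussian();
        }
        double exact = dot(a, b);
        double once = dot(rotate(r1, a), rotate(r1, b));
        double twice = dot(rotate(r2, rotate(r1, a)), rotate(r2, rotate(r1, b)));
        System.out.printf("exact=%.6f once-err=%.2e twice-err=%.2e%n",
            exact, Math.abs(once - exact), Math.abs(twice - exact));
    }
}
```

If the printed errors stay orders of magnitude below the quantization error, rounding alone is unlikely to explain the TQ drop, and the specific rotation interacting with the quantizer would be the more plausible cause.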
   
   **OSQ (no centering)**: the centroid is forced to zero to disable per-segment
centering, but the per-vector `optimizeIntervals()` and 14-byte corrections still
run.
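To make "centroid forced to zero" concrete, here is a deliberately simplified sketch: a plain sign quantizer applied to `x - c`, where `c` is either the per-segment mean or zero. This is NOT Lucene's OSQ; it drops the per-vector interval optimization and the correction terms mentioned above, so it exaggerates how much centering matters. `OneBitCentering` and its methods are hypothetical names.

```java
import java.util.Arrays;

/** Simplified 1-bit quantizer (not OSQ): one bit per dimension of x - c. */
public class OneBitCentering {

    /** Mean of all vectors, i.e. the per-segment centroid. */
    static float[] centroid(float[][] vectors) {
        int dim = vectors[0].length;
        float[] c = new float[dim];
        for (float[] v : vectors) {
            for (int d = 0; d < dim; d++) {
                c[d] += v[d];
            }
        }
        for (int d = 0; d < dim; d++) {
            c[d] /= vectors.length;
        }
        return c;
    }

    /** Bit d is set when x[d] lies above the centering point c[d]. */
    static boolean[] quantize(float[] x, float[] c) {
        boolean[] bits = new boolean[x.length];
        for (int d = 0; d < x.length; d++) {
            bits[d] = x[d] > c[d];
        }
        return bits;
    }

    public static void main(String[] args) {
        float[][] vectors = {{1.0f, 6.0f}, {3.0f, 2.0f}, {2.5f, 4.5f}};
        float[] zero = new float[2];           // centroid forced to zero
        float[] mean = centroid(vectors);      // per-segment centering
        for (float[] v : vectors) {
            System.out.println(Arrays.toString(quantize(v, zero))
                + " vs " + Arrays.toString(quantize(v, mean)));
        }
    }
}
```

With all-positive toy data, every vector gets the same code when the centroid is zero, while the centered codes differ from one another. OSQ's per-vector intervals and corrections recover much of this, which is consistent with the smaller (but still visible) uncentered vs. centered gap in the tables.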
   
   **Key observations:**
   - **On centered+unrotated data, TQ ≈ OSQ on ASIN (0.791 vs 0.792) and TQ wins
on Cohere (0.648 vs 0.630).** This is the most practical comparison, since
centering is a common preprocessing step and neither method applies an external
rotation.
   - **Rotation does help OSQ:** on centered data, OSQ gains +0.013 (ASIN) and
+0.029 (Cohere) from rotation; on uncentered data, +0.014 and +0.026.
   - **On raw data, TQ has a slight edge** (0.743 vs 0.740 on ASIN, 0.607 vs 0.596
on Cohere): TQ's rotation partially compensates for the lack of centering.
However, these gaps are close to the cross-run variance of ~0.005, so the edge
appears slight on these datasets.
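The deltas quoted in the observations can be recomputed directly from the two tables. A small sketch, where the class name and the array layout are purely illustrative and the values are copied from the tables:

```java
/** Recomputes the R@10 deltas discussed above from the two tables. */
public class RecallDeltas {
    // Indexed as [dataset: 0=ASIN, 1=Cohere][0=uncentered, 1=centered][0=unrotated, 1=rotated]
    static final double[][][] OSQ = {
        {{0.740, 0.754}, {0.792, 0.805}},
        {{0.596, 0.622}, {0.630, 0.659}},
    };
    static final double[][][] TQ = {
        {{0.743, 0.737}, {0.791, 0.786}},
        {{0.607, 0.604}, {0.648, 0.643}},
    };
    static final String[] DATASETS = {"ASIN", "Cohere"};
    static final String[] CENTERING = {"uncentered", "centered"};

    /** Recall gained by OSQ when rotation is enabled. */
    static double osqRotationGain(int dataset, int centering) {
        return OSQ[dataset][centering][1] - OSQ[dataset][centering][0];
    }

    /** TQ minus OSQ recall without external rotation (positive means TQ wins). */
    static double tqMinusOsqUnrotated(int dataset, int centering) {
        return TQ[dataset][centering][0] - OSQ[dataset][centering][0];
    }

    public static void main(String[] args) {
        for (int d = 0; d < 2; d++) {
            for (int c = 0; c < 2; c++) {
                System.out.printf("%s %s: OSQ rotation gain %+.3f, TQ-OSQ unrotated %+.3f%n",
                    DATASETS[d], CENTERING[c], osqRotationGain(d, c), tqMinusOsqUnrotated(d, c));
            }
        }
    }
}
```

Comparing each gap against the ~0.005 cross-run variance makes it easy to see which differences are likely to be noise.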