shbhar commented on PR #15903: URL: https://github.com/apache/lucene/pull/15903#issuecomment-4207686291
Following @mccullocht's advice, I had Kiro run centered benchmarks (subtract the global mean, then re-normalize) on both datasets. I also ran them on x86 (r7i.8xlarge) to make sure there are no architecture-specific discrepancies.

### ASIN 1M × 4096d (Qwen3-8B, centered, M=32, beamWidth=200, topK=10, fanout=50)

| Method | R@10 (arm) | Lat arm | R@10 (x86) | Lat x86 | Index MB |
|--------|-----------|---------|-----------|---------|----------|
| Float32 | 0.937 | 0.80ms | 0.938 | 1.27ms | 15,678 |
| OSQ-1bit | 0.792 | 0.53ms | 0.791 | 0.63ms | 16,183 |
| OSQ-1bit+5×rsc | 0.981 | 1.58ms | 0.981 | 1.61ms | 16,183 |
| **TQ-1bit** | **0.790** | **0.45ms** | **0.793** | **0.58ms** | **541** |
| **TQ-1bit+5×rsc** | **0.976** | **1.51ms** | **0.976** | **1.81ms** | **541** |

### Cohere 5M × 1024d (centered, M=32, beamWidth=100, topK=10, fanout=50)

| Method | R@10 (arm) | Lat arm | R@10 (x86) | Lat x86 | Index MB |
|--------|-----------|---------|-----------|---------|----------|
| Float32 | 0.925 | 1.45ms | 0.929 | 1.42ms | 20,001 |
| OSQ-1bit | 0.622 | 0.68ms | 0.629 | 0.71ms | 20,726 |
| OSQ-1bit+5×rsc | 0.940 | 2.42ms | 0.946 | 2.32ms | 20,726 |
| **TQ-1bit** | **0.651** | **0.66ms** | **0.644** | **0.55ms** | **1,048** |
| **TQ-1bit+5×rsc** | **0.953** | **2.24ms** | **0.951** | **2.05ms** | **1,048** |

### Centering impact (same dataset cross-run deltas)

| Method | Dataset | Uncentered | Centered | Δ |
|--------|---------|-----------|----------|---|
| TQ-1bit | ASIN 4096d | 0.741 | **0.790** | **+0.049** |
| TQ-1bit | Cohere 1024d | 0.608 | **0.651** | **+0.043** |
| OSQ-1bit | ASIN 4096d | 0.774 | 0.792 | +0.018 |
| OSQ-1bit | Cohere 1024d | 0.631 | 0.622 | -0.009 |

As you suspected, centering has a big impact on TQ. With it, TQ-1bit essentially ties OSQ-1bit on ASIN (0.790 vs 0.792 on Graviton, flipped to 0.793 vs 0.791 on Intel, which is probably just run-to-run nondeterminism) and beats it on Cohere on both architectures (0.651 vs 0.622 on Graviton and 0.644 vs 0.629 on Intel).
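For readers who want to reproduce the preprocessing, the centering step (subtract the global mean, then re-normalize) can be sketched roughly as below. This is an illustrative sketch only, not the code used for the benchmarks above: the function names `center_and_normalize` and `center_query` are hypothetical, and reusing the corpus mean for query vectors is an assumption on my part.

```python
import math


def _unit(vec):
    """Scale vec to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


def center_and_normalize(vectors):
    """Subtract the per-dimension global mean, then re-normalize each vector.

    Returns (mean, centered_unit_vectors). The mean is returned so the same
    shift can be applied to queries (assumption: queries use the corpus mean).
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [_unit([x - m for x, m in zip(v, mean)]) for v in vectors]
    return mean, centered


def center_query(query, mean):
    """Apply the stored corpus mean to a query vector and re-normalize it."""
    return _unit([x - m for x, m in zip(query, mean)])


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    mean, centered = center_and_normalize(docs)
    print(mean)
    print(centered)
    print(center_query([1.0, 0.0], mean))
```

Returning the mean alongside the transformed vectors keeps indexing and querying consistent: whatever shift was applied to the indexed vectors can be replayed on each query before quantization or scoring.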
