shubhamvishu commented on PR #16092: URL: https://github.com/apache/lucene/pull/16092#issuecomment-4503770225
I also ran the same luceneutil benchmark with 4K-dimensional vectors and see an even larger recall improvement (**~6-7%** overall), at the cost of a slightly lower indexing rate (~5%) due to rotation overhead.

### With internal Amazon 4K vector embeddings

**Setup**:
- 500K docs, 4096 dimensions, DOT_PRODUCT similarity, HNSW (maxConn=64, beamWidth=250), 10K queries.

| Bits | Avg Baseline Recall | Avg Candidate Recall (Rotation) | Avg Delta | % Diff |
|------|-------------|-------------|-----------|--------|
| 1 | 0.828 | 0.889 | +0.061 | **+7.4%** |
| 2 | 0.858 | 0.916 | +0.058 | **+6.7%** |
| 4 | 0.893 | 0.958 | +0.066 | **+7.3%** |
| 7 | 0.920 | 0.972 | +0.052 | **+5.6%** |
| 8 | 0.927 | 0.974 | +0.047 | **+5.1%** |
| **All bits** | **0.885** | **0.942** | **+0.057** | **+6.4%** |

-------------------------------------

| Metric | Baseline | Rotation | % Diff | Impact |
|--------|----------|----------|--------|--------|
| **Recall** (avg all) | 0.885 | 0.942 | **+6.4%** | Improvement |
| **Search latency** | ~2.0 ms | ~2.0 ms | **~0%** | No change |
| **Index rate** | ~3830 docs/s | ~3620 docs/s | **-5.5%** | Slightly slower |
| **Index size** | 8091-9799 MB | 8091-9799 MB | **0%** | Identical |
| **Force merge time** | ~213 s | ~194 s | **-8.9%** | No regression |
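For anyone who wants to sanity-check the `% Diff` column, here is a minimal sketch that recomputes the relative recall change from the per-bit recall values reported above. The helper name `pct_diff` and the dictionary layout are my own for illustration and are not part of luceneutil. Because the inputs are already rounded to three decimals, the recomputed figures can differ from the reported ones in the last digit.

```python
# Recall values copied from the per-bit table: bits -> (baseline, rotation).
recall = {
    1: (0.828, 0.889),
    2: (0.858, 0.916),
    4: (0.893, 0.958),
    7: (0.920, 0.972),
    8: (0.927, 0.974),
}


def pct_diff(baseline, candidate):
    """Relative change of candidate over baseline, in percent (hypothetical helper)."""
    return (candidate - baseline) / baseline * 100.0


# Per-bit absolute and relative recall change.
for bits, (base, rot) in recall.items():
    print(f"{bits}-bit: {rot - base:+.3f} ({pct_diff(base, rot):+.1f}%)")

# Unweighted average across all bit widths, matching the "All bits" row.
avg_base = sum(b for b, _ in recall.values()) / len(recall)
avg_rot = sum(r for _, r in recall.values()) / len(recall)
print(f"all bits: {avg_base:.3f} -> {avg_rot:.3f} ({pct_diff(avg_base, avg_rot):+.1f}%)")
```

The same `pct_diff` helper applies to the index-rate and force-merge rows if you plug in the approximate values from the summary table.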
