Re: [I] Data-blind scalar quantization [lucene]

via GitHub Mon, 18 May 2026 11:14:47 -0700


shubhamvishu commented on issue #16029:
URL: https://github.com/apache/lucene/issues/16029#issuecomment-4480565151


   I also ran the same luceneutil benchmark with 4K-dimensional vectors and see an even larger impact on recall (**~6-7%** improvement overall), with a slightly lower indexing rate due to the rotation overhead. Here is a summary:
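The comment attributes the indexing slowdown to rotation overhead but does not show what the rotation is. As a hedged illustration only, the sketch below assumes one common data-blind choice: a randomized Hadamard transform (random ±1 sign flips followed by a fast Walsh-Hadamard transform), paired with a deliberately simple per-vector min/max scalar quantizer. Neither is claimed to be what the candidate branch or Lucene implements, and `make_rotation` and `scalar_quantize` are hypothetical names. The sketch shows two properties that matter here. First, the rotation is orthogonal, so DOT_PRODUCT scores are unchanged before quantization. Second, it spreads a single outlier coordinate across all dimensions, which narrows the quantization range and reduces reconstruction error.

```python
import math
import random


def fwht(v):
    """In-place fast Walsh-Hadamard transform (unnormalized); len(v) must be a power of two."""
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def make_rotation(dim, seed):
    """Hypothetical helper: returns x -> (1/sqrt(dim)) * H * D * x, where D is a fixed
    random +/-1 diagonal drawn from `seed` without looking at any data (data-blind).
    The map is orthogonal, so norms and dot products are preserved."""
    if dim <= 0 or dim & (dim - 1):
        raise ValueError("dim must be a power of two")
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    scale = 1.0 / math.sqrt(dim)

    def rotate(x):
        v = [s * xi for s, xi in zip(signs, x)]
        fwht(v)
        return [vi * scale for vi in v]

    return rotate


def scalar_quantize(x, bits):
    """Simplified per-vector min/max uniform quantizer (assumption, not Lucene's);
    returns the dequantized vector so the reconstruction error can be measured."""
    lo, hi = min(x), max(x)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((xi - lo) / step) * step for xi in x]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def sq_err(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


dim = 64  # small power of two for the demo; 4096 from the setup works the same way
rotate = make_rotation(dim, seed=42)
x = [50.0] + [math.sin(i) for i in range(1, dim)]  # one large outlier coordinate
y = [math.cos(i) for i in range(dim)]
rx, ry = rotate(x), rotate(y)

# The rotation is orthogonal, so the error measured in rotated space equals the error
# after rotating the reconstruction back; the two numbers are directly comparable.
plain_err = sq_err(x, scalar_quantize(x, bits=4))
rot_err = sq_err(rx, scalar_quantize(rx, bits=4))
print(f"dot(x, y) = {dot(x, y):.6f}, after rotation = {dot(rx, ry):.6f}")
print(f"4-bit squared error: plain = {plain_err:.3f}, rotated = {rot_err:.3f}")
```

In this construction the per-vector cost is O(d log d): about 4096 × 12 = 49,152 add/subtract operations for d = 4096, compared with d² ≈ 16.8M multiply-adds for a dense random orthogonal matrix. That gap is one common reason fast transforms are preferred when rotation overhead matters.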
   **Setup**: 
   - 500K docs, 4096 dimensions, DOT_PRODUCT similarity, HNSW (maxConn=64, beamWidth=250), 10K queries.
   
   ### Results
   
   | Bits | Avg Baseline Recall | Avg Candidate Recall (Rotation) | Avg Delta | % Diff |
   |------|-------------|-------------|-----------|--------|
   | 1    | 0.828       | 0.889       | +0.061    | **+7.4%** |
   | 2    | 0.858       | 0.916       | +0.058    | **+6.7%** |
   | 4    | 0.893       | 0.958       | +0.066    | **+7.3%** |
   | 7    | 0.920       | 0.972       | +0.052    | **+5.6%** |
   | 8    | 0.927       | 0.974       | +0.047    | **+5.1%** |
   | **All bits** | **0.885** | **0.942** | **+0.057** | **+6.4%** |
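The "All bits" row can be reproduced from the per-bit rows. This assumes it is the unweighted mean over the five bit widths and that % Diff is (rotation - baseline) / baseline. Both assumptions are consistent with the reported numbers. A minimal sketch:

```python
# Per-bit average recall copied from the table: bits -> (baseline, rotation).
rows = {
    1: (0.828, 0.889),
    2: (0.858, 0.916),
    4: (0.893, 0.958),
    7: (0.920, 0.972),
    8: (0.927, 0.974),
}


def summarize(rows):
    """Unweighted mean over bit widths, absolute delta, and % diff relative to baseline."""
    n = len(rows)
    base = sum(b for b, _ in rows.values()) / n
    cand = sum(c for _, c in rows.values()) / n
    delta = cand - base
    pct = 100.0 * delta / base
    return base, cand, delta, pct


base, cand, delta, pct = summarize(rows)
print(f"baseline={base:.3f} rotation={cand:.3f} delta={delta:+.3f} diff={pct:+.1f}%")
# prints: baseline=0.885 rotation=0.942 delta=+0.057 diff=+6.4%
```

Recomputing the per-bit % Diff from the rounded three-decimal recalls can differ in the last digit. For example, the 2-bit row gives about 6.76%. This suggests those rows were computed from unrounded averages.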
   
   -------------------------------------
   
   | Metric | Baseline | Rotation | % Diff | Impact |
   |--------|----------|----------|--------|--------|
   | **Recall** (avg all) | 0.885 | 0.942 | **+6.4%** | Improvement |
   | **Search latency** | ~2.0 ms | ~2.0 ms | **~0%** | No change |
   | **Index rate** | ~3830 docs/s | ~3620 docs/s | **-5.5%** | Slightly slower |
   | **Index size** | 8091-9799 MB | 8091-9799 MB | **0%** | Identical |
   | **Force merge time** | ~213 s | ~194 s | **-8.9%** | No regression |
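To put the -5.5% index rate in absolute terms, the sketch below converts the two rates into wall-clock time for the 500K-doc corpus. It assumes the reported rate is total docs divided by total indexing time. The resulting per-document figure is an amortized wall-clock difference, not an isolated measurement of rotation cost, and `index_seconds` is a hypothetical helper.

```python
DOCS = 500_000  # corpus size from the setup


def index_seconds(docs_per_sec, docs=DOCS):
    """Wall-clock time to index `docs` documents at a constant rate."""
    return docs / docs_per_sec


baseline_s = index_seconds(3830)  # ~3830 docs/s baseline
rotation_s = index_seconds(3620)  # ~3620 docs/s with rotation
extra_us_per_doc = (rotation_s - baseline_s) / DOCS * 1e6
print(f"{baseline_s:.1f} s -> {rotation_s:.1f} s, ~{extra_us_per_doc:.0f} us extra per doc")
# prints: 130.5 s -> 138.1 s, ~15 us extra per doc
```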
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

