kaivalnp commented on PR #14863: URL: https://github.com/apache/lucene/pull/14863#issuecomment-3133877677
We had some interesting findings in #14874, so I updated this PR to reflect those. Benchmarks for 768d Cohere vectors (dot product similarity; the 4-bit index is compressed):

`main` with `-reindex`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.862        1.607   1.606        1.000  100000   100      50       64        250     7 bits     39.43       2536.33           18.73             1          373.37       366.592       73.624       HNSW
 0.542        1.211   1.210        0.999  100000   100      50       64        250     4 bits     34.09       2933.76           15.33             1          337.76       329.971       37.003       HNSW
```

`main` without `-reindex`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.862        1.448   1.447        0.999  100000   100      50       64        250     7 bits      0.00      Infinity            0.12             1          373.37       366.592       73.624       HNSW
 0.542        1.075   1.074        0.999  100000   100      50       64        250     4 bits      0.00      Infinity            0.12             1          337.76       329.971       37.003       HNSW
```

This PR with `-reindex`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.862        1.394   1.392        0.999  100000   100      50       64        250     7 bits     36.29       2755.73           17.35             1          373.36       366.592       73.624       HNSW
 0.541        1.135   1.134        0.999  100000   100      50       64        250     4 bits     34.85       2869.11           15.53             1          337.74       329.971       37.003       HNSW
```

This PR without `-reindex`:
```
recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.862        1.329   1.328        0.999  100000   100      50       64        250     7 bits      0.00      Infinity            0.12             1          373.36       366.592       73.624       HNSW
 0.541        1.141   1.139        0.998  100000   100      50       64        250     4 bits      0.00      Infinity            0.13             1          337.74       329.971       37.003       HNSW
```

We see some speedup in both indexing and search for 7-bit compression, but not so much for 4-bit. I wrote specific functions in `PanamaVectorUtilSupport`
to compute the "dot product" between two compressed (i.e. "packed") 4-bit integer vectors, so there is no need to copy them to heap and decompress. These are used during indexing, and are inspired by an existing function that assumed one of the vectors was uncompressed (i.e. "unpacked").

The drawback is that we would also need dedicated functions for comparing "packed" query / document vectors under other similarity functions like "euclidean" -- I'm looking for input on whether the gains justify maintaining those functions.

Also tagging people who might be interested in these changes, maybe @benwtrent (since you were exploring something similar in #13497) or @ChrisHegarty?
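For readers unfamiliar with the "packed" representation: each byte stores two 4-bit values, so a comparison over packed vectors extracts and multiplies nibbles in place instead of first expanding each vector to one byte per dimension. Below is a minimal scalar sketch of that idea; the class and method names are hypothetical, the nibble layout (high/low within one byte) is an assumption for illustration, and the actual PR implements this with Panama Vector API intrinsics in `PanamaVectorUtilSupport`, not scalar code.

```java
// Hypothetical scalar illustration of a packed 4-bit dot product.
// Each byte of a and b holds two unsigned 4-bit values (one in the high
// nibble, one in the low nibble); layout is assumed, not Lucene's exact one.
public class PackedInt4DotProduct {

  /** Dot product of two packed 4-bit vectors of equal packed length. */
  public static int dotProductPacked(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      // low nibbles: mask to 0..15 (avoids Java's byte sign extension)
      total += (a[i] & 0x0F) * (b[i] & 0x0F);
      // high nibbles: shift down, then mask for the same reason
      total += ((a[i] >> 4) & 0x0F) * ((b[i] >> 4) & 0x0F);
    }
    return total;
  }

  public static void main(String[] args) {
    // 0x12 packs (high=1, low=2); 0x21 packs (high=2, low=1)
    System.out.println(dotProductPacked(new byte[] {0x12}, new byte[] {0x21})); // 2*1 + 1*2 = 4
  }
}
```

The point of operating on the packed form is that indexing-time comparisons read half the bytes and skip the decompress-to-heap copy entirely, which is where the 7-bit path gets its win; for 4-bit, the extra nibble extraction appears to eat most of that saving.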