Re: [PR] Implement off-heap quantized scoring [lucene]

via GitHub Sun, 29 Jun 2025 03:33:50 -0700


kaivalnp commented on PR #14863:
URL: https://github.com/apache/lucene/pull/14863#issuecomment-3016573861


   I ran some benchmarks on Cohere vectors (768d) with 7-bit and 4-bit
(compressed) quantization:
   
   `main` without `jdk.incubator.vector`:
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.860        2.815   2.806        0.997  100000   100      50       64        250     7 bits     44.07       2269.17           46.79             1          373.72       366.592       73.624       HNSW
    0.545        3.193   3.185        0.997  100000   100      50       64        250     4 bits     47.26       2115.95           50.04             1          338.13       329.971       37.003       HNSW
   ```
   
   `main` with `jdk.incubator.vector`:
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.863        1.904   1.886        0.991  100000   100      50       64        250     7 bits     28.65       3490.65           29.66             1          373.69       366.592       73.624       HNSW
    0.545        1.313   1.305        0.994  100000   100      50       64        250     4 bits     22.86       4373.88           17.84             1          338.13       329.971       37.003       HNSW
   ```
   
   This PR without `jdk.incubator.vector`:
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.861        2.774   2.765        0.997  100000   100      50       64        250     7 bits     44.60       2242.00           46.71             1          373.73       366.592       73.624       HNSW
    0.545        3.147   3.139        0.997  100000   100      50       64        250     4 bits     47.93       2086.51           50.20             1          338.11       329.971       37.003       HNSW
   ```
   
   This PR with `jdk.incubator.vector`:
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.861        1.612   1.603        0.994  100000   100      50       64        250     7 bits     22.99       4349.53           24.78             1          373.70       366.592       73.624       HNSW
    0.545        1.277   1.269        0.994  100000   100      50       64        250     4 bits     21.60       4630.49           17.41             1          338.11       329.971       37.003       HNSW
   ```
   
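   I did see slight fluctuations across runs, but with `jdk.incubator.vector`
enabled, search was ~10% faster for 7-bit and very slightly faster for 4-bit
(compressed). Indexing and force merge times improved by ~15%. Without
`jdk.incubator.vector`, the numbers are essentially unchanged from `main`.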
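
As a rough cross-check of those percentages, here is a minimal Python sketch that recomputes the relative reductions from the two `jdk.incubator.vector` tables above. The numbers are copied from those tables; the names `runs`, `pct_reduction` and `summarize` are hypothetical helpers for illustration, not part of the PR or of luceneutil. Each table is a single run, so run-to-run fluctuation means these figures will not exactly match the rounded estimates.

```python
# Values copied from the "with jdk.incubator.vector" tables.
# Each tuple is (main, this PR); lower is better for every metric.
runs = {
    "7-bit": {
        "latency_ms": (1.904, 1.612),
        "index_s": (28.65, 22.99),
        "force_merge_s": (29.66, 24.78),
    },
    "4-bit": {
        "latency_ms": (1.313, 1.277),
        "index_s": (22.86, 21.60),
        "force_merge_s": (17.84, 17.41),
    },
}


def pct_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


def summarize(data):
    """Map each quantization level to its per-metric reduction (1 decimal)."""
    return {
        quant: {
            metric: round(pct_reduction(before, after), 1)
            for metric, (before, after) in metrics.items()
        }
        for quant, metrics in data.items()
    }


if __name__ == "__main__":
    for quant, metrics in summarize(runs).items():
        for metric, pct in metrics.items():
            print(f"{quant:6s} {metric:14s} {pct:5.1f}% lower than main")
```

On this single run the 7-bit gains come out larger than the 4-bit ones for all three metrics, which fits the note that 4-bit search was only very slightly faster.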


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


