kaivalnp commented on PR #14874: URL: https://github.com/apache/lucene/pull/14874#issuecomment-3033770135
> if the results hold up for 768d vectors

Thanks @msokolov! I quantized the Cohere 768d vectors by:

1. normalizing
2. scaling each dimension by 256
3. clipping between [-128, 127]

i.e. basically replicating [this snippet](https://github.com/mikemccand/luceneutil/blob/522d3c377815686ea8a801d6735f68ae69d637d5/src/main/perf/VectorDictionary.java#L197-L198) (a rough sketch of these steps is at the end of this comment).

But something _**really**_ strange is happening! I first ran a set of benchmarks with `-reindex` to create fresh indices and make sure indexing times are not adversely affected:

`main` (run 1)

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.940   1.568        1.567   1.000        200000  100   50      64       250        no         63.52     3148.47       23.21           1             152.11          585.938       585.938      HNSW
```

This PR (run 2)

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.939   2.894        2.894   1.000        200000  100   50      64       250        no         37.69     5306.87       12.99           1             152.09          585.938       585.938      HNSW
```

This felt a bit strange: query latency is higher here (conflicting with the JMH results) while indexing is faster in this PR. So I ran a follow-up _without `-reindex`_, searching against an already-built index (re-using the index created in run 2 for runs 3 and 4):

`main` (run 3)

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.939   1.602        1.601   0.999        200000  100   50      64       250        no         0.00      Infinity      0.12            1             152.09          585.938       585.938      HNSW
```

This PR (run 4)

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
0.939   1.090        1.090   1.000        200000  100   50      64       250        no         0.00      Infinity      0.12            1             152.09          585.938       585.938      HNSW
```

Search performance on `main` is unchanged, but we see a significant improvement after moving the query off-heap! I don't have a good explanation for this yet; I'll try to replicate on 300d vectors to see if it holds there too. Meanwhile, it'd be great if you could reproduce the benchmarks like I did :)
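For reference, here is a minimal Java sketch of the quantization steps described above. It is illustrative only: the `QuantizeSketch` class and `quantize` method are made-up names, not the exact luceneutil code linked earlier, and it assumes a non-zero input vector. Clipping is needed because a unit-normalized dimension scaled by 256 can land just outside the signed-byte range.

```java
// Illustrative sketch of: 1. normalize, 2. scale by 256, 3. clip to [-128, 127].
// Not the exact VectorDictionary code; names here are made up.
class QuantizeSketch {
  static byte[] quantize(float[] vector) {
    // 1. compute the L2 norm for normalization (assumes a non-zero vector)
    double norm = 0;
    for (float v : vector) {
      norm += (double) v * v;
    }
    norm = Math.sqrt(norm);

    byte[] quantized = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      // 2. normalize and scale each dimension by 256
      long scaled = Math.round(vector[i] / norm * 256);
      // 3. clip to the signed byte range [-128, 127]
      quantized[i] = (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, scaled));
    }
    return quantized;
  }
}
```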