Re: [PR] Fix off-heap byte vector scoring at query time [lucene]

via GitHub Fri, 11 Jul 2025 15:48:36 -0700


kaivalnp commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3064104650


   > is the weird "search after indexing" regression specific only to this PR?
   
   It's slightly different here -- I tried the following on `main`
   
   When a fresh index is created (run 1):
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.962        2.387   2.386        1.000  100000   100      50       64        250     7 bits     11.68       8563.11           52.88             1           77.44       146.866       73.624       HNSW
    0.961        2.392   2.391        1.000  100000   100      50       64        250     4 bits     11.62       8604.37           51.28             1           77.46       110.245       37.003       HNSW
   ```
   
   Then a run without `-reindex` (run 2, same index as run 1):
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.962        2.550   2.549        0.999  100000   100      50       64        250     7 bits      0.00      Infinity            0.12             1           77.44       146.866       73.624       HNSW
    0.961        2.522   2.521        0.999  100000   100      50       64        250     4 bits      0.00      Infinity            0.13             1           77.46       110.245       37.003       HNSW
   ```
   
   Then a run with `-reindex` (run 3, keeping index from run 1 around):
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.961        1.821   1.820        1.000  100000   100      50       64        250     7 bits     11.41       8765.01           53.00             1           77.45       146.866       73.624       HNSW
    0.962        1.851   1.850        1.000  100000   100      50       64        250     4 bits     11.57       8643.04           51.66             1           77.49       110.245       37.003       HNSW
   ```
   
   And finally a run without `-reindex` (run 4, same index as run 3):
   ```
   recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.961        2.525   2.524        0.999  100000   100      50       64        250     7 bits      0.00      Infinity            0.12             1           77.45       146.866       73.624       HNSW
    0.962        2.517   2.516        0.999  100000   100      50       64        250     4 bits      0.00      Infinity            0.13             1           77.49       110.245       37.003       HNSW
   ```
   
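   This pattern has been consistent for me (see [this previous result](https://github.com/apache/lucene/pull/14874#issuecomment-3033793168)): query latency improves when re-indexing while an older index is still around (~1.8 ms in run 3), but not when indexing from scratch (~2.4 ms in run 1), and the improvement disappears on the next run that reuses the same index without `-reindex` (~2.5 ms in run 4).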
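To put a number on the effect, here is a quick sketch (not part of luceneutil; the run labels are hypothetical names I made up) that computes the relative change in mean query latency between runs, using the `latency(ms)` values copied from the four tables above.

```python
# Mean query latency (ms) copied from the four runs above.
# Each tuple is (7 bits, 4 bits). Run labels are hypothetical names.
LATENCY_MS = {
    "run1_fresh_index": (2.387, 2.392),
    "run2_reuse_run1": (2.550, 2.522),
    "run3_reindex": (1.821, 1.851),
    "run4_reuse_run3": (2.525, 2.517),
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = faster)."""
    return (after - before) / before * 100.0


def compare(baseline: str, candidate: str) -> list[float]:
    """Percent latency change of `candidate` vs `baseline`, per quantization level."""
    return [
        round(pct_change(b, c), 1)
        for b, c in zip(LATENCY_MS[baseline], LATENCY_MS[candidate])
    ]


if __name__ == "__main__":
    for base, cand in [
        ("run1_fresh_index", "run3_reindex"),
        ("run3_reindex", "run4_reuse_run3"),
    ]:
        print(f"{cand} vs {base}: {compare(base, cand)} % (7 bits, 4 bits)")
```

With these numbers, run 3 is about 24% (7 bits) and 23% (4 bits) faster than run 1, and run 4 is about 39% and 36% slower than run 3, i.e. back near the run 2 numbers.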

