kaivalnp commented on PR #14874: URL: https://github.com/apache/lucene/pull/14874#issuecomment-3063496698
Thanks for the deep dive here @msokolov! I tried running a set of benchmarks (Cohere, 768d, byte vectors) with `niter=100,500,1000,5000,10000,50000` (only searching an existing index, no reindexing) to test how the compiler optimizes over longer runs.

`main`:

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  num_segments
0.968   2.790        2.710   0.971        100000  100   50      64       250        no         1
0.964   2.692        2.674   0.993        100000  100   50      64       250        no         1
0.963   2.685        2.677   0.997        100000  100   50      64       250        no         1
0.963   2.599        2.597   0.999        100000  100   50      64       250        no         1
0.962   2.552        2.550   0.999        100000  100   50      64       250        no         1
0.962   2.590        2.589   1.000        100000  100   50      64       250        no         1
```

This PR:

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  num_segments
0.968   2.050        1.960   0.956        100000  100   50      64       250        no         1
0.964   1.868        1.850   0.990        100000  100   50      64       250        no         1
0.963   1.894        1.885   0.995        100000  100   50      64       250        no         1
0.963   1.770        1.769   0.999        100000  100   50      64       250        no         1
0.962   1.762        1.761   1.000        100000  100   50      64       250        no         1
0.962   1.742        1.741   1.000        100000  100   50      64       250        no         1
```

There's a possibility that the candidate (i.e. this PR) has an inherent advantage from being run later in time (so vectors are _more likely_ to already be loaded into RAM), so I ran the baseline (i.e. `main`) again immediately afterwards:

```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  num_segments
0.968   2.850        2.770   0.972        100000  100   50      64       250        no         1
0.964   2.712        2.694   0.993        100000  100   50      64       250        no         1
0.963   2.640        2.632   0.997        100000  100   50      64       250        no         1
0.963   2.588        2.586   0.999        100000  100   50      64       250        no         1
0.962   2.561        2.559   0.999        100000  100   50      64       250        no         1
0.962   2.550        2.549   1.000        100000  100   50      64       250        no         1
```

For some reason, the changes in this PR are still better on my machine :/

> I think we should understand the hotspot hack a little better before we push that, because it's really kind of gross and feels like voodoo to me

+1, I'm not looking to merge this until we find out why we're seeing a difference in performance (which seems counterintuitive, since we're doing _more_ work but seeing better latency!) -- and whether it holds when we (1) create a fresh index, (2) reindex, (3) search an existing index, (4) use different parameters, (5) run on different machines.

Performance seems tied to the HotSpot compiler -- is there a way to make its optimizations more deterministic? (or at least, explicit)

On a related note, benchmark runs have been fluctuating wildly -- I wonder if we should set larger defaults to get more reliable numbers.
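For the "explicit" part, a few standard HotSpot flags can at least surface what the JIT is doing between runs -- a sketch, not a full recipe (the `java ... <benchmark>` placeholder stands in for however the benchmark harness is actually launched):

```sh
# Log each method as HotSpot JIT-compiles it, so two runs can be diffed:
java -XX:+PrintCompilation ... <benchmark>

# Show inlining decisions (diagnostic flag, must be unlocked first):
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining ... <benchmark>

# Compile in the foreground instead of on background threads;
# slower warmup, but less run-to-run variance:
java -Xbatch ... <benchmark>

# Skip the tiered C1 levels so hot methods go straight to C2:
java -XX:-TieredCompilation ... <benchmark>
```

None of these make compilation fully deterministic (profile-guided decisions still depend on the data seen during warmup), but diffing `-XX:+PrintCompilation` output between baseline and candidate might show which methods end up compiled or inlined differently.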