Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

via GitHub Tue, 08 Jul 2025 03:13:33 -0700


shubhamvishu commented on PR #13572:
URL: https://github.com/apache/lucene/pull/13572#issuecomment-3048263308


   Hi, this PR shows good performance gains, as discussed in the conversation 
above. To measure the overall impact, I ran the `luceneutil` benchmarks with 
this PR. The results below are from a Graviton2 machine:
   
   #### Setup :
   
   - **Baseline** : current Apache lucene **main**
   - **Candidate** : current Apache lucene **main** + 
[PR#13572](https://github.com/apache/lucene/pull/13572)
   - Dataset : Cohere (768 dim)
   
   
   #### Summary :
   
   - We observe a `~42-47%` reduction in latency and a `~50%` reduction in CPU 
time (*though indexing is a little slower; maybe it's not an apples-to-apples 
comparison, or we are missing something needed to get the full SIMD gains*) 
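
As a quick check on the percentages in the summary, here is a minimal Python sketch that recomputes the relative reductions from the numbers reported in the tables of this comment. The helper name `pct_reduction` and the dictionary labels are illustrative only and are not part of `luceneutil`.

```python
def pct_reduction(baseline: float, candidate: float) -> float:
    """Return how much lower `candidate` is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100.0


# (baseline, candidate) pairs copied from the tables in this comment.
# Candidate values are from the runs that reuse the baseline index.
runs = {
    "latency(ms), 1 search thread": (10.045, 5.323),
    "latency(ms), search threads = CPU cores": (5.696, 3.262),
    "netCPU, 1 search thread": (10.024, 5.159),
    "netCPU, search threads = CPU cores": (10.052, 5.242),
}

for name, (base, cand) in runs.items():
    print(f"{name}: {pct_reduction(base, cand):.1f}% lower")

# Indexing time from the `-reindex` runs (1 search thread); a negative
# reduction means the candidate took longer to index.
print(f"index(s): {pct_reduction(641.24, 808.15):.1f}% lower")
```

The latency figures come out at roughly 47% and 43%, and the netCPU figures at roughly 48%, which matches the ranges quoted above. The last line comes out negative, reflecting the slower indexing in the candidate `-reindex` run.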
   
   
   #### Baseline (with `-reindex` and `-numSearchThread` as 1) :
   
   | recall | latency(ms) | netCPU | avgCpuCount |  nDoc   | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
   |--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
   | 0.873  | 10.045      | 10.024 | 0.998        | 500000 |  100 |     50 |      64 |        250 | 7 bits    |   641.24 |         779.74 |             3 |        1870.95 |       1832.962 |       368.118 | HNSW      |
   
   &nbsp;
   #### Candidate (with baseline index and `-numSearchThread` as 1) :
   
   | recall | latency(ms) | netCPU | avgCpuCount |  nDoc   | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
   |--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
   | 0.873  | 5.323       | 5.159  | 0.969        | 500000 |  100 |     50 |      64 |        250 | 7 bits    |    0.00   |       Infinity |             3 |        1870.95 |       1832.962 |       368.118 | HNSW      |
   
   &nbsp;
   #### Baseline (with `-reindex` and `-numSearchThread` as CPU cores) :
   
   | recall | latency(ms) | netCPU | avgCpuCount |  nDoc   | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
   |--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
   | 0.875  | 5.696       | 10.052 | 1.765        | 500000 |  100 |     50 |      64 |        250 | 7 bits    |   633.99 |         788.66 |             3 |        1871.02 |       1832.962 |       368.118 | HNSW      |
   
   &nbsp;
   #### Candidate (with baseline index and `-numSearchThread` as CPU cores) :
   
   | recall | latency(ms) | netCPU | avgCpuCount |  nDoc   | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
   |--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
   | 0.875  | 3.262       | 5.242  | 1.607        | 500000 |  100 |     50 |      64 |        250 | 7 bits    |    0.00   |       Infinity |             3 |        1871.02 |       1832.962 |       368.118 | HNSW      |
   
   &nbsp;
   -----------------------------------------------
   #### Some other runs :
   
   &nbsp;
   #### Candidate (with `-reindex` and `-numSearchThread` as 1) :
   
   | recall | latency(ms) | netCPU | avgCpuCount |  nDoc   | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
   |--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
   | 0.870  | 4.348       | 4.208  | 0.968        | 500000 |  100 |     50 |      64 |        250 | 7 bits    |   808.15 |         618.70 |             2 |        1871.61 |       1832.962 |       368.118 | HNSW      |
   
   &nbsp;
   #### Candidate (with `-reindex` and `-numSearchThread` as CPU cores) :
   
   | recall | latency(ms) | netCPU | avgCpuCount |  nDoc   | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
   |--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
   | 0.872  | 3.355       | 4.434  | 1.322        | 500000 |  100 |     50 |      64 |        250 | 7 bits    |   802.49 |         623.06 |             2 |        1872.37 |       1832.962 |       368.118 | HNSW      |
   
   
   I'm curious whether there is any way forward with this change in Lucene. 
Obviously not in core, but would we be OK with having it in the `misc` 
module? Thanks!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
