shubhamvishu commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-3048263308
Hi, as the earlier conversation on this PR shows, we see some good performance gains with this change. I ran the `luceneutil` benchmarks with this PR to measure the overall impact. Below are the results from the Graviton2 machine I used.

#### Setup

- **Baseline**: current Apache Lucene **main**
- **Candidate**: current Apache Lucene **main** + [PR#13572](https://github.com/apache/lucene/pull/13572)
- **Dataset**: Cohere (768 dim)

#### Summary

- We observe a `~42-47%` reduction in query latency and a `~48%` reduction in CPU time (`netCPU`), i.e. roughly half, when the candidate searches the baseline index.
- Indexing looks slower, though. The candidate `-reindex` runs took about 26% longer (808.15 s and 802.49 s vs. 641.24 s and 633.99 s for the baseline). Maybe it's not an apples-to-apples comparison (the candidate runs also ended up with 2 segments instead of 3), or we are missing something in squeezing out the complete SIMD gains.

#### Baseline (with `-reindex` and `-numSearchThread` as 1)

| recall | latency(ms) | netCPU | avgCpuCount | nDoc | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
|--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
| 0.873 | 10.045 | 10.024 | 0.998 | 500000 | 100 | 50 | 64 | 250 | 7 bits | 641.24 | 779.74 | 3 | 1870.95 | 1832.962 | 368.118 | HNSW |

#### Candidate (with baseline index and `-numSearchThread` as 1)

| recall | latency(ms) | netCPU | avgCpuCount | nDoc | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
|--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
| 0.873 | 5.323 | 5.159 | 0.969 | 500000 | 100 | 50 | 64 | 250 | 7 bits | 0.00 | Infinity | 3 | 1870.95 | 1832.962 | 368.118 | HNSW |

_When the candidate reuses the baseline index, no indexing is done, so `index(s)` shows `0.00` and `index_docs/s` shows `Infinity`._

#### Baseline (with `-reindex` and `-numSearchThread` as CPU cores)

| recall | latency(ms) | netCPU | avgCpuCount | nDoc | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
|--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
| 0.875 | 5.696 | 10.052 | 1.765 | 500000 | 100 | 50 | 64 | 250 | 7 bits | 633.99 | 788.66 | 3 | 1871.02 | 1832.962 | 368.118 | HNSW |

#### Candidate (with baseline index and `-numSearchThread` as CPU cores)

| recall | latency(ms) | netCPU | avgCpuCount | nDoc | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
|--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
| 0.875 | 3.262 | 5.242 | 1.607 | 500000 | 100 | 50 | 64 | 250 | 7 bits | 0.00 | Infinity | 3 | 1871.02 | 1832.962 | 368.118 | HNSW |

---

#### Some other runs

#### Candidate (with `-reindex` and `-numSearchThread` as 1)

| recall | latency(ms) | netCPU | avgCpuCount | nDoc | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
|--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
| 0.870 | 4.348 | 4.208 | 0.968 | 500000 | 100 | 50 | 64 | 250 | 7 bits | 808.15 | 618.70 | 2 | 1871.61 | 1832.962 | 368.118 | HNSW |

#### Candidate (with `-reindex` and `-numSearchThread` as CPU cores)

| recall | latency(ms) | netCPU | avgCpuCount | nDoc | topK | fanout | maxConn | beamWidth | quantized | index(s) | index_docs/s | num_segments | index_size(MB) | vec_disk(MB) | vec_RAM(MB) | indexType |
|--------|-------------|--------|--------------|--------:|-----:|--------:|---------:|-----------:|-----------|----------:|---------------:|--------------:|----------------:|---------------:|--------------:|-----------|
| 0.872 | 3.355 | 4.434 | 1.322 | 500000 | 100 | 50 | 64 | 250 | 7 bits | 802.49 | 623.06 | 2 | 1872.37 | 1832.962 | 368.118 | HNSW |

I'm curious whether there is any way forward for this change in Lucene. Obviously not in core, but would we be OK with having it in the `misc` module?

Thanks!
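For anyone who wants to double-check the summary percentages, here is a small Python sketch that recomputes the relative changes from the numbers in the result tables. It is not part of `luceneutil`. The `pct_change` helper and the run labels are made up for illustration, and the values are copied by hand from the tables.

```python
# Recompute the baseline -> candidate deltas quoted in the summary.
# Values are copied by hand from the luceneutil tables in this comment
# (Graviton2, Cohere 768-dim, 500k docs). pct_change and the run labels
# are illustrative names only, not luceneutil APIs.


def pct_change(baseline: float, candidate: float) -> float:
    """Relative change from baseline to candidate, in percent (negative = reduction)."""
    return (candidate - baseline) / baseline * 100.0


# Search metrics: (baseline, candidate) where the candidate searched the baseline index.
search_runs = {
    "1 search thread": {"latency_ms": (10.045, 5.323), "netCPU": (10.024, 5.159)},
    "CPU-cores search threads": {"latency_ms": (5.696, 3.262), "netCPU": (10.052, 5.242)},
}

# Indexing time in seconds: baseline -reindex run vs. candidate -reindex run.
index_runs = {
    "-reindex, 1 search thread": (641.24, 808.15),
    "-reindex, CPU-cores search threads": (633.99, 802.49),
}

if __name__ == "__main__":
    for name, metrics in search_runs.items():
        for metric, (base, cand) in metrics.items():
            print(f"{name:36s} {metric:10s} {pct_change(base, cand):+6.1f}%")
    for name, (base, cand) in index_runs.items():
        print(f"{name:36s} {'index(s)':10s} {pct_change(base, cand):+6.1f}%")
```

Negative values mean the candidate is lower than the baseline (less latency or CPU). The indexing rows come out positive, which means longer indexing time for the candidate.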