vigyasharma commented on PR #14708:
URL: https://github.com/apache/lucene/pull/14708#issuecomment-3168944971

   I added some benchmarks that measure NDCG for KNN search over 4-bit 
quantized vectors, with results reranked by full-precision vector similarity 
scores: https://github.com/mikemccand/luceneutil/pull/435
   
   Pasting results below as well.
   
   +++
   
   #### Benchmark Results
   I see an `8% - 14%` relative improvement in NDCG@10 and a `6% - 8%` 
relative improvement in NDCG@K for 4-bit quantized KNN search with 
full-precision reranking. The improvement increases with index size. Latency 
impact doesn't seem significant: reranked runs are within about 5% of the 
non-reranked runs, in either direction.
   
   ```ruby
   recall  ndcg@10  ndcg@K  rerank  latency(ms)  netCPU  avgCpuCount      nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.524    0.920   0.600   false        2.235   2.216        0.991    100000   100      20       32         50     4 bits     16.96       5896.23             4          333.95       329.971       37.003       HNSW
    0.524    0.999   0.637    true        2.279   2.261        0.992    100000   100      20       32         50     4 bits      0.00      Infinity             4          333.95       329.971       37.003       HNSW
    0.490    0.901   0.568   false        3.199   3.183        0.995    500000   100      20       32         50     4 bits     70.18       7124.23             3         1670.44      1649.857      185.013       HNSW
    0.490    0.999   0.607    true        3.201   3.181        0.994    500000   100      20       32         50     4 bits      0.00      Infinity             3         1670.44      1649.857      185.013       HNSW
    0.480    0.891   0.557   false        4.774   4.756        0.996   1000000   100      20       32         50     4 bits    134.12       7456.07             6         3341.86      3299.713      370.026       HNSW
    0.480    0.999   0.598    true        4.697   4.675        0.995   1000000   100      20       32         50     4 bits      0.00      Infinity             6         3341.86      3299.713      370.026       HNSW
    0.462    0.883   0.541   false        5.081   5.035        0.991   2000000   100      20       32         50     4 bits    465.46       4296.82             7         6688.32      6599.426      740.051       HNSW
    0.462    0.998   0.583    true        4.885   4.863        0.995   2000000   100      20       32         50     4 bits      0.00      Infinity             7         6688.32      6599.426      740.051       HNSW
    0.447    0.871   0.526   false       11.633  11.495        0.988  10000000   100      20       32         50     4 bits   2110.54       4738.12            14        33489.11     32997.131     3700.256       HNSW
    0.447    0.998   0.569    true       11.037  10.984        0.995  10000000   100      20       32         50     4 bits      0.00      Infinity            14        33489.11     32997.131     3700.256       HNSW
   
   ```
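
The quoted improvement ranges can be re-derived from the table. The snippet below is a minimal sketch that assumes "improvement" means relative change, `(reranked - baseline) / baseline`; the names `ROWS`, `relative_gain_pct`, and `summarize` are illustrative and not part of luceneutil.

```python
# Each tuple is copied from the table above:
# (nDoc, ndcg@10 rerank=false, ndcg@10 rerank=true,
#        ndcg@K  rerank=false, ndcg@K  rerank=true)
ROWS = [
    (100_000, 0.920, 0.999, 0.600, 0.637),
    (500_000, 0.901, 0.999, 0.568, 0.607),
    (1_000_000, 0.891, 0.999, 0.557, 0.598),
    (2_000_000, 0.883, 0.998, 0.541, 0.583),
    (10_000_000, 0.871, 0.998, 0.526, 0.569),
]


def relative_gain_pct(baseline, reranked):
    """Relative change from baseline to reranked, in percent (assumed definition)."""
    return (reranked - baseline) / baseline * 100.0


def summarize(rows):
    """Return (nDoc, ndcg@10 gain %, ndcg@K gain %) per index size, rounded to 0.1."""
    return [
        (
            n_doc,
            round(relative_gain_pct(base10, rerank10), 1),
            round(relative_gain_pct(base_k, rerank_k), 1),
        )
        for n_doc, base10, rerank10, base_k, rerank_k in rows
    ]


if __name__ == "__main__":
    for n_doc, gain10, gain_k in summarize(ROWS):
        print(f"nDoc={n_doc:>9}  ndcg@10: +{gain10}%  ndcg@K: +{gain_k}%")
```

With the table values this gives roughly 8.6% to 14.6% for NDCG@10 and 6.2% to 8.2% for NDCG@K, growing with nDoc, which is consistent with the ranges quoted above.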


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

