mayya-sharipova commented on PR #12962:
URL: https://github.com/apache/lucene/pull/12962#issuecomment-1889934399

I have also run experiments on the Cohere dataset. As the results below show, on the 10M-document dataset the proposed approach is 1.7-2.5x faster (QPS) than the concurrent multi-segment baseline.
   
## Cohere/wikipedia-22-12-en-embeddings

- [Cohere/wikipedia-22-12-en-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings) dataset
- 768 dims
   
### 1M vectors

k=10, fanout=90

|                                 | Avg visited nodes |  QPS | Recall |
| :---                            |              ---: | ---: |   ---: |
| Baseline Single segment         |               804 | 3225 |  0.454 |
| Baseline 8 segments concurrent  |              1807 | 1831 |  0.887 |
| Candidate2_with_queue           |              1807 | 1872 |  0.887 |

k=100, fanout=900

|                                 | Avg visited nodes |  QPS | Recall |
| :---                            |              ---: | ---: |   ---: |
| Baseline Single segment         |              4555 |  527 |  0.477 |
| Baseline 8 segments concurrent  |              9119 |  261 |  0.923 |
| Candidate2_with_queue           |              9119 |  265 |  0.923 |

### 10M vectors

k=10, fanout=90

|                                 | Avg visited nodes |  QPS | Recall |
| :---                            |              ---: | ---: |   ---: |
| Baseline Single segment         |                   |      |        |
| Baseline 19 segments concurrent |             37726 |  293 |  0.971 |
| Candidate2_with_queue           |             20199 |  501 |  0.960 |

k=100, fanout=900

|                                 | Avg visited nodes |  QPS | Recall |
| :---                            |              ---: | ---: |   ---: |
| Baseline Single segment         |                   |      |        |
| Baseline 19 segments concurrent |            234047 |   47 |  0.992 |
| Candidate2_with_queue           |             74995 |  118 |  0.979 |
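
As a quick sanity check of the ratios quoted at the top, here is a small sketch that recomputes them from the table values above. It compares the concurrent multi-segment baseline against `Candidate2_with_queue` only (single-segment numbers are missing for 10M). The `results` dict and the `ratios` helper are hypothetical, introduced just for this arithmetic; they are not part of Lucene or of the benchmark tooling.

```python
# Values copied from the tables above:
# each pair is (baseline N segments concurrent, Candidate2_with_queue).
results = {
    "1M, k=10":   {"qps": (1831, 1872), "visited": (1807, 1807)},
    "1M, k=100":  {"qps": (261, 265),   "visited": (9119, 9119)},
    "10M, k=10":  {"qps": (293, 501),   "visited": (37726, 20199)},
    "10M, k=100": {"qps": (47, 118),    "visited": (234047, 74995)},
}


def ratios(entry):
    """Hypothetical helper: return (QPS speedup, visited-node reduction factor)."""
    base_qps, cand_qps = entry["qps"]
    base_vis, cand_vis = entry["visited"]
    # Higher QPS is better, fewer visited nodes is better.
    return cand_qps / base_qps, base_vis / cand_vis


for name, entry in results.items():
    speedup, visited_reduction = ratios(entry)
    print(f"{name}: QPS speedup {speedup:.2f}x, "
          f"visited nodes reduced {visited_reduction:.2f}x")
```

On the 10M dataset this gives roughly 1.71x (k=10) and 2.51x (k=100) higher QPS, matching the 1.7-2.5x range, while on the 1M dataset visited nodes are unchanged and QPS differs by only about 2%.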