Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

via GitHub Wed, 31 Jan 2024 10:32:49 -0800


mayya-sharipova commented on PR #12962:
URL: https://github.com/apache/lucene/pull/12962#issuecomment-1919701631


   I've re-run the set of experiments with the latest changes on this PR (candidate) and the main branch (baseline):
   
   I have also run experiments on the Cohere dataset, as seen below:
   - for the 10M-doc dataset, the speedups with the proposed approach are 1.7-2.5x.
   - for the 10M-doc dataset, where k+fanout = 1000, QPS is close to the QPS of a single segment, while recall is better.
   
   ## Cohere/wikipedia-22-12-en-embeddings
   
   - [Cohere/wikipedia-22-12-en-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings) dataset
   - 768 dims
   - interval to synchronize with global queue: 255 visited docs
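
The synchronization interval above can be pictured with a minimal sketch. This is not the PR's code: `GlobalQueue`, `search_segment`, and the pruning rule are hypothetical names and assumptions, and graph traversal is replaced by a plain stream of scores. It only illustrates a per-segment collector that publishes its accepted scores to a shared queue once every 255 visited docs and reads back the global k-th best score.

```python
import heapq
import threading

SYNC_INTERVAL = 255  # visited docs between synchronizations, per the setup above


class GlobalQueue:
    """Hypothetical shared min-heap holding the k best scores across segments."""

    def __init__(self, k):
        self.k = k
        self._heap = []
        self._lock = threading.Lock()

    def merge(self, scores):
        """Add scores and return the current global k-th best score (or -inf)."""
        with self._lock:
            for score in scores:
                if len(self._heap) < self.k:
                    heapq.heappush(self._heap, score)
                elif score > self._heap[0]:
                    heapq.heapreplace(self._heap, score)
            return self._heap[0] if len(self._heap) == self.k else float("-inf")


def search_segment(visited_scores, k, global_queue):
    """Collect a segment's local top-k, syncing with the global queue periodically."""
    local = []    # min-heap of this segment's best k scores
    pending = []  # scores accepted locally since the last synchronization
    global_min = float("-inf")
    syncs = 0
    for visited, score in enumerate(visited_scores, start=1):
        # Assumed pruning rule: skip candidates that cannot enter the global top-k.
        if score > global_min:
            if len(local) < k:
                heapq.heappush(local, score)
                pending.append(score)
            elif score > local[0]:
                heapq.heapreplace(local, score)
                pending.append(score)
        if visited % SYNC_INTERVAL == 0:
            global_min = global_queue.merge(pending)
            pending.clear()
            syncs += 1
    global_queue.merge(pending)  # publish whatever is left at the end
    return sorted(local, reverse=True), syncs
```

A larger interval means fewer lock acquisitions but a staler global bound between synchronizations; the sketch makes no claim about how the PR balances the two.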
   
   ### 1M vectors 
   k=10, fanout=90
   
   |                                 |Avg visited nodes |   QPS    |   Recall|
   |  :---                           |    ---:          |     ---: |    ---: |
   | Baseline Single segment         |                  |          |    0.880|
   | Baseline 8 segments concurrent  |             13927|       815|    0.974|
   | Candidate2_with_queue           |             12670|       859|    0.964|
   
   k=100, fanout=900
   
   |                                 |Avg visited nodes |   QPS    |   Recall|
   |  :---                           |    ---:          |     ---: |    ---: |
   | Baseline Single segment         |                  |          |    0.964|
   | Baseline 8 segments concurrent  |             81824|       126|    0.997|
   | Candidate2_with_queue           |             62085|       165|    0.995|
   
   ### 10M vectors 
   k=10, fanout=90
   
   |                                 |Avg visited nodes |   QPS    |   Recall| 
   |  :---                           |    ---:          |     ---: |    ---: |  
   | Baseline Single segment         |             |      |    0.929|   
   | Baseline 19 segments concurrent |            37656|      271|    0.951|   
   | Candidate2_with_queue           |             21921|       443|    0.927|
   
   
   k=100, fanout=900
   
   |                                 |Avg visited nodes |   QPS    |   Recall|
   |  :---                           |    ---:          |     ---: |    ---: |
   | Baseline Single segment         |                  |          |    0.950|
   | Baseline 19 segments concurrent |            229945|        44|    0.990|
   | Candidate2_with_queue           |            101970|        91|    0.984|
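
To compare the multi-segment baseline and candidate rows directly, the ratios can be computed from the numbers in the tables above. This is a small sketch; the `RESULTS` layout and the `ratios` helper are illustrative names, not part of the benchmark tooling.

```python
# (dataset, k, fanout) -> (baseline QPS, candidate QPS,
#                          baseline avg visited nodes, candidate avg visited nodes)
# Values are copied from the concurrent-baseline and candidate rows above.
RESULTS = {
    ("1M", 10, 90): (815, 859, 13927, 12670),
    ("1M", 100, 900): (126, 165, 81824, 62085),
    ("10M", 10, 90): (271, 443, 37656, 21921),
    ("10M", 100, 900): (44, 91, 229945, 101970),
}


def ratios(base_qps, cand_qps, base_visited, cand_visited):
    """Return (QPS speedup, visited-node reduction factor) of candidate vs. baseline."""
    return cand_qps / base_qps, base_visited / cand_visited


if __name__ == "__main__":
    for (dataset, k, fanout), row in RESULTS.items():
        qps_speedup, visited_reduction = ratios(*row)
        print(
            f"{dataset} k={k} fanout={fanout}: "
            f"QPS x{qps_speedup:.2f}, visited nodes /{visited_reduction:.2f}"
        )
```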
   


