Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

via GitHub Thu, 21 Dec 2023 11:34:48 -0800


mayya-sharipova commented on PR #12962:
URL: https://github.com/apache/lucene/pull/12962#issuecomment-1866829988


   ###  1M vectors of 100 dims
   
   k=10, fanout=90
   
   |                                 |Avg visited nodes |   QPS    |   Recall| 
   |  :---                           |    ---:          |     ---: |    ---: |  
   | Baseline Single segment         |               980|      2336|    0.739|
   | Baseline 3 segments concurrent  |              2627|      2857|    0.772|
   | Candidate1_with_min_score       |              2458|      2816|    0.766|
   | Candidate2_with_queue           |              2477|      2865|    0.767|
   
   
   k=100, fanout=900
   
   |                                 |Avg visited nodes |   QPS    |   Recall| 
   |  :---                           |    ---:          |     ---: |    ---: |  
   | Baseline Single segment         |              6722|       430|    0.921|
   | Baseline 3 segments concurrent  |             17595|       438|    0.949|
   | Candidate1_with_min_score       |             16386|       464|    0.947|
   | Candidate2_with_queue           |             13483|       469|    0.940|
   
   Candidate2_with_queue VS Baseline:
   - ${\color{green}Recall}$ is better than the single-segment baseline
   - ${\color{green} QPS}$ is 0.3-7% better than the multi-segment baseline
   
   Candidate2_with_queue VS Candidate1_with_min_score:
   - ${\color{red}Recall}$ is slightly worse
   - ${\color{green} QPS}$ is 1-2% better
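
The percentages in these summaries are relative QPS changes computed from the table rows. A minimal sketch of that arithmetic for the 1M-vector tables follows; the helper name `pct_change` and the dictionary keys are illustrative assumptions, not part of the benchmark tooling used for this PR.

```python
def pct_change(candidate: float, baseline: float) -> float:
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# QPS values copied from the 1M-vector tables above, keyed by (k, fanout).
# The key names are hypothetical labels chosen for this sketch.
qps_1m = {
    (10, 90): {"baseline_multi": 2857, "candidate1": 2816, "candidate2": 2865},
    (100, 900): {"baseline_multi": 438, "candidate1": 464, "candidate2": 469},
}

for (k, fanout), row in qps_1m.items():
    vs_baseline = pct_change(row["candidate2"], row["baseline_multi"])
    vs_candidate1 = pct_change(row["candidate2"], row["candidate1"])
    print(
        f"k={k}, fanout={fanout}: "
        f"{vs_baseline:+.1f}% vs multi-segment baseline, "
        f"{vs_candidate1:+.1f}% vs Candidate1"
    )
```

With the table values, this gives roughly +0.3% and +7.1% against the multi-segment baseline and +1.7% and +1.1% against Candidate1, consistent with the ranges quoted above.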
   
   
   ###  10M vectors of 100 dims
   
   k=10, fanout=90
   
   |                                 |Avg visited nodes |   QPS    |   Recall| 
   |  :---                           |    ---:          |     ---: |    ---: |  
   | Baseline Single segment         |              1081|      1798|    0.634|
   | Baseline 13 segments concurrent |             11869|      1371|    0.680|
   | Candidate1_with_min_score       |              7660|      1845|    0.600|
   | Candidate2_with_queue           |              7942|      1776|    0.606|
   
   k=100, fanout=900
   
   |                                 |Avg visited nodes |   QPS    |   Recall| 
   |  :---                           |    ---:          |     ---: |    ---: |
   | Baseline Single segment         |              7213|        272|   0.824|
   | Baseline 13 segments concurrent |             78069|        239|   0.894|  
   | Candidate1_with_min_score       |             50649|        301|   0.868| 
   | Candidate2_with_queue           |             33139|        361|   0.834| 
   
   k=100, fanout=9900
   
   |                                 |Avg visited nodes |   QPS    |   Recall|
   |  :---                           |    ---:          |     ---: |    ---: |
   | Baseline Single segment         |             55168|         36|   0.916|
   | Baseline 13 segments concurrent |            521851|         29|   0.962| 
   | Candidate1_with_min_score       |            331252|         34|   0.954| 
   | Candidate2_with_queue           |            193082|         55|   0.938| 
   
   Candidate2_with_queue VS Baseline:
   - ${\color{black}Recall}$ is slightly worse (k=10) or better (k=100) than the single-segment baseline
   - ${\color{green} QPS}$ is 30-90% better than the multi-segment baseline
   
   
   Candidate2_with_queue VS Candidate1_with_min_score:
   - ${\color{red}Recall}$ is slightly worse (but slightly better on small k)
   - ${\color{green} QPS}$ is 20-60% better (but about 4% worse on small k)
   
   
   ### 10M vectors of 768 dims
   
   k=10, fanout=90
   
   |                                 |Avg visited nodes |   QPS    |   Recall| 
   |  :---                           |    ---:          |     ---: |    ---: |  
   | Baseline Single segment         |              1095|       974|    0.542|
   | Baseline 19 segments concurrent |             18091|       569|    0.541|
   | Candidate1_with_min_score       |             10700|       824|    0.474|
   | Candidate2_with_queue           |             11007|       783|    0.479|
   
   
   k=100, fanout=900
   
   |                                 |Avg visited nodes |   QPS    |   Recall| 
   |  :---                           |    ---:          |     ---: |    ---: |
   | Baseline Single segment         |              7118|       152|    0.688|
   | Baseline 19 segments concurrent |            120453|        88|    0.732| 
   | Candidate1_with_min_score       |             65778|       139|    0.693| 
   | Candidate2_with_queue           |             44775|       183|    0.650| 
   
   
   k=100, fanout=9900
   
   |                                 |Avg visited nodes |   QPS    |   Recall| 
   |  :---                           |    ---:          |     ---: |    ---: |
   | Baseline Single segment         |             59308|        19|    0.797|
   | Baseline 19 segments concurrent |            818268|        11|    0.847|  
   | Candidate1_with_min_score       |            463464|        18|    0.828| 
   | Candidate2_with_queue           |            251612|        33|    0.779| 
   
   Candidate2_with_queue VS Baseline:
   - ${\color{red}Recall}$ is slightly worse than the single-segment baseline
   - ${\color{green} QPS}$ is 38-200% better (up to 3x) than the multi-segment baseline
   
   Candidate2_with_queue VS Candidate1_with_min_score:
   - ${\color{red}Recall}$ is slightly worse (but better on small k)
   - ${\color{green} QPS}$ is 31-83% better (but 5% worse on small k)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


