dungba88 commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2676175583

   I ran some benchmarks on the Cohere 768 dataset for three algorithms: (1) 
the baseline "greedy", (2) this PR's "optimistic", and (3) "pro-rata" only. 
(2) and (3) converge at fanout >= 20, which makes sense: as we increase the 
`ef` parameter, the search already looks deeper into the graph. (2) improves 
recall by ~6% over (3) at fanout=0, and the effect diminishes as fanout 
grows: ~3% at fanout=5 and ~1% at fanout=10.
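
To make the comparison between (2) and (3) easy to recheck, here is a minimal sketch that recomputes the recall gap per fanout. The recall values are copied from the table in this comment; the names `optimistic`, `prorata`, and `recall_gap` are only illustrative and are not part of the PR or the benchmark tooling.

```python
# Recall by fanout, copied from the benchmark table in this comment.
optimistic = {0: 0.846, 5: 0.862, 10: 0.871, 20: 0.897, 50: 0.925, 100: 0.941}
prorata = {0: 0.786, 5: 0.831, 10: 0.857, 20: 0.891, 50: 0.921, 100: 0.940}


def recall_gap(a, b):
    """Return {fanout: a - b}, rounded to 3 decimals, for fanouts present in both."""
    return {f: round(a[f] - b[f], 3) for f in sorted(a) if f in b}


gaps = recall_gap(optimistic, prorata)
for fanout, gap in gaps.items():
    print(f"fanout={fanout:>3}  optimistic - prorata = {gap:+.3f}")
```

The gap shrinks steadily as fanout grows and is under 0.01 from fanout=20 onward.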
   
   (2) still misses some matches (about 8%) compared to (1). I'm wondering 
why that is, since we try to search the segments exhaustively.
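
The size of that gap depends on which greediness row is taken as the reference for (1). A small sketch, assuming the fanout=0 "optimistic" row is the one being compared, with recall values copied from the table in this comment (the variable and function names are illustrative only):

```python
# Recall of the baseline at fanout=0, keyed by greediness (from the table).
greedy_recall = {
    0.00: 0.936, 0.10: 0.933, 0.30: 0.927, 0.50: 0.923,
    0.70: 0.917, 0.90: 0.907, 1.00: 0.836,
}
# Recall of the "optimistic" run at fanout=0 (from the table).
OPTIMISTIC_RECALL_FANOUT_0 = 0.846


def gap_vs_baseline(baseline, candidate):
    """Return {greediness: baseline recall - candidate recall}, rounded to 3 decimals."""
    return {g: round(r - candidate, 3) for g, r in sorted(baseline.items())}


baseline_gaps = gap_vs_baseline(greedy_recall, OPTIMISTIC_RECALL_FANOUT_0)
for greediness, gap in baseline_gaps.items():
    print(f"greediness={greediness:.2f}  greedy - optimistic = {gap:+.3f}")
```

Under this assumption the gap ranges from 0.090 at greediness=0.00 down to 0.061 at greediness=0.90, and it flips sign at greediness=1.00, where the baseline recall drops below the optimistic one.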
   
   Other params: ndoc=500K, topK=100
   
   ```
   recall  latency_(ms)  fanout  quantized  greediness  index_s  index_docs/s  force_merge_s  num_segments  index_size_(MB)  vec_disk_(MB)  vec_RAM_(MB)  algorithm
    0.936        13.285       0         no        0.00   117.24       4264.94           0.00             7          1490.05       1464.844      1464.844  greediness
    0.933        12.766       0         no        0.10   116.45       4293.84           0.00             7          1490.14       1464.844      1464.844  greediness
    0.927        12.045       0         no        0.30   116.69       4285.04           0.00             7          1490.12       1464.844      1464.844  greediness
    0.923        11.709       0         no        0.50   117.74       4246.79           0.00             7          1490.06       1464.844      1464.844  greediness
    0.917        10.950       0         no        0.70   117.68       4248.67           0.00             7          1490.06       1464.844      1464.844  greediness
    0.907        10.996       0         no        0.90   117.40       4258.80           0.00             7          1489.98       1464.844      1464.844  greediness
    0.836        10.349       0         no        1.00   117.07       4271.13           0.00             7          1490.03       1464.844      1464.844  greediness
    0.846         6.682       0         no       -1.00   116.50       4291.77           0.00             7          1490.09       1464.844      1464.844  optimistic
    0.862         6.699       5         no       -1.00   118.33       4225.44           0.00             7          1489.97       1464.844      1464.844  optimistic
    0.871         7.172      10         no       -1.00   116.10       4306.63           0.00             7          1490.05       1464.844      1464.844  optimistic
    0.897         7.962      20         no       -1.00   116.11       4306.15           0.00             7          1490.06       1464.844      1464.844  optimistic
    0.925        10.975      50         no       -1.00   116.71       4284.23           0.00             7          1490.10       1464.844      1464.844  optimistic
    0.941        15.635     100         no       -1.00   116.58       4289.08           0.00             7          1490.13       1464.844      1464.844  optimistic
    0.786         5.727       0         no       -1.00   117.86       4242.25           0.00             7          1490.10       1464.844      1464.844  prorata
    0.831         6.176       5         no       -1.00   117.27       4263.70           0.00             7          1490.07       1464.844      1464.844  prorata
    0.857         6.746      10         no       -1.00   116.79       4281.04           0.00             7          1490.08       1464.844      1464.844  prorata
    0.891         7.763      20         no       -1.00   116.21       4302.59           0.00             7          1490.09       1464.844      1464.844  prorata
    0.921        11.083      50         no       -1.00   118.02       4236.46           0.00             7          1490.06       1464.844      1464.844  prorata
    0.940        15.694     100         no       -1.00   116.63       4286.91           0.00             7          1490.00       1464.844      1464.844  prorata
   ```
   
   <img width="713" alt="Screenshot 2025-02-22 at 21 17 40" src="https://github.com/user-attachments/assets/51daedd7-53f8-4222-9c3a-88c7e5e4a733" />


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

