benwtrent commented on PR #12962:
URL: https://github.com/apache/lucene/pull/12962#issuecomment-1919297975
   
   
   I fixed my data and ran with 1.5M Cohere vectors:
   
   `static_k` is this PR.
   `dynamic_k` is this PR plus scaling the `k` explored by:
   ```
   float v = (float) Math.log(sumVectorCount / (double) leafSize);
   float filterWeightValue = 1 / v;
   ```
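
To get a feel for how this weight behaves, here is a minimal Python sketch that evaluates the same expression for a few segment sizes. The function name `filter_weight` and the sample leaf sizes are illustrative assumptions, not values from the PR, and the sketch does not show how the weight is applied to `k`. It uses double precision rather than Java's `float`, so low-order digits may differ.

```python
import math


def filter_weight(sum_vector_count: int, leaf_size: int) -> float:
    """Mirror of the two Java lines above: 1 / ln(total vectors / leaf vectors)."""
    v = math.log(sum_vector_count / leaf_size)
    # In Java, 1 / 0.0f evaluates to Infinity; Python would raise
    # ZeroDivisionError, so the single-segment case is handled explicitly.
    return math.inf if v == 0.0 else 1.0 / v


total = 1_500_000  # corpus size mentioned in the comment
for leaf in (10_000, 100_000, 500_000, 1_500_000):  # hypothetical segment sizes
    print(leaf, filter_weight(total, leaf))
```

Smaller segments get a smaller weight, and the weight grows without bound as a segment approaches the full corpus size, so a single-segment index would likely need special handling.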
   
   Scaling `k` is another nice incremental change. Up and to the left is
best: we consistently reach better recall while visiting fewer vectors.
   
   ```
   import matplotlib.pyplot as plt

   # x: vectors visited, y: recall
   plt.plot([2304, 2485, 3189, 4028, 5610, 16165], [0.875, 0.886, 0.916, 0.938, 0.960, 0.992], marker='o', label='baseline_single')
   plt.plot([43015, 45946, 56864, 69149, 90772, 210727], [0.980, 0.982, 0.989, 0.992, 0.996, 1.000], marker='o', label='baseline_multi')
   plt.plot([22706, 23749, 27142, 30893, 37162, 76032], [0.959, 0.962, 0.970, 0.976, 0.983, 0.996], marker='o', label='candidate_static_k')
   plt.plot([16099, 17183, 20788, 24809, 31334, 64921], [0.937, 0.945, 0.962, 0.973, 0.983, 0.996], marker='o', label='candidate_dynamic_k')
   plt.legend()
   plt.show()
   ```
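
One way to read "up and to the left" off these numbers without the plot is to compare how many visits each configuration needs to reach the same recall. The sketch below does that with linear interpolation between measured points. The interpolation and the helper name `visited_at_recall` are assumptions for illustration and were not part of the benchmark.

```python
# (visited, recall) pairs copied from the plotting snippet above.
curves = {
    "baseline_single": ([2304, 2485, 3189, 4028, 5610, 16165],
                        [0.875, 0.886, 0.916, 0.938, 0.960, 0.992]),
    "baseline_multi": ([43015, 45946, 56864, 69149, 90772, 210727],
                       [0.980, 0.982, 0.989, 0.992, 0.996, 1.000]),
    "candidate_static_k": ([22706, 23749, 27142, 30893, 37162, 76032],
                           [0.959, 0.962, 0.970, 0.976, 0.983, 0.996]),
    "candidate_dynamic_k": ([16099, 17183, 20788, 24809, 31334, 64921],
                            [0.937, 0.945, 0.962, 0.973, 0.983, 0.996]),
}


def visited_at_recall(visited, recall, target):
    """Linearly interpolate the visited count at `target` recall.

    Returns None when `target` lies outside the measured recall range.
    """
    points = list(zip(recall, visited))
    for (r0, v0), (r1, v1) in zip(points, points[1:]):
        if r0 <= target <= r1:
            return v0 + (target - r0) / (r1 - r0) * (v1 - v0)
    return None


target = 0.983  # a recall level that every curve covers
for name, (visited, recall) in curves.items():
    print(name, visited_at_recall(visited, recall, target))
```

At 0.983 recall both candidates have a measured point: `candidate_dynamic_k` visits 31334 and `candidate_static_k` visits 37162, while `baseline_multi` needs more than 45946.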
   
   
![image](https://github.com/apache/lucene/assets/4357155/acb38940-717b-4a7a-a9c7-1c6564df97fa)
   

