benwtrent commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1919297975
I fixed my data and ran with 1.5M Cohere vectors:

- `static_k` is this PR
- `dynamic_k` is this PR + scaling the `k` explored by

```
float v = (float) Math.log(sumVectorCount / (double) leafSize);
float filterWeightValue = 1 / v;
```

Scaling the `k` is another nice incremental change. Up and to the left is best. We consistently get better recall while visiting fewer vectors.

```
import matplotlib.pyplot as plt

# x: number visited, y: recall
plt.plot([2304, 2485, 3189, 4028, 5610, 16165], [0.875, 0.886, 0.916, 0.938, 0.960, 0.992], marker='o', label='baseline_single')
plt.plot([43015, 45946, 56864, 69149, 90772, 210727], [0.980, 0.982, 0.989, 0.992, 0.996, 1.000], marker='o', label='baseline_multi')
plt.plot([22706, 23749, 27142, 30893, 37162, 76032], [0.959, 0.962, 0.970, 0.976, 0.983, 0.996], marker='o', label='candidate_static_k')
plt.plot([16099, 17183, 20788, 24809, 31334, 64921], [0.937, 0.945, 0.962, 0.973, 0.983, 0.996], marker='o', label='candidate_dynamic_k')
plt.legend()
plt.show()
```
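To get a feel for how the scaling factor behaves, here is a minimal Python sketch that transliterates the two Java lines quoted above. The function name `filter_weight`, the segment sizes, and the guard against a zero logarithm are all assumptions made for illustration; the comment does not show how the PR applies `filterWeightValue` to `k` or how it handles a leaf that holds every vector, so neither is modeled here. Python uses double precision, so values can differ slightly from the Java `float` result.

```python
import math


def filter_weight(sum_vector_count: int, leaf_size: int) -> float:
    """Hypothetical helper mirroring: v = log(sum / leaf); weight = 1 / v."""
    if not 0 < leaf_size < sum_vector_count:
        # When leaf_size == sum_vector_count, log(1) == 0 and 1 / v is undefined.
        # How the PR treats that case is not stated, so this sketch just rejects it.
        raise ValueError("leaf_size must be in (0, sum_vector_count)")
    v = math.log(sum_vector_count / leaf_size)
    return 1.0 / v


# Made-up segment sizes that add up to 1.5M vectors.
leaf_sizes = [750_000, 500_000, 200_000, 50_000]
total = sum(leaf_sizes)

for leaf in leaf_sizes:
    print(f"leaf_size={leaf:>7}  weight={filter_weight(total, leaf):.3f}")
```

With these made-up sizes the weight falls from roughly 1.44 for the segment holding half the vectors to roughly 0.29 for the smallest one, so larger segments receive a larger factor than smaller ones.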