benchaplin commented on PR #14160: URL: https://github.com/apache/lucene/pull/14160#issuecomment-2632762867
Baseline:

```
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity  correlation  filterType
 1.000         9.020  1000000   100     100       16        100    10000         0.01        -1.00  pre-filter
 1.000         9.140  1000000   100     100       16        100    10000         0.01        -0.50  pre-filter
 1.000         9.150  1000000   100     100       16        100    10000         0.01         0.00  pre-filter
 0.898         2.850  1000000   100     100       16        100     6189         0.01         0.50  pre-filter
 0.877         1.690  1000000   100     100       16        100     3543         0.01         1.00  pre-filter
 1.000        43.970  1000000   100     100       16        100    50000         0.05        -1.00  pre-filter
 0.997        43.590  1000000   100     100       16        100    49624         0.05        -0.50  pre-filter
 0.960        22.290  1000000   100     100       16        100    39985         0.05         0.00  pre-filter
 0.899         2.860  1000000   100     100       16        100     6191         0.05         0.50  pre-filter
 0.877         1.650  1000000   100     100       16        100     3543         0.05         1.00  pre-filter
 1.000        83.850  1000000   100     100       16        100   100000         0.10        -1.00  pre-filter
 0.936        18.450  1000000   100     100       16        100    38067         0.10        -0.50  pre-filter
 0.930        10.530  1000000   100     100       16        100    22475         0.10         0.00  pre-filter
 0.896         2.670  1000000   100     100       16        100     6037         0.10         0.50  pre-filter
 0.877         1.690  1000000   100     100       16        100     3543         0.10         1.00  pre-filter
 1.000       202.960  1000000   100     100       16        100   250000         0.25        -1.00  pre-filter
 0.923         7.550  1000000   100     100       16        100    16069         0.25        -0.50  pre-filter
 0.913         5.090  1000000   100     100       16        100    10953         0.25         0.00  pre-filter
 0.897         2.750  1000000   100     100       16        100     5826         0.25         0.50  pre-filter
 0.877         1.720  1000000   100     100       16        100     3543         0.25         1.00  pre-filter
 1.000       376.710  1000000   100     100       16        100   500000         0.50        -1.00  pre-filter
 0.904         3.560  1000000   100     100       16        100     7534         0.50        -0.50  pre-filter
 0.904         2.710  1000000   100     100       16        100     6135         0.50         0.00  pre-filter
 0.894         2.410  1000000   100     100       16        100     5324         0.50         0.50  pre-filter
 0.877         1.500  1000000   100     100       16        100     3543         0.50         1.00  pre-filter
 0.939       431.780  1000000   100     100       16        100   706964         0.75        -1.00  pre-filter
 0.884         2.140  1000000   100     100       16        100     4344         0.75        -0.50  pre-filter
 0.887         1.900  1000000   100     100       16        100     4453         0.75         0.00  pre-filter
 0.889         1.960  1000000   100     100       16        100     4519         0.75         0.50  pre-filter
 0.877         1.650  1000000   100     100       16        100     3543         0.75         1.00  pre-filter
```

Candidate:
```
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity  correlation  filterType
 0.881         4.300  1000000   100     100       16        100     8876         0.01        -1.00  pre-filter
 0.501         4.130  1000000   100     100       16        100     1920         0.01        -0.50  pre-filter
 0.653         3.220  1000000   100     100       16        100     1321         0.01         0.00  pre-filter
 0.999         2.890  1000000   100     100       16        100     3239         0.01         0.50  pre-filter
 0.976         3.000  1000000   100     100       16        100     4199         0.01         1.00  pre-filter
 0.652        13.020  1000000   100     100       16        100    32881         0.05        -1.00  pre-filter
 0.876         3.530  1000000   100     100       16        100     2172         0.05        -0.50  pre-filter
 0.952         3.170  1000000   100     100       16        100     2560         0.05         0.00  pre-filter
 0.997         3.370  1000000   100     100       16        100     6134         0.05         0.50  pre-filter
 0.892         2.150  1000000   100     100       16        100     4249         0.05         1.00  pre-filter
 0.566        19.190  1000000   100     100       16        100    56443         0.10        -1.00  pre-filter
 0.961         3.710  1000000   100     100       16        100     2926         0.10        -0.50  pre-filter
 0.981         3.730  1000000   100     100       16        100     4261         0.10         0.00  pre-filter
 0.988         3.380  1000000   100     100       16        100     6463         0.10         0.50  pre-filter
 0.879         1.760  1000000   100     100       16        100     3851         0.10         1.00  pre-filter
 0.380        24.090  1000000   100     100       16        100    93297         0.25        -1.00  pre-filter
 0.989         4.450  1000000   100     100       16        100     6232         0.25        -0.50  pre-filter
 0.993         4.380  1000000   100     100       16        100     7827         0.25         0.00  pre-filter
 0.969         3.480  1000000   100     100       16        100     6274         0.25         0.50  pre-filter
 0.877         1.730  1000000   100     100       16        100     3569         0.25         1.00  pre-filter
 0.144        13.820  1000000   100     100       16        100    66826         0.50        -1.00  pre-filter
 0.950         3.420  1000000   100     100       16        100     5770         0.50        -0.50  pre-filter
 0.960         3.330  1000000   100     100       16        100     6392         0.50         0.00  pre-filter
 0.928         2.960  1000000   100     100       16        100     5328         0.50         0.50  pre-filter
 0.877         1.610  1000000   100     100       16        100     3534         0.50         1.00  pre-filter
 0.939       439.810  1000000   100     100       16        100   706964         0.75        -1.00  pre-filter
 0.886         2.050  1000000   100     100       16        100     4355         0.75        -0.50  pre-filter
 0.885         2.020  1000000   100     100       16        100     4485         0.75         0.00  pre-filter
 0.890         2.210  1000000   100     100       16        100     4496         0.75         0.50  pre-filter
 0.877         1.490  1000000   100     100       16        100     3543         0.75         1.00  pre-filter
```

For me, the main story here is that the
candidate's advantage weakens as the query becomes more positively correlated with the filter (towards 1.00 correlation), but even at 1.00 its recall never falls below the baseline's. I think this makes sense: in that case, once we're in the right small world, almost every neighbor passes the filter, so "predicate subgraph traversal" is effectively the same as normal full-graph traversal and the theoretical advantage disappears.

Recall is bad for -1.00 correlation, but recall / visited is about the same as the baseline's. Also, I'm fairly sure the way I've set up -1.00 correlation (the filter matches exactly the vectors that score worst against the query) is not at all realistic, so maybe we can treat those tests as extreme edge-case stress tests.

I agree ~0.5 selectivity seems to be a good cutoff for the new algorithm.
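To sanity-check the recall / visited observation, here is a small Python sketch that recomputes it from the correlation = -1.00 rows of the two tables above, for selectivities 0.01 to 0.50 (the 0.75 rows report the same recall and visited count in both runs, so they are left out). The numbers are copied verbatim from the tables; the helper names `recall_per_visited` and `efficiency_ratio` are hypothetical and not part of any benchmark tooling.

```python
"""Compare recall per visited node for the correlation = -1.00 rows."""

# (selectivity, recall, visited) copied from the correlation = -1.00 rows
# of the baseline and candidate tables.
BASELINE = [
    (0.01, 1.000, 10000),
    (0.05, 1.000, 50000),
    (0.10, 1.000, 100000),
    (0.25, 1.000, 250000),
    (0.50, 1.000, 500000),
]
CANDIDATE = [
    (0.01, 0.881, 8876),
    (0.05, 0.652, 32881),
    (0.10, 0.566, 56443),
    (0.25, 0.380, 93297),
    (0.50, 0.144, 66826),
]


def recall_per_visited(rows):
    """Map each selectivity to recall divided by the number of visited nodes."""
    return {sel: recall / visited for sel, recall, visited in rows}


def efficiency_ratio(candidate, baseline):
    """Candidate recall/visited divided by baseline recall/visited, per selectivity.

    A value near 1.0 means the candidate gives up recall in roughly the same
    proportion as it cuts the number of visited nodes.
    """
    cand = recall_per_visited(candidate)
    base = recall_per_visited(baseline)
    return {sel: cand[sel] / base[sel] for sel in base}


if __name__ == "__main__":
    for sel, ratio in sorted(efficiency_ratio(CANDIDATE, BASELINE).items()):
        print(f"selectivity={sel:.2f}  candidate/baseline recall-per-visited = {ratio:.2f}")
```

Running it prints ratios of 0.99, 0.99, 1.00, 1.02 and 1.08 for selectivities 0.01 through 0.50, i.e. the candidate visits fewer nodes and loses recall in roughly the same proportion. Recall per visited node is only a crude efficiency proxy (it ignores latency and the fact that recall is capped at 1.0), so treat this as a rough check rather than a definitive metric.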