benchaplin commented on PR #14160:
URL: https://github.com/apache/lucene/pull/14160#issuecomment-2632762867

Baseline:
```
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity  correlation  filterType
 1.000         9.020  1000000   100     100       16        100    10000         0.01        -1.00  pre-filter
 1.000         9.140  1000000   100     100       16        100    10000         0.01        -0.50  pre-filter
 1.000         9.150  1000000   100     100       16        100    10000         0.01         0.00  pre-filter
 0.898         2.850  1000000   100     100       16        100     6189         0.01         0.50  pre-filter
 0.877         1.690  1000000   100     100       16        100     3543         0.01         1.00  pre-filter
 1.000        43.970  1000000   100     100       16        100    50000         0.05        -1.00  pre-filter
 0.997        43.590  1000000   100     100       16        100    49624         0.05        -0.50  pre-filter
 0.960        22.290  1000000   100     100       16        100    39985         0.05         0.00  pre-filter
 0.899         2.860  1000000   100     100       16        100     6191         0.05         0.50  pre-filter
 0.877         1.650  1000000   100     100       16        100     3543         0.05         1.00  pre-filter
 1.000        83.850  1000000   100     100       16        100   100000         0.10        -1.00  pre-filter
 0.936        18.450  1000000   100     100       16        100    38067         0.10        -0.50  pre-filter
 0.930        10.530  1000000   100     100       16        100    22475         0.10         0.00  pre-filter
 0.896         2.670  1000000   100     100       16        100     6037         0.10         0.50  pre-filter
 0.877         1.690  1000000   100     100       16        100     3543         0.10         1.00  pre-filter
 1.000       202.960  1000000   100     100       16        100   250000         0.25        -1.00  pre-filter
 0.923         7.550  1000000   100     100       16        100    16069         0.25        -0.50  pre-filter
 0.913         5.090  1000000   100     100       16        100    10953         0.25         0.00  pre-filter
 0.897         2.750  1000000   100     100       16        100     5826         0.25         0.50  pre-filter
 0.877         1.720  1000000   100     100       16        100     3543         0.25         1.00  pre-filter
 1.000       376.710  1000000   100     100       16        100   500000         0.50        -1.00  pre-filter
 0.904         3.560  1000000   100     100       16        100     7534         0.50        -0.50  pre-filter
 0.904         2.710  1000000   100     100       16        100     6135         0.50         0.00  pre-filter
 0.894         2.410  1000000   100     100       16        100     5324         0.50         0.50  pre-filter
 0.877         1.500  1000000   100     100       16        100     3543         0.50         1.00  pre-filter
 0.939       431.780  1000000   100     100       16        100   706964         0.75        -1.00  pre-filter
 0.884         2.140  1000000   100     100       16        100     4344         0.75        -0.50  pre-filter
 0.887         1.900  1000000   100     100       16        100     4453         0.75         0.00  pre-filter
 0.889         1.960  1000000   100     100       16        100     4519         0.75         0.50  pre-filter
 0.877         1.650  1000000   100     100       16        100     3543         0.75         1.00  pre-filter
```
    
Candidate:
```
recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  visited  selectivity  correlation  filterType
 0.881         4.300  1000000   100     100       16        100     8876         0.01        -1.00  pre-filter
 0.501         4.130  1000000   100     100       16        100     1920         0.01        -0.50  pre-filter
 0.653         3.220  1000000   100     100       16        100     1321         0.01         0.00  pre-filter
 0.999         2.890  1000000   100     100       16        100     3239         0.01         0.50  pre-filter
 0.976         3.000  1000000   100     100       16        100     4199         0.01         1.00  pre-filter
 0.652        13.020  1000000   100     100       16        100    32881         0.05        -1.00  pre-filter
 0.876         3.530  1000000   100     100       16        100     2172         0.05        -0.50  pre-filter
 0.952         3.170  1000000   100     100       16        100     2560         0.05         0.00  pre-filter
 0.997         3.370  1000000   100     100       16        100     6134         0.05         0.50  pre-filter
 0.892         2.150  1000000   100     100       16        100     4249         0.05         1.00  pre-filter
 0.566        19.190  1000000   100     100       16        100    56443         0.10        -1.00  pre-filter
 0.961         3.710  1000000   100     100       16        100     2926         0.10        -0.50  pre-filter
 0.981         3.730  1000000   100     100       16        100     4261         0.10         0.00  pre-filter
 0.988         3.380  1000000   100     100       16        100     6463         0.10         0.50  pre-filter
 0.879         1.760  1000000   100     100       16        100     3851         0.10         1.00  pre-filter
 0.380        24.090  1000000   100     100       16        100    93297         0.25        -1.00  pre-filter
 0.989         4.450  1000000   100     100       16        100     6232         0.25        -0.50  pre-filter
 0.993         4.380  1000000   100     100       16        100     7827         0.25         0.00  pre-filter
 0.969         3.480  1000000   100     100       16        100     6274         0.25         0.50  pre-filter
 0.877         1.730  1000000   100     100       16        100     3569         0.25         1.00  pre-filter
 0.144        13.820  1000000   100     100       16        100    66826         0.50        -1.00  pre-filter
 0.950         3.420  1000000   100     100       16        100     5770         0.50        -0.50  pre-filter
 0.960         3.330  1000000   100     100       16        100     6392         0.50         0.00  pre-filter
 0.928         2.960  1000000   100     100       16        100     5328         0.50         0.50  pre-filter
 0.877         1.610  1000000   100     100       16        100     3534         0.50         1.00  pre-filter
 0.939       439.810  1000000   100     100       16        100   706964         0.75        -1.00  pre-filter
 0.886         2.050  1000000   100     100       16        100     4355         0.75        -0.50  pre-filter
 0.885         2.020  1000000   100     100       16        100     4485         0.75         0.00  pre-filter
 0.890         2.210  1000000   100     100       16        100     4496         0.75         0.50  pre-filter
 0.877         1.490  1000000   100     100       16        100     3543         0.75         1.00  pre-filter
```
    
For me, the main story here is that the candidate's advantage shrinks as the query becomes more positively correlated with the filter (towards 1.00 correlation), but it never gets worse than the baseline. I think this makes sense: in that case, once we're in the right small world, almost every neighbor passes the filter, so predicate-subgraph traversal is effectively the same as normal full-graph traversal and the theoretical advantage disappears.
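Here is a toy sketch of that point in Java. It is not the PR's code. `PredicateSubgraphSketch`, `expandAll`, and `expandFiltered` are hypothetical names, and a real predicate-subgraph search also scores candidates and maintains a queue. The sketch compares which neighbors of one graph node get expanded by a normal traversal versus a traversal restricted to neighbors that pass the filter. When the filter accepts every neighbor, as it nearly does under strong positive correlation, the two expansion sets are identical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy illustration only; not Lucene's HNSW searcher. */
public class PredicateSubgraphSketch {

    /** Normal traversal: every neighbor of the current node is explored. */
    public static List<Integer> expandAll(int[] neighbors) {
        List<Integer> out = new ArrayList<>();
        for (int n : neighbors) {
            out.add(n);
        }
        return out;
    }

    /** Predicate-subgraph traversal: only neighbors accepted by the filter are explored. */
    public static List<Integer> expandFiltered(int[] neighbors, BitSet filter) {
        List<Integer> out = new ArrayList<>();
        for (int n : neighbors) {
            if (filter.get(n)) {
                out.add(n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] neighbors = {3, 7, 12, 19, 25};

        // Strong positive correlation: the filter accepts every neighbor.
        BitSet acceptsAll = new BitSet();
        for (int n : neighbors) {
            acceptsAll.set(n);
        }
        System.out.println(expandAll(neighbors).equals(expandFiltered(neighbors, acceptsAll))); // true

        // Restrictive or uncorrelated filter: only some neighbors pass.
        BitSet sparse = new BitSet();
        sparse.set(7);
        sparse.set(25);
        System.out.println(expandFiltered(neighbors, sparse)); // [7, 25]
    }
}
```

In the first case the restricted traversal does exactly the same work as the unrestricted one. That is why any saving from skipping filtered-out neighbors can only appear when a meaningful share of neighbors fails the filter.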
   
Recall is bad for -1 correlation, but recall per visited node (recall / visited) is roughly the same as the baseline. Also, I'm fairly sure the way I've set up -1 correlation (the filter matches exactly the vectors with the worst scores with respect to the query) is not at all realistic, so maybe we can think of those runs as extreme edge-case stress tests.
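The recall / visited claim can be checked by recomputing it from the -1.00 correlation rows of the two benchmark tables. The class name is hypothetical, and the values are copied straight from the tables. For each selectivity, the class prints the candidate's recall per visited node divided by the baseline's, so 1.0 means equal efficiency.

```java
import java.util.Locale;

/** Recomputes recall / visited for the correlation = -1.00 rows of the benchmark tables. */
public class RecallPerVisited {

    // Values copied from the -1.00 correlation rows, ordered by selectivity.
    static final double[] SELECTIVITY = {0.01, 0.05, 0.10, 0.25, 0.50, 0.75};
    static final double[] BASELINE_RECALL = {1.000, 1.000, 1.000, 1.000, 1.000, 0.939};
    static final int[] BASELINE_VISITED = {10000, 50000, 100000, 250000, 500000, 706964};
    static final double[] CANDIDATE_RECALL = {0.881, 0.652, 0.566, 0.380, 0.144, 0.939};
    static final int[] CANDIDATE_VISITED = {8876, 32881, 56443, 93297, 66826, 706964};

    /** Recall obtained per visited node. */
    static double recallPerVisited(double recall, int visited) {
        return recall / visited;
    }

    /** Candidate's recall-per-visited divided by the baseline's; 1.0 means equal efficiency. */
    static double relativeEfficiency(int row) {
        double candidate = recallPerVisited(CANDIDATE_RECALL[row], CANDIDATE_VISITED[row]);
        double baseline = recallPerVisited(BASELINE_RECALL[row], BASELINE_VISITED[row]);
        return candidate / baseline;
    }

    public static void main(String[] args) {
        for (int row = 0; row < SELECTIVITY.length; row++) {
            System.out.printf(Locale.ROOT, "selectivity %.2f: candidate/baseline = %.3f%n",
                SELECTIVITY[row], relativeEfficiency(row));
        }
    }
}
```

On these numbers, every ratio falls between roughly 0.99 and 1.08. The 0.75 row comes out at exactly 1.0 because both runs report identical recall and visited counts there.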
   
I agree that ~0.5 selectivity seems to be a good cutoff for the new algorithm.
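If the cutoff were wired in as a simple selectivity check, it might look like this sketch. Everything in it is an assumption for illustration:

- The class, enum, and method names are hypothetical.
- Selectivity is taken to be filter cardinality divided by the number of docs, matching the `selectivity` column, where 0.01 means 1% of docs pass.
- Whether the boundary is `<` or `<=` 0.5 is a guess.

```java
/** Hypothetical choice between the new filtered graph search and the existing one. */
public class FilteredSearchCutoff {

    /** Assumed cutoff, taken from the ~0.5 selectivity suggested in this discussion. */
    static final double SELECTIVITY_CUTOFF = 0.5;

    enum Strategy { NEW_FILTERED_SEARCH, EXISTING_SEARCH }

    /**
     * Selectivity is the fraction of documents accepted by the filter,
     * so 0.01 means only 1% of documents pass.
     */
    static Strategy choose(int filterCardinality, int numDocs) {
        if (numDocs <= 0) {
            throw new IllegalArgumentException("numDocs must be positive");
        }
        double selectivity = (double) filterCardinality / numDocs;
        // Restrictive filters (low selectivity) use the new algorithm; broad filters keep the old path.
        return selectivity < SELECTIVITY_CUTOFF ? Strategy.NEW_FILTERED_SEARCH : Strategy.EXISTING_SEARCH;
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        System.out.println(choose(10_000, numDocs));  // NEW_FILTERED_SEARCH (selectivity 0.01)
        System.out.println(choose(750_000, numDocs)); // EXISTING_SEARCH (selectivity 0.75)
    }
}
```

The data in this comment is consistent with this direction: at 0.75 selectivity, the candidate and baseline rows are nearly identical. Where the real switch lives, and how it estimates selectivity, is not shown here.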
   
   

