Re: [PR] Add new Acorn-esque filtered HNSW search heuristic [lucene]

via GitHub Fri, 31 Jan 2025 04:50:34 -0800


benwtrent commented on PR #14160:
URL: https://github.com/apache/lucene/pull/14160#issuecomment-2627228519


   I did some more testing, this time on a single segment from our nightly runs. 
The recall and latency pattern is much healthier with this change, though recall 
is lower. The only reason the baseline's recall is so high for the most 
restrictive filters is that it over-eagerly falls back to brute-force search, 
because it spends far too much time doing vector comparisons. 
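The fallback described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Lucene's implementation.

- Every name here (`ToyFilteredSearch`, `approximate`, `exact`, `search`) is hypothetical.
- The graph is a single layer.
- The visit budget is assumed to equal the number of documents passing the filter.

Filtered-out nodes are still scored while walking the graph. As a result, a restrictive filter uses up that budget quickly, and the query ends up paying for a brute-force scan on top of the aborted graph walk.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model (hypothetical names, not Lucene's API) of visit-limited filtered search with a brute-force fallback. */
public final class ToyFilteredSearch {

  record Scored(int node, float score) {}

  /** Top-k ids (best first), comparisons spent by the path that produced them, and whether it was exact. */
  record Result(List<Integer> topK, int comparisons, boolean exact) {}

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
  }

  /** Drains a min-heap of results into a best-first list of node ids. */
  static List<Integer> bestFirst(PriorityQueue<Scored> minHeap) {
    List<Integer> ids = new ArrayList<>();
    while (!minHeap.isEmpty()) ids.add(minHeap.poll().node());
    Collections.reverse(ids);
    return ids;
  }

  /**
   * Best-first search over a single-layer graph. Filtered-out nodes are still scored and explored
   * (they lead to matching ones), so every scored node counts against visitLimit.
   * Returns null when the budget runs out before the search converges.
   */
  static Result approximate(float[][] vectors, int[][] graph, int entry, float[] query,
                            BitSet filter, int k, int visitLimit) {
    PriorityQueue<Scored> candidates = new PriorityQueue<>((a, b) -> Float.compare(b.score(), a.score()));
    PriorityQueue<Scored> results = new PriorityQueue<>((a, b) -> Float.compare(a.score(), b.score()));
    BitSet seen = new BitSet(vectors.length);
    seen.set(entry);
    Scored start = new Scored(entry, dot(vectors[entry], query));
    int visited = 1;
    candidates.add(start);
    if (filter.get(entry)) results.add(start);
    while (!candidates.isEmpty()) {
      Scored current = candidates.poll();
      // Converged: the best unexplored candidate cannot beat the worst kept match.
      if (results.size() >= k && current.score() < results.peek().score()) break;
      for (int neighbor : graph[current.node()]) {
        if (seen.get(neighbor)) continue;
        seen.set(neighbor);
        if (visited >= visitLimit) return null; // budget exhausted: caller falls back
        Scored scored = new Scored(neighbor, dot(vectors[neighbor], query));
        visited++;
        if (results.size() < k || scored.score() > results.peek().score()) {
          candidates.add(scored);
          if (filter.get(neighbor)) {
            results.add(scored);
            if (results.size() > k) results.poll(); // keep only the k best matches
          }
        }
      }
    }
    return new Result(bestFirst(results), visited, false);
  }

  /** Brute force: score every document that passes the filter. */
  static Result exact(float[][] vectors, float[] query, BitSet filter, int k) {
    PriorityQueue<Scored> results = new PriorityQueue<>((a, b) -> Float.compare(a.score(), b.score()));
    int comparisons = 0;
    for (int doc = filter.nextSetBit(0); doc >= 0; doc = filter.nextSetBit(doc + 1)) {
      results.add(new Scored(doc, dot(vectors[doc], query)));
      comparisons++;
      if (results.size() > k) results.poll();
    }
    return new Result(bestFirst(results), comparisons, true);
  }

  /** Graph search budgeted by the filter's cardinality (an assumption); exact scan if it runs out. */
  static Result search(float[][] vectors, int[][] graph, int entry, float[] query, BitSet filter, int k) {
    Result approx = approximate(vectors, graph, entry, query, filter, k, filter.cardinality());
    return approx != null ? approx : exact(vectors, query, filter, k);
  }
}
```

The comparisons spent in an aborted graph walk are simply thrown away in this model. That wasted work is the cost the comment attributes to the baseline at low selectivity.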
   
   <img width="594" alt="image" src="https://github.com/user-attachments/assets/5f6d5f4f-d053-4e50-9858-fb1e4d8f2023" />
   
   Baseline:
   
   ```
   recall  latency (ms)     nDoc  topK  fanout  visited  selectivity
    1.000       131.763  8000000   100      50    79814        0.010
    0.924        50.518  8000000   100      50    53003        0.050
    0.912        18.970  8000000   100      50    30095        0.100
    0.896        10.697  8000000   100      50    16942        0.200
    0.884         7.509  8000000   100      50    12057        0.300
    0.876         5.763  8000000   100      50     9476        0.400
    0.869         4.792  8000000   100      50     7905        0.500
    0.863         4.184  8000000   100      50     6777        0.600
    0.858         3.781  8000000   100      50     5966        0.700
    0.853         3.403  8000000   100      50     5351        0.800
    0.850         3.084  8000000   100      50     4855        0.900
    0.849         3.044  8000000   100      50     4645        0.950
    0.848         2.927  8000000   100      50     4492        0.990
   ```
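One way to read the baseline table is to compare `visited` with the number of documents that pass the filter. The sketch below assumes that number is nDoc × selectivity. The rows are copied from the table above; the class and method names are hypothetical.

```java
/** Baseline visited count as a fraction of filter-matching docs (rows copied from the table above). */
public final class BaselineVisitedFraction {
  static final int N_DOC = 8_000_000;

  /** Assumed number of documents passing a filter with the given selectivity. */
  static double matchingDocs(double selectivity) {
    return N_DOC * selectivity;
  }

  static double visitedFraction(int visited, double selectivity) {
    return visited / matchingDocs(selectivity);
  }

  public static void main(String[] args) {
    double[] selectivity = {0.010, 0.050, 0.100, 0.200, 0.500, 0.990};
    int[] visited = {79814, 53003, 30095, 16942, 7905, 4492};
    for (int i = 0; i < visited.length; i++) {
      System.out.printf("selectivity %.3f: %6d visited / %9.0f matching = %.4f%n",
          selectivity[i], visited[i], matchingDocs(selectivity[i]),
          visitedFraction(visited[i], selectivity[i]));
    }
  }
}
```

For the 0.010 row, 8,000,000 × 0.010 = 80,000 matching documents. The 79,814 visited are about 99.8% of that, roughly the comparison count of an exact scan. By 0.050 the fraction is already down to about 13%.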
   
   Candidate:
   
   ```
   recall  latency (ms)     nDoc  topK  fanout  visited  selectivity
    0.481         4.976  8000000   100      50     2162        0.010
    0.714         7.366  8000000   100      50     4141        0.050
    0.789         8.558  8000000   100      50     7222        0.100
    0.816         9.448  8000000   100      50    10318        0.200
    0.803         7.908  8000000   100      50    10281        0.300
    0.796         7.088  8000000   100      50     9406        0.400
    0.767         4.415  8000000   100      50     5909        0.500
    0.791         4.280  8000000   100      50     5838        0.600
    0.807         3.892  8000000   100      50     5677        0.700
    0.820         3.708  8000000   100      50     5291        0.800
    0.833         3.088  8000000   100      50     4481        0.900
    0.840         2.902  8000000   100      50     4308        0.950
    0.846         2.959  8000000   100      50     4418        0.990
   ```
   
   Candidate, sweeping fanout at 0.050 selectivity:
   
   ```
   recall  latency (ms)     nDoc  topK  fanout  visited  selectivity
    0.714         7.722  8000000   100      50     4141        0.050
    0.721         7.813  8000000   100      60     4329        0.050
    0.728         8.158  8000000   100      70     4515        0.050
    0.734         8.656  8000000   100      80     4701        0.050
    0.741         8.719  8000000   100      90     4885        0.050
    0.746         6.566  8000000   100     100     5063        0.050
    0.751         6.493  8000000   100     110     5239        0.050
    0.756         6.913  8000000   100     120     5416        0.050
    0.761         7.186  8000000   100     130     5585        0.050
    0.765         7.595  8000000   100     140     5756        0.050
    0.769         7.150  8000000   100     150     5923        0.050
    0.773         8.241  8000000   100     160     6093        0.050
    0.777         8.056  8000000   100     170     6255        0.050
    0.780         8.386  8000000   100     180     6424        0.050
    0.784         8.963  8000000   100     190     6584        0.050
    0.787         8.786  8000000   100     200     6743        0.050
    0.790         9.459  8000000   100     210     6902        0.050
    0.793         8.709  8000000   100     220     7058        0.050
    0.796         8.881  8000000   100     230     7213        0.050
    0.798         9.612  8000000   100     240     7367        0.050
    0.801         9.527  8000000   100     250     7519        0.050
   ```
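The fanout sweep above can also be read as a cost curve. It is run at selectivity 0.050, and its fanout-50 row matches the candidate's 0.050 row. The sketch below computes recall gained per 1,000 additional visited nodes between a few sampled rows copied from that table. The class and method names are hypothetical.

```java
/** Marginal recall per 1,000 extra visited nodes across the fanout sweep (selectivity 0.050). */
public final class FanoutMarginalRecall {
  // Rows copied from the sweep above: {fanout, recall, visited}.
  static final double[][] ROWS = {
    {50, 0.714, 4141},
    {100, 0.746, 5063},
    {150, 0.769, 5923},
    {200, 0.787, 6743},
    {250, 0.801, 7519},
  };

  /** Recall gained per 1,000 additional visited nodes between two rows. */
  static double marginalRecallPer1000(double[] from, double[] to) {
    return (to[1] - from[1]) / (to[2] - from[2]) * 1000.0;
  }

  public static void main(String[] args) {
    for (int i = 1; i < ROWS.length; i++) {
      System.out.printf("fanout %3.0f -> %3.0f: %.4f recall per 1000 visited%n",
          ROWS[i - 1][0], ROWS[i][0], marginalRecallPer1000(ROWS[i - 1], ROWS[i]));
    }
  }
}
```

On these rows the marginal gain shrinks at every step, from roughly 0.035 to 0.018 recall per 1,000 visited, which indicates diminishing returns. Recall is rounded to three decimals, so treat individual values as approximate.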


