Re: [PR] Limit term dictionary traversal in multi-term queries [lucene]

via GitHub Wed, 17 Jun 2026 00:50:26 -0700


romseygeek commented on PR #16240:
URL: https://github.com/apache/lucene/pull/16240#issuecomment-4727391762


   Thanks for looking into this @msfroh (and apologies for merging #16222 
before you had a chance to look at it properly).
   
   I think we may be mixing up two different issues here. `cost()` tells us how 
expensive it is to iterate a `DocIdSetIterator` in full, which is why it is used 
as a condition when deciding whether to build a cache entry. The problem that 
#16222 solves is different: without it, building the `Scorer` in the first place 
is prohibitively expensive, whereas once it is built the cost of iterating is 
likely to be much lower. `IndexOrDocValuesQuery` has a very rough heuristic in 
place to choose between Points and DocValues queries for precisely this 
situation.
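
To illustrate the kind of rough heuristic meant here, below is a toy sketch that picks between an index-based and a doc-values-based execution by comparing an iteration-cost estimate against the cost of the clause leading the query. It is not Lucene code: `ScorerChoiceSketch`, `Strategy`, `choose` and the 8x penalty are assumptions made for illustration, and the actual rule in `IndexOrDocValuesQuery` may differ.

```java
// Toy model only: none of these names are Lucene APIs.
final class ScorerChoiceSketch {

    /** The two ways the same clause could be executed. */
    enum Strategy { INDEX, DOC_VALUES }

    /**
     * Pick a strategy for one clause.
     *
     * @param indexCost estimated number of docs the index-based iterator would visit
     * @param leadCost  estimated number of docs the rest of the query will ask about
     */
    static Strategy choose(long indexCost, long leadCost) {
        // Assumed penalty: treat a per-document doc-values check as roughly
        // 8x more expensive than advancing an already-built iterator.
        long threshold = indexCost >>> 3; // indexCost / 8
        return threshold <= leadCost ? Strategy.INDEX : Strategy.DOC_VALUES;
    }

    public static void main(String[] args) {
        // Huge clause, tiny lead: verifying a few docs individually wins.
        System.out.println(choose(1_000_000L, 50L));
        // Clause and lead are of similar size: use the index.
        System.out.println(choose(1_000L, 500L));
    }
}
```

Note that both inputs are iteration-cost estimates. Nothing in this sketch accounts for how expensive it is to build the index-based iterator in the first place, which is the separate problem described above.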
   
   I wonder if we can improve things here by:
   - adding an `estimateCost()` method directly to `MultiTermQuery` so 
different query types can be smarter about building these estimates
   - updating `IndexOrDocValuesQuery` to allow for different heuristics for MTQs
   - adding some sugar methods to the various `AutomatonQuery` subclasses 
analogous to the ones on `TermInSetQuery` to make them easier to use.
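
As a rough sketch of the first idea, a per-query-type estimate might look something like the following. Everything here is hypothetical: `MultiTermLike`, `TermSetLike`, `LeadingWildcardLike`, `cheapToBuild` and the `estimateCost(long)` signature are stand-ins invented for illustration, not existing or proposed Lucene APIs, and the estimates themselves are deliberately crude.

```java
import java.util.List;

// Toy model only: none of these types are Lucene classes.
final class EstimateCostSketch {

    /** Hypothetical stand-in for a multi-term query. */
    interface MultiTermLike {
        /** Rough upper bound on dictionary terms visited while building a scorer. */
        long estimateCost(long termsInField);
    }

    /** An explicit term list needs at most one lookup per listed term. */
    static final class TermSetLike implements MultiTermLike {
        private final List<String> terms;

        TermSetLike(List<String> terms) {
            this.terms = terms;
        }

        @Override
        public long estimateCost(long termsInField) {
            return Math.min(terms.size(), termsInField);
        }
    }

    /** A pattern with no fixed prefix may have to scan the whole dictionary. */
    static final class LeadingWildcardLike implements MultiTermLike {
        @Override
        public long estimateCost(long termsInField) {
            return termsInField;
        }
    }

    /** Hypothetical caller-side check: is building the scorer within budget? */
    static boolean cheapToBuild(MultiTermLike query, long termsInField, long budget) {
        return query.estimateCost(termsInField) <= budget;
    }

    public static void main(String[] args) {
        long termsInField = 5_000_000L; // assumed dictionary size
        long budget = 1_024L;           // assumed traversal budget
        System.out.println(cheapToBuild(new TermSetLike(List.of("a", "b", "c")), termsInField, budget));
        System.out.println(cheapToBuild(new LeadingWildcardLike(), termsInField, budget));
    }
}
```

The point of the sketch is only that the estimate lives with the query type, so a caller choosing between execution strategies can ask about build cost without knowing how each query walks the term dictionary.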


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


