romseygeek commented on PR #16240: URL: https://github.com/apache/lucene/pull/16240#issuecomment-4727391762
Thanks for looking into this @msfroh (and apologies for merging #16222 before you had a chance to look at it properly). I think we may be mixing up two different issues here. `cost()` tells us how expensive it is to iterate a `DocIdSetIterator` in full, which is why it is used as a condition when deciding whether to build a cache entry. The problem that #16222 solves is different: without it, building the Scorer in the first place is prohibitively expensive, whereas once it is built, iterating it is likely to be much cheaper.

`IndexOrDocValuesQuery` already has a very rough heuristic for choosing between Points and DocValues queries in precisely this situation. I wonder if we can improve things here by:

- adding an `estimateCost()` method directly to `MultiTermQuery`, so that different query types can be smarter about building these estimates
- updating `IndexOrDocValuesQuery` to allow different heuristics for MTQs
- adding some sugar methods to the various `AutomatonQuery` subclasses, analogous to the ones on `TermInSetQuery`, to make them easier to use
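To make the first two bullets concrete, here is a minimal, self-contained sketch of the idea. It does not use Lucene at all: `CostEstimatingQuery` is a hypothetical stand-in for the proposed `MultiTermQuery.estimateCost()` (no such method exists today), and `choose` is a simplified model of an `IndexOrDocValuesQuery`-style decision. The key point it tries to show is that the choice is made from a cheap estimate, before any expensive Scorer is built. The factor-of-8 penalty on doc values is an illustrative assumption, not a statement of what Lucene does.

```java
// Hypothetical sketch: none of these types or methods are Lucene APIs.
public final class CostHeuristicSketch {

  /** Hypothetical stand-in for an MTQ that can estimate its cost before building a Scorer. */
  @FunctionalInterface
  public interface CostEstimatingQuery {
    /** Cheap estimate of how many docs might match, computed without building the Scorer. */
    long estimateCost();
  }

  /** Which execution strategy to use. */
  public enum Strategy { INDEX, DOC_VALUES }

  /**
   * Simplified IndexOrDocValuesQuery-style choice: use the index unless its estimated
   * cost is much larger than the lead iterator's cost, in which case verifying each
   * candidate doc with doc values is assumed to be cheaper. The 8x penalty is an assumption.
   */
  public static Strategy choose(CostEstimatingQuery indexQuery, long leadCost) {
    long threshold = indexQuery.estimateCost() >>> 3; // estimate divided by 8
    return threshold <= leadCost ? Strategy.INDEX : Strategy.DOC_VALUES;
  }

  public static void main(String[] args) {
    // e.g. a broad wildcard whose (hypothetical) estimate is one million docs
    CostEstimatingQuery broadWildcard = () -> 1_000_000L;
    System.out.println(choose(broadWildcard, 100));     // selective lead: DOC_VALUES
    System.out.println(choose(broadWildcard, 500_000)); // unselective lead: INDEX
  }
}
```

With a selective lead clause (cost 100), the sketch avoids enumerating the wildcard's terms and checks doc values per candidate instead. With an unselective lead (cost 500,000), building the index-based Scorer is assumed to pay for itself. Per-query-type heuristics would plug in through different `estimateCost()` implementations.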
