msfroh commented on PR #16240: URL: https://github.com/apache/lucene/pull/16240#issuecomment-4733772590
> it is prohibitively expensive to build the Scorer in the first place - once it is built the cost of iterating is likely to be much lower.

Right -- a challenge is that some of these Scorers are actually very cheap to build, but we don't have a good way of knowing that without trying. So, we assume the cost of the Scorer is high and make bad decisions as a consequence (either going for DV-based filtering when the MTQ actually matches a small number of docs, or opting to skip caching of the MTQ and any BooleanQuery that contains it).

@romseygeek, I really like your suggestion of moving `estimateCost` into `MultiTermQuery`. I think we need more polymorphism here, since shoving all the logic into `AbstractMultiTermQueryConstantScoreWrapper` is already getting messy. In particular, the logic should be a lot simpler for some queries (e.g. `PrefixQuery`) than others (e.g. `RegexpQuery`).

It'll probably take me a few days to implement that, but I don't want to hold up the 10.5 release. IMO, the improvement that @txwei made is probably more significant than the hit that we take from not caching "easy" clauses. In 10.6, we can have both.
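
To make the polymorphism idea concrete, here is a rough Java sketch of per-query cost estimation. It is not Lucene code and not the planned implementation: `SketchMultiTermQuery`, `SketchPrefixQuery`, `SketchRegexpQuery`, `CostPolicy`, the term-visit budget, and the sorted map standing in for a term dictionary are all hypothetical names and assumptions chosen for illustration. The only point is that a prefix-style query can bound its cost cheaply, while a regexp-style query falls back to "unknown".

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.OptionalLong;
import java.util.TreeMap;

/** Toy stand-in for a multi-term query. Hypothetical; NOT Lucene's MultiTermQuery. */
abstract class SketchMultiTermQuery {
    /** Upper bound on matching docs, or empty when it cannot be determined cheaply. */
    OptionalLong estimateCost(NavigableMap<String, Integer> termDocFreqs) {
        return OptionalLong.empty(); // default: cost unknown, so callers assume "expensive"
    }
}

/** Prefix-style query: matching terms are contiguous in a sorted term dictionary. */
final class SketchPrefixQuery extends SketchMultiTermQuery {
    private static final int MAX_TERMS_TO_VISIT = 16; // hypothetical budget
    private final String prefix;

    SketchPrefixQuery(String prefix) {
        this.prefix = prefix;
    }

    @Override
    OptionalLong estimateCost(NavigableMap<String, Integer> termDocFreqs) {
        long sum = 0;
        int visited = 0;
        for (Map.Entry<String, Integer> e : termDocFreqs.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the contiguous run of matching terms
            }
            if (++visited > MAX_TERMS_TO_VISIT) {
                return OptionalLong.empty(); // too many terms: give up, cost unknown
            }
            sum += e.getValue(); // sum of doc freqs is an upper bound on matching docs
        }
        return OptionalLong.of(sum);
    }
}

/** Regexp-style query: no cheap bound in this sketch, so it inherits the "unknown" default. */
final class SketchRegexpQuery extends SketchMultiTermQuery {
}

/** Hypothetical caller-side decision: treat a clause as "easy" only when the cost is known and small. */
final class CostPolicy {
    static boolean isCheap(SketchMultiTermQuery q, NavigableMap<String, Integer> terms, long threshold) {
        OptionalLong cost = q.estimateCost(terms);
        return cost.isPresent() && cost.getAsLong() <= threshold;
    }
}

final class SketchDemo {
    public static void main(String[] args) {
        NavigableMap<String, Integer> terms = new TreeMap<>();
        terms.put("apple", 3);
        terms.put("apply", 2);
        terms.put("banana", 10);
        System.out.println(CostPolicy.isCheap(new SketchPrefixQuery("app"), terms, 8)); // true (3 + 2 <= 8)
        System.out.println(CostPolicy.isCheap(new SketchRegexpQuery(), terms, 8));      // false (unknown cost)
    }
}
```

With this shape, the caching or DV-filtering decision only needs the query's own estimate, and each query type can keep its estimation logic as simple or as conservative as it needs to be.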
