rquesada-tibco opened a new issue, #13029: URL: https://github.com/apache/lucene/issues/13029
### Description We recently discovered a performance degradation on our project when going from Lucene 9.4 to 9.9. The cause seems to be a side effect of https://github.com/apache/lucene/commit/c6667e709f610669b57a6a1a6ab1cc80f4f0ebaf and https://github.com/apache/lucene/commit/3809106602a9675f4fd217b1090af4505d4ec2a7 The situation is as follows: we have a `WildcardQuery` and a `TermInSetQuery` which are and-combined (within a `BooleanQuery`). This structure gets executed repeatedly, kind of like a nested loop where the `WildcardQuery` remains the same, but the `TermInSetQuery` keeps changes its terms. In the old version, this was fast because the `WildcardQuery` was cached within the `LRUQueryCache`. However in the new version this is no longer the case, so the execution time of our scenario has increased. Why our `WildcardQuery` is not cached any more? It boils down to [this line](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java#L771) in `LRUQueryCache`, where the cache operation won't happen if the cost estimation is too high: ``` final ScorerSupplier supplier = in.scorerSupplier(context); ... final long cost = supplier.cost(); ... // skip cache operation which would slow query down too much if (cost / skipCacheFactor > leadCost) { ... ``` Before the upgrade to 9.9, that cost was provided by a `ConstantScoreWeight` returned by the old `MultiTermQueryConstantScoreWrapper` (which was returned by the default `RewriteMethod`), which in the end was just based on the "default" `Weight#scoreSupplier` [implementation](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/Weight.java#L147): basically the cost was the `scorer.iterator().cost();` and in **our case the `WildcardQuery` returns just one document, so cost 1**. After the upgrade, the default `RewriteMethod` has changed and now this cost is provided by `AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight#scorerSupplier` [here](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L263), and for that purpose a private [estimateCost method](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L295) was introduced, which bases the estimation on the [MultiTermQuery#getTermsCount](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/MultiTermQuery.java#L316) value. The problem is that, for our `WildcardQuery` (in fact for any sub-class of `AutomatonQuery`), this value is unknown, i.e. `-1`, so the `estimateCost` method just [return s](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L309) `terms.getSumDocFreq()`, which is clearly an overestimation in our case, so it prevents the caching, and leads to a performance degradation. I understand that I can fix this situation by writing my customized `RewriteMethod`. The question is: could we improve `AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight#scorerSupplier#cost` so that, if the MTQ cannot provide a term count (`getTermsCount() == -1`) then we return `scorer.iterator().cost()` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org