rquesada-tibco opened a new issue, #13029:
URL: https://github.com/apache/lucene/issues/13029

   ### Description
   
   We recently discovered a performance degradation on our project when going 
from Lucene 9.4 to 9.9. The cause seems to be a side effect of 
https://github.com/apache/lucene/commit/c6667e709f610669b57a6a1a6ab1cc80f4f0ebaf
 and 
https://github.com/apache/lucene/commit/3809106602a9675f4fd217b1090af4505d4ec2a7
   
   The situation is as follows: we have a `WildcardQuery` and a 
`TermInSetQuery` which are and-combined (within a `BooleanQuery`). This 
structure gets executed repeatedly, kind of like a nested loop where the 
`WildcardQuery` remains the same, but the `TermInSetQuery` keeps changes its 
terms. In the old version, this was fast because the `WildcardQuery` was cached 
within the `LRUQueryCache`. However in the new version this is no longer the 
case, so the execution time of our scenario has increased.
   
   Why our `WildcardQuery` is not cached any more? It boils down to [this 
line](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java#L771)
 in `LRUQueryCache`, where the cache operation won't happen if the cost 
estimation is too high:
   ```
   final ScorerSupplier supplier = in.scorerSupplier(context);
   ...
   final long cost = supplier.cost();
   ...
   // skip cache operation which would slow query down too much
   if (cost / skipCacheFactor > leadCost) {
   ...
   ```
   
   Before the upgrade to 9.9, that cost was provided by a `ConstantScoreWeight` 
returned by the old `MultiTermQueryConstantScoreWrapper` (which was returned by 
the default `RewriteMethod`), which in the end was just based on the "default" 
`Weight#scoreSupplier` 
[implementation](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/Weight.java#L147):
 basically the cost was the `scorer.iterator().cost();` and in **our case the 
`WildcardQuery` returns just one document, so cost 1**.
   
   After the upgrade, the default `RewriteMethod` has changed and now this cost 
is provided by 
`AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight#scorerSupplier` 
[here](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L263),
 and for that purpose a private [estimateCost 
method](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L295)
 was introduced, which bases the estimation on the 
[MultiTermQuery#getTermsCount](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/MultiTermQuery.java#L316)
 value. The problem is that, for our `WildcardQuery` (in fact for any sub-class 
of `AutomatonQuery`), this value is unknown, i.e. `-1`, so the `estimateCost` 
method just [return
 
s](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L309)
  `terms.getSumDocFreq()`, which is clearly an overestimation in our case, so 
it prevents the caching, and leads to a performance degradation.
   
   I understand that I can fix this situation by writing my customized 
`RewriteMethod`.
   The question is: could we improve 
`AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight#scorerSupplier#cost`
  so that, if the MTQ cannot provide a term count (`getTermsCount() == -1`) 
then we return `scorer.iterator().cost()` ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to