msfroh commented on issue #15097:
URL: https://github.com/apache/lucene/issues/15097#issuecomment-3207658474

   I'm kind of surprised that we don't use `RamUsageEstimator.sizeOf(Query)` to
estimate the size of complex queries. For a `BooleanQuery`, that would at least
use the `RamUsageQueryVisitor` to visit each of its sub-clauses.
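
   This toy illustration shows the general idea behind a visitor-based estimate. The code walks every sub-clause and sums a per-node cost, so the estimate scales with the whole tree rather than just the root. The `Node` model and the byte constants are invented for illustration. This is not how `RamUsageEstimator` or `RamUsageQueryVisitor` is actually implemented.

```java
import java.util.List;

// Hypothetical toy model of a query tree; NOT Lucene's Query/RamUsageQueryVisitor.
public class QuerySizeSketch {
    // A leaf carries a term; a boolean node carries sub-clauses.
    record Node(String term, List<Node> clauses) {
        static Node leaf(String term) { return new Node(term, List.of()); }
        static Node bool(Node... clauses) { return new Node(null, List.of(clauses)); }
    }

    // Made-up per-object costs, for illustration only.
    static final long NODE_OVERHEAD = 32;
    static final long BYTES_PER_CHAR = 2;

    // Recursively visit every sub-clause and add up a per-node estimate.
    static long estimate(Node n) {
        long size = NODE_OVERHEAD;
        if (n.term() != null) {
            size += BYTES_PER_CHAR * n.term().length();
        }
        for (Node child : n.clauses()) {
            size += estimate(child);
        }
        return size;
    }

    public static void main(String[] args) {
        Node q = Node.bool(Node.leaf("foo"), Node.bool(Node.leaf("bar"), Node.leaf("baz")));
        System.out.println(estimate(q)); // prints 178
    }
}
```

Because every nested level is visited, a deep tree can no longer hide behind a small root-level estimate.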
   
   I looked into the history of this a bit in
http://issues.apache.org/jira/browse/LUCENE-8855. It looks like there was some
debate over whether it makes sense to provide accurate size estimates or to
simply disable caching for queries that are potentially large. I get the
impression that the solution was a middle ground, where `BooleanQuery` was
supposed to take the latter approach: its `isCacheable` method returns false if
there are more than 16 clauses (or if any of those clauses is not cacheable,
including when a clause itself has more than 16 sub-clauses).
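
   Here is a minimal sketch of the per-node rule as described in this comment. A node is rejected if it has more than 16 direct clauses or if any clause is itself non-cacheable. The `Node` model and `MAX_CLAUSES` name are hypothetical, and this is a simplification rather than Lucene's actual `BooleanQuery#isCacheable`.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of the per-node cacheability rule; not Lucene's code.
public class CacheabilitySketch {
    // The 16-clause threshold mentioned above.
    static final int MAX_CLAUSES = 16;

    record Node(List<Node> clauses) {
        static Node leaf() { return new Node(List.of()); }
        // A boolean node with n leaf clauses.
        static Node withLeaves(int n) { return new Node(Collections.nCopies(n, leaf())); }
    }

    static boolean isCacheable(Node n) {
        if (n.clauses().size() > MAX_CLAUSES) {
            return false; // too many direct clauses
        }
        for (Node child : n.clauses()) {
            if (!isCacheable(child)) {
                return false; // a non-cacheable clause makes the parent non-cacheable
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isCacheable(Node.withLeaves(16))); // true
        System.out.println(isCacheable(Node.withLeaves(17))); // false
        // A single-clause parent wrapping a 17-clause child is also rejected.
        System.out.println(isCacheable(new Node(List.of(Node.withLeaves(17))))); // false
    }
}
```

Note that this check bounds only the direct clauses of each node. It never bounds the total number of clauses in the tree.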
   
   So now I'm wondering how we end up with `BooleanQuery` instances averaging
3.2MB. The most immediate worst case I can think of is a tree where each node
has exactly 16 children. Every node passes the per-node 16-clause check, so
every element is cacheable, including the root, even though the total number of
clauses grows as 16^depth.
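
   The arithmetic for that worst case is easy to check. In a complete tree where every boolean node has exactly 16 children, depth d holds 16^d leaf clauses and 1 + 16 + ... + 16^d nodes in total. The class and method names below are illustrative only.

```java
// Illustrative arithmetic for a complete tree with a fan-out of 16 at every node.
public class WorstCaseSketch {
    static final int FANOUT = 16;

    // Leaf clauses at the bottom level: FANOUT^depth.
    static long leafClauses(int depth) {
        long leaves = 1;
        for (int i = 0; i < depth; i++) {
            leaves *= FANOUT;
        }
        return leaves;
    }

    // All nodes (boolean nodes plus leaves): 1 + 16 + 16^2 + ... + 16^depth.
    static long totalNodes(int depth) {
        long total = 0;
        long level = 1;
        for (int i = 0; i <= depth; i++) {
            total += level;
            level *= FANOUT;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int d = 1; d <= 4; d++) {
            System.out.printf("depth %d: %d leaf clauses, %d nodes%n", d, leafClauses(d), totalNodes(d));
        }
    }
}
```

At depth 4, every node still has only 16 clauses, yet the tree contains 65,536 leaf clauses and 69,905 nodes in total.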
   
   I'm wondering if there could be a simple fix for
`BooleanQuery#isCacheable(LeafReaderContext)` that uses a visitor to walk all of
the descendant clauses and terminates early (by throwing an exception?) once it
has collected too many. It might be worth opening a Lucene issue to discuss the
options there.
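
   This sketches the early-terminating idea on a toy tree. A counting visitor throws as soon as the running total of clauses exceeds a limit, and the caller treats that as "not cacheable". The `Node` model, `TooManyClauses` exception, and total-clause limit are hypothetical. This does not use Lucene's `QueryVisitor` API, and the right threshold would be up for discussion.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical toy model; not Lucene's QueryVisitor or BooleanQuery.
public class BoundedVisitorSketch {
    record Node(List<Node> clauses) {
        static Node leaf() { return new Node(List.of()); }
    }

    // Used purely for control flow, so skip the stack trace.
    static final class TooManyClauses extends RuntimeException {
        TooManyClauses() { super(null, null, false, false); }
    }

    // Counts every clause in the tree and aborts once the total exceeds the limit.
    static final class CountingVisitor {
        private final int limit;
        private int seen;

        CountingVisitor(int limit) { this.limit = limit; }

        void visit(Node n) {
            for (Node child : n.clauses()) {
                if (++seen > limit) {
                    throw new TooManyClauses();
                }
                visit(child);
            }
        }
    }

    static boolean isCacheable(Node root, int maxTotalClauses) {
        try {
            new CountingVisitor(maxTotalClauses).visit(root);
            return true;
        } catch (TooManyClauses e) {
            return false;
        }
    }

    // Complete tree with the given fan-out; subtrees are shared since the model is immutable.
    static Node fullTree(int fanout, int depth) {
        if (depth == 0) {
            return Node.leaf();
        }
        return new Node(Collections.nCopies(fanout, fullTree(fanout, depth - 1)));
    }

    public static void main(String[] args) {
        System.out.println(isCacheable(fullTree(16, 1), 16)); // true: 16 clauses in total
        System.out.println(isCacheable(fullTree(16, 2), 16)); // false: aborts at the 17th clause
    }
}
```

For the depth-2 tree, the visitor stops after 17 clauses instead of walking all 272. That keeps the check cheap even for very large trees.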
   
   Another option (that might help) would be @atris's change to flatten
`BooleanQuery` where possible
(https://github.com/opensearch-project/OpenSearch/pull/19060). Of course, you
can still construct `BooleanQuery` trees that can't be flattened, e.g. by
interleaving `MUST` clauses at even levels with `SHOULD` clauses at odd levels
(or vice versa).
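
   Flattening, and why alternating levels defeat it, can be sketched on a toy tree. This is not the logic of the OpenSearch PR above. The `Node`/`Clause` model, helper names, and inlining rule are all assumptions for illustration. The rule inlines a nested boolean only when all of its clauses share the occur of the clause holding it. Only `MUST` and `SHOULD` are modelled, and scoring, boosts, and `minimumNumberShouldMatch` are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; only MUST and SHOULD are modelled (no FILTER/MUST_NOT).
public class FlattenSketch {
    enum Occur { MUST, SHOULD }

    record Clause(Occur occur, Node node) {}

    record Node(String term, List<Clause> clauses) {
        static Node leaf(String t) { return new Node(t, List.of()); }
        static Node bool(Clause... cs) { return new Node(null, List.of(cs)); }
        boolean isLeaf() { return term != null; }
    }

    static Clause must(Node n) { return new Clause(Occur.MUST, n); }
    static Clause should(Node n) { return new Clause(Occur.SHOULD, n); }

    // Bottom-up: inline a non-empty boolean child whose clauses all have the same
    // occur as the clause holding it (MUST inside MUST, SHOULD inside SHOULD).
    static Node flatten(Node n) {
        if (n.isLeaf()) {
            return n;
        }
        List<Clause> out = new ArrayList<>();
        for (Clause c : n.clauses()) {
            Node child = flatten(c.node());
            boolean sameOccur = !child.isLeaf() && !child.clauses().isEmpty()
                && child.clauses().stream().allMatch(cc -> cc.occur() == c.occur());
            if (sameOccur) {
                out.addAll(child.clauses());
            } else {
                out.add(new Clause(c.occur(), child));
            }
        }
        return new Node(null, out);
    }

    // A leaf has depth 0; a boolean node is 1 + its deepest clause.
    static int depth(Node n) {
        int max = 0;
        for (Clause c : n.clauses()) {
            max = Math.max(max, depth(c.node()));
        }
        return n.isLeaf() ? 0 : 1 + max;
    }

    // a AND (b AND (c AND d)): flattens to a single level.
    static Node nestedMusts() {
        return Node.bool(must(Node.leaf("a")),
            must(Node.bool(must(Node.leaf("b")),
                must(Node.bool(must(Node.leaf("c")), must(Node.leaf("d")))))));
    }

    // MUST / SHOULD / MUST levels: nothing can be inlined.
    static Node alternating() {
        return Node.bool(must(Node.leaf("a")),
            must(Node.bool(should(Node.leaf("b")),
                should(Node.bool(must(Node.leaf("c")), must(Node.leaf("d")))))));
    }

    public static void main(String[] args) {
        System.out.println(depth(nestedMusts()) + " -> " + depth(flatten(nestedMusts()))); // 3 -> 1
        System.out.println(depth(alternating()) + " -> " + depth(flatten(alternating()))); // 3 -> 3
    }
}
```

The nested conjunction collapses to one level with four clauses. The alternating tree keeps its full depth, so a size bound would still be needed for it.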


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

