msfroh commented on issue #15097: URL: https://github.com/apache/lucene/issues/15097#issuecomment-3207658474
I'm kind of surprised that we don't use `RamUsageEstimator.sizeOf(Query)` to estimate the size of complex queries. That would at least use the `RamUsageQueryVisitor` on a `BooleanQuery` to visit each of its sub-clauses.

I looked into the history of this a bit in http://issues.apache.org/jira/browse/LUCENE-8855. It looks like there was some debate over whether it makes sense to provide accurate size estimates or to just disable caching for things that are potentially large. I get the impression that the solution was a middle ground, where `BooleanQuery` was supposed to go with the latter: its `isCacheable` method returns false if there are more than 16 clauses (or if any of those clauses are not cacheable, including clauses that themselves have more than 16 sub-clauses).

So now I'm wondering how we end up with `BooleanQuery` instances averaging 3.2MB. The immediate worst case I can think of is a tree where each node has exactly 16 children. Every element would be cacheable, including the root.

I'm wondering if there could be a simple fix for `BooleanQuery#isCacheable(LeafReaderContext)` where we use a visitor to visit children, and that visitor early-terminates (by throwing an exception?) once it collects too many clauses. It might be worth opening a Lucene issue to discuss options there.

Another option (that might help) would be @atris's change to flatten `BooleanQuery` where possible (https://github.com/opensearch-project/OpenSearch/pull/19060). Of course, you can still come up with non-flattenable `BooleanQuery` examples where you interleave `MUST` clauses at even levels with `SHOULD` clauses at odd levels (or vice versa).
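
To make the worst case and the proposed fix concrete, here is a minimal sketch. It does not use Lucene classes: `Node`, `ClauseBudget`, `cacheablePerNode`, and `withinBudget` are hypothetical names made up for illustration, and the per-node limit of 16 is taken from the description above rather than from the Lucene source. The sketch also signals early termination with a sentinel return value instead of the exception floated above, which is just one way to do it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query tree; this is NOT a Lucene class.
final class Node {
  final List<Node> children = new ArrayList<>(); // empty for a leaf clause

  static Node leaf() {
    return new Node();
  }

  // Builds a tree in which every non-leaf node has exactly `fanout` children.
  static Node fullTree(int fanout, int depth) {
    Node node = new Node();
    if (depth > 0) {
      for (int i = 0; i < fanout; i++) {
        node.children.add(fullTree(fanout, depth - 1));
      }
    }
    return node;
  }
}

final class ClauseBudget {
  // Assumed per-node limit, as described in the comment above.
  static final int MAX_CLAUSES_PER_NODE = 16;

  // Per-node rule: a node passes if it has at most 16 children
  // and every child passes the same rule.
  static boolean cacheablePerNode(Node node) {
    if (node.children.size() > MAX_CLAUSES_PER_NODE) {
      return false;
    }
    for (Node child : node.children) {
      if (!cacheablePerNode(child)) {
        return false;
      }
    }
    return true;
  }

  // Tree-wide rule: true if the whole tree has at most `budget` leaf clauses.
  static boolean withinBudget(Node root, int budget) {
    return remaining(root, budget) >= 0;
  }

  // Returns the budget left after visiting `node`, or -1 once it is exceeded.
  private static int remaining(Node node, int budget) {
    if (node.children.isEmpty()) {
      return budget - 1; // a leaf consumes one unit; -1 means over budget
    }
    for (Node child : node.children) {
      budget = remaining(child, budget);
      if (budget < 0) {
        return -1; // early termination: skip the rest of the tree
      }
    }
    return budget;
  }

  public static void main(String[] args) {
    Node tree = Node.fullTree(16, 3); // 16^3 = 4096 leaf clauses
    System.out.println("per-node check passes: " + cacheablePerNode(tree));
    System.out.println("within a 16-clause budget: " + withinBudget(tree, 16));
  }
}
```

With a fan-out of 16 and a depth of 3, every node satisfies the per-node rule even though the tree holds 4096 leaf clauses, while the tree-wide check gives up after visiting 17 leaves.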