zhongyujiang commented on issue #10029: URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2020699122
>~~If we use ParquetCombinedRowGroupFilter, for certain expressions the dict filter will still be invoked even if the metric filter evaluates to false, resulting in additional overhead. For example, given an expression like 'foo = 5 OR bar = 5', even if the metric filter evaluates both sub-expressions to false, the dict filter will still be called to read the dict pages for evaluation.~~
>
>~~That's why I used the residual evaluator in https://github.com/apache/iceberg/pull/6893. It still allows sequential invocation of the three filters, and the short-circuiting logic can still take effect. The drawback is that I had to transform the three filters into the form of a residual evaluator.~~

Update: hmm, I just realized that dict pages are read lazily, so I believe my conclusion above is invalid.

@cccs-jc After further recollection, I believe I now remember the exact reason why I initially chose to use the residual filter: the scenario where the combined filter may incur additional overhead is a query like 'foo=5 AND bar=5'. If the metric filter evaluates 'foo=5' as true but 'bar=5' as false, the row group is already ruled out and the subsequent filters can be skipped. With a combined filter, however, the dict filter would still be invoked to evaluate 'foo=5'.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
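
To make the 'foo=5 AND bar=5' scenario concrete, here is a toy model of the two evaluation strategies. It is only a sketch: `make_filter`, `sequential`, `combined`, and `run` are hypothetical names invented for illustration, not Iceberg classes, and the per-predicate, left-to-right evaluation order of the combined variant is an assumption about how such a filter might behave.

```python
def make_filter(name, verdicts, calls):
    """Build a toy filter. verdicts maps predicate -> True ("might match")
    or False ("cannot match"); every invocation is recorded in calls."""
    def evaluate(pred):
        calls.append((name, pred))
        return verdicts[pred]
    return evaluate


def sequential(preds, filters):
    """Residual-style: run one filter over the whole AND expression,
    and stop as soon as any filter rules the row group out."""
    for f in filters:
        if not all(f(p) for p in preds):
            return False
    return True


def combined(preds, filters):
    """Combined-style (assumed): for each predicate, run every filter
    before moving on to the next predicate of the AND."""
    return all(all(f(p) for f in filters) for p in preds)


def run(strategy):
    calls = []
    # Metric filter: 'foo=5' might match, 'bar=5' cannot match.
    metrics = make_filter("metrics", {"foo=5": True, "bar=5": False}, calls)
    # Dict filter verdicts are made up; only whether it gets called matters.
    dictionary = make_filter("dict", {"foo=5": True, "bar=5": True}, calls)
    result = strategy(["foo=5", "bar=5"], [metrics, dictionary])
    return result, calls


for strategy in (sequential, combined):
    result, calls = run(strategy)
    print(strategy.__name__, result, calls)
```

Both strategies reject the row group, but in this model only the combined one records a `("dict", "foo=5")` call, which is the extra dict-filter invocation described in the comment.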