Re: [I] Bloom filter not properly leveraged when using an OR condition [iceberg]

via GitHub Tue, 26 Mar 2024 08:11:00 -0700


zhongyujiang commented on issue #10029:
URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2020699122


   >~~If we use ParquetCombinedRowGroupFilter, for certain expressions, even if 
the metric filter evaluates to false, the dict filter will still be invoked, 
resulting in additional overhead. For example, if we have an expression like 
'foo = 5 OR bar = 5', even if the metric filter evaluates both sub-expressions 
to false, the dict filter will still be called to read the dict pages for 
evaluation.~~
   >
   >~~That's why I used the residual evaluator in 
https://github.com/apache/iceberg/pull/6893. It still allows sequential 
invocation of the three filters, and the short-circuiting logic can still take 
effect. The drawback is that I had to transform the three filters into the form 
of a residual evaluator.~~
   Update: hmm, just realized that dict pages are read lazily, so I believe my 
conclusion above is invalid.
   
   @cccs-jc  After thinking about it further, I believe I now remember the exact 
reason why I initially chose to use the residual evaluator:
   The scenario where the combined filter may incur additional overhead is a 
query like 'foo=5 AND bar=5'. If the metric filter evaluates 'foo=5' as true 
but 'bar=5' as false, the AND as a whole is false, so when the filters run 
sequentially the subsequent filters can be skipped. With a combined filter, 
however, the dict filter would still be invoked to evaluate 'foo=5' before 
'bar=5' is ever checked against the metrics.
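
   To make the AND case concrete, here is a minimal toy sketch in Python. It is 
not Iceberg's code: `RowGroup`, `sequential_and`, `combined_and` and the sample 
statistics are all hypothetical names and data made up for illustration, and it 
assumes the combined filter checks each leaf with metrics and then dict before 
moving on to the next leaf.

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    """Toy row group: per-column (min, max) stats and dictionary values."""
    stats: dict
    dictionaries: dict
    dict_reads: list = field(default_factory=list)

    def metrics_may_match(self, col, value):
        lo, hi = self.stats[col]
        return lo <= value <= hi

    def dict_may_match(self, col, value):
        # Record the access so we can see which dictionaries were consulted.
        self.dict_reads.append(col)
        return value in self.dictionaries[col]


def sequential_and(rg, preds):
    """Run the metrics filter over the whole AND first, then the dict filter."""
    if not all(rg.metrics_may_match(c, v) for c, v in preds):
        return False  # the dict filter is never invoked
    return all(rg.dict_may_match(c, v) for c, v in preds)


def combined_and(rg, preds):
    """Check each leaf with metrics and then dict before the next leaf."""
    for c, v in preds:
        if not (rg.metrics_may_match(c, v) and rg.dict_may_match(c, v)):
            return False
    return True


def make_row_group():
    # foo's stats admit 5, bar's stats (6..9) rule it out.
    return RowGroup(
        stats={"foo": (1, 10), "bar": (6, 9)},
        dictionaries={"foo": {1, 5, 10}, "bar": {6, 7, 9}},
    )


preds = [("foo", 5), ("bar", 5)]  # models 'foo=5 AND bar=5'

rg_seq = make_row_group()
seq_result = sequential_and(rg_seq, preds)

rg_comb = make_row_group()
comb_result = combined_and(rg_comb, preds)

print(seq_result, rg_seq.dict_reads)
print(comb_result, rg_comb.dict_reads)
```

   Both strategies reject the row group, but under these assumptions only the 
combined one touches foo's dictionary before the metrics for bar rule the row 
group out.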


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


