zhongyujiang commented on issue #10029:
URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2020699122
>~~If we use ParquetCombinedRowGroupFilter, for certain expressions, even if
the metric filter evaluates to false, the dict filter will still be invoked,
resulting in addition
cccs-jc commented on issue #10029:
URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2020343574
@amogh-jahagirdar I'm going to apply this patch to our internal deployment
of Iceberg 1.5 and will likely run with it for a while.
At the same time I will create a PR to the
amogh-jahagirdar commented on issue #10029:
URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2018962330
I've been following this thread and after thinking about the proposed
solution and going through the code a bit more, I think @cccs-jc's approach is
logically sound. This is
cccs-jc commented on issue #10029:
URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2017840320
This weekend I fixed the issue with the three row-group filters not working
together. The results are quite impressive: 11 seconds vs 396.
```
-results-4CPU-OR-
zhongyujiang commented on issue #10029:
URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2017638219
@cccs-jc @huaxingao I've run into the same issue before. Because the three
row-group filters cannot work together, some query expressions containing OR
cannot filter data. I have d
huaxingao commented on issue #10029:
URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2016577139
@cccs-jc Thanks for your proposal!
For filter `col1=1 || col2=1`, the current implementation is:
```
shouldRead = statsFilter(col1=1 || col2=1) && dictFilter(col1=1 ||
cccs-jc commented on issue #10029:
URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2016525638
@huaxingao You are absolutely correct; the issue also arises when combining
the `statsFilter` with the `dictFilter`. It's essentially the same underlying
problem.
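
To make the underlying problem concrete, here is a minimal sketch in Python. It is not Iceberg's code: the helper names (`make_filter`, `should_read_independent`, `should_read_combined`) are hypothetical, and each filter is reduced to a function that returns `False` only when it can prove a leaf predicate has no match in the row group. It assumes the stats filter can rule out `col1=1` and the dictionary filter can rule out `col2=1`, for the expression `col1=1 OR col2=1`.

```python
from typing import Callable, List, Set

# A leaf predicate is represented by its column name, standing for `<column> = 1`.
Leaf = str
# A filter returns True ("row group might match") or False ("proven no match").
Filter = Callable[[Leaf], bool]


def make_filter(can_rule_out: Set[Leaf]) -> Filter:
    """Hypothetical filter that can only disprove the given leaves."""
    return lambda leaf: leaf not in can_rule_out


def eval_or(leaves: List[Leaf], filt: Filter) -> bool:
    """Evaluate `leaf1 OR leaf2 OR ...` using a single filter."""
    return any(filt(leaf) for leaf in leaves)


def should_read_independent(leaves: List[Leaf], filters: List[Filter]) -> bool:
    """Each filter evaluates the whole OR expression; the results are ANDed."""
    return all(eval_or(leaves, f) for f in filters)


def should_read_combined(leaves: List[Leaf], filters: List[Filter]) -> bool:
    """Each leaf is checked against all filters before the OR is applied."""
    return any(all(f(leaf) for f in filters) for leaf in leaves)


# Assumed evidence for one row group (illustrative only):
stats_filter = make_filter({"col1"})  # min/max stats disprove col1 = 1
dict_filter = make_filter({"col2"})   # dictionary disproves col2 = 1
bloom_filter = make_filter(set())     # bloom filter disproves nothing

leaves = ["col1", "col2"]
filters = [stats_filter, dict_filter, bloom_filter]

independent = should_read_independent(leaves, filters)
combined = should_read_combined(leaves, filters)
print(independent, combined)
```

In this sketch the independent evaluation keeps the row group, because every filter on its own sees at least one OR branch it cannot disprove, while the combined evaluation skips it, because each branch is disproved by some filter.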
The crux o
huaxingao commented on issue #10029:
URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2016304838
@cccs-jc Thanks a lot for your thorough investigation and analysis!
The problem you described will also occur without a bloom filter. Let's use
the where clause `col1=1 OR
cccs-jc opened a new issue, #10029:
URL: https://github.com/apache/iceberg/issues/10029
### Apache Iceberg version
1.4.3
### Query engine
Spark
### Please describe the bug 🐞
I'm testing a table of flow data with a schema of `SRC_IP long, DST_IP long`