Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-17 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1766286164 @amogh-jahagirdar I think I know how these delete files are generated even though copy-on-write is defined at the table level. I have executed the delete from Trino, and since it only supports

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-16 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764912976 @huaxingao So finally it is working, but without the `between` and `<=` operators. Yes, I had to tweak my query to adjust the timezone so that the entire partition is picked by the query. ```

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-16 Thread via GitHub
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764861269 @atifiu Based on the log, only `IsNotNull(initial_page_view_dtm)` is completely evaluated on the Iceberg side. Both `(initial_page_view_dtm#3 >= 2023-06-02 06:00:00)` and `initial_page_view

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-16 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764246045 @huaxingao Based on your suggestion, I have narrowed the filter criteria so that, even considering the timezone problem, we don't filter on more than two partitions, so that the filter can be pus

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-15 Thread via GitHub
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1763425359 @atifiu That's right. If the filters are not completely evaluated on the Iceberg side, then Spark has to evaluate the filters, and the Min/Max/Count values might change after that filtering. Therefor
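A minimal sketch of the boundary issue being discussed, assuming a hypothetical table `db.events` partitioned by `days(event_ts)`; the filter shapes are illustrative only, not taken from the actual table:

```
-- Boundaries that line up with whole partition days can typically be
-- checked against partition values and file metadata alone, which keeps
-- MIN/MAX/COUNT pushdown possible.
SELECT MAX(event_ts), COUNT(*)
FROM db.events
WHERE event_ts >= TIMESTAMP '2023-06-01 00:00:00'
  AND event_ts <  TIMESTAMP '2023-06-03 00:00:00';

-- Boundaries that land mid-partition (e.g. 06:00:00 after a session
-- timezone shift) force Spark to re-check rows in the boundary partitions,
-- so the aggregate is computed after that residual filter instead of
-- being answered from metadata.
SELECT MAX(event_ts), COUNT(*)
FROM db.events
WHERE event_ts >= TIMESTAMP '2023-06-01 06:00:00'
  AND event_ts <  TIMESTAMP '2023-06-03 06:00:00';
```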

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1762436892 @huaxingao So unless we get the message "Evaluating completely on Iceberg", which means full filter pushdown has happened, the filter pushdown is either partial or does not happen at all. In my case I

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1762149585 @atifiu I suspect somehow your partition filter isn't completely pushed down. In this [PR](https://github.com/apache/iceberg/pull/6524), we will discard filters that can be completely e

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761916077 It's not working. Either with between, > or <.

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761909655 If filters are on partitioned columns, aggregate pushdown should work.
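A minimal sketch of that distinction, with hypothetical names (`db.events` partitioned on `event_ts`, `dev_type` an ordinary data column):

```
-- Filter on the partition column: Iceberg can evaluate it from partition
-- and file metadata, so the aggregate can still be pushed down.
SELECT COUNT(*)
FROM db.events
WHERE event_ts >= TIMESTAMP '2023-06-01 00:00:00'
  AND event_ts <  TIMESTAMP '2023-06-02 00:00:00';

-- Filter on a non-partition column: Spark generally has to re-evaluate it
-- row by row, so MIN/MAX/COUNT cannot come from metadata alone.
SELECT COUNT(*) FROM db.events WHERE dev_type = 'mobile';
```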

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761688659 @huaxingao So you mean to say that with filters, whether on partitioned or non-partitioned column(s), aggregate pushdown will not work?

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761680652 @atifiu File statistics are not accurate and can't be used anymore if you use filters. For example, suppose you have a table (col int), the max of col is 100, and the min is 0, so the sta
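A minimal sketch of the bounds example being described, with a hypothetical table `t`:

```
-- Suppose t(col INT) is backed by one file whose metadata records
-- lower_bound = 0 and upper_bound = 100 for col.

-- Without a filter, MAX(col) = 100 can be answered from the file
-- metadata alone, so the aggregate can be pushed down.
SELECT MAX(col) FROM t;

-- With a filter, the stored upper bound no longer describes the rows
-- that survive it; answering 100 here would be wrong, so Spark must
-- read the data and aggregate after filtering.
SELECT MAX(col) FROM t WHERE col < 10;
```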

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-12 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1760825445 @huaxingao What could be the possible reasons for aggregate pushdown not to work when using filters? If you can give me some idea/hint, I will try to look into it further.

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-12 Thread via GitHub
amogh-jahagirdar commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1759179813 My mistake, yes, you can have format version 2 and have copy-on-write. The remaining issue is why you are even seeing delete files if CoW is set. That seems to be the fundamental

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1758873680 @amogh-jahagirdar The format version defined is 2, and I have explicitly defined copy-on-write for delete, update and merge. I have deleted some partitions and have noticed that in the snapshot
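For reference, the table-level write modes mentioned here are set through the standard Iceberg table properties; a minimal sketch with a hypothetical table name (each engine interprets these properties itself, so setting them does not by itself guarantee how another engine such as Trino will write deletes):

```
-- Explicit copy-on-write for row-level operations on a format-version 2 table.
ALTER TABLE db.events SET TBLPROPERTIES (
  'write.delete.mode' = 'copy-on-write',
  'write.update.mode' = 'copy-on-write',
  'write.merge.mode'  = 'copy-on-write'
);

-- Confirm what is actually set on the table.
SHOW TBLPROPERTIES db.events;
```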

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1758034473 Huh, that is strange; if you are hitting this log line and have CoW defined, I don't see how that can be possible. @atifiu do you mind creating a new issue and including these detail

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1758010815 I am pretty sure that I don't have any delete files, because I have defined copy-on-write for update, merge and delete.
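One way to verify this directly is to query the Iceberg metadata tables; a minimal sketch with a hypothetical table name, assuming the `delete_files` and `snapshots` metadata tables exposed by the Spark runtime:

```
-- Any position/equality delete files referenced by the current snapshot.
SELECT COUNT(*) AS delete_file_count
FROM db.events.delete_files;

-- The snapshot summary also tracks a running delete-file count.
SELECT snapshot_id,
       operation,
       summary['total-delete-files'] AS total_delete_files
FROM db.events.snapshots
ORDER BY committed_at DESC;
```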

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
RussellSpitzer commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757995661 You would need to remove all delete files from the snapshot. I think this currently requires a rewrite data files + rewrite delete files
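A minimal sketch of those two maintenance steps as Spark procedures, with hypothetical catalog and table names (`rewrite_position_delete_files` is available from Iceberg 1.3.0; the `delete-file-threshold` option is one way to make files that carry deletes eligible for rewriting):

```
-- Rewrite data files, applying their deletes; delete-file-threshold = 1
-- asks the action to rewrite any data file with at least one delete file.
CALL my_catalog.system.rewrite_data_files(
  table   => 'db.events',
  options => map('delete-file-threshold', '1')
);

-- Remove position delete files that no longer reference live data files.
CALL my_catalog.system.rewrite_position_delete_files(table => 'db.events');
```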

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757993486 @RussellSpitzer What can be done to resolve it? Will rewriting the data files resolve it?

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
RussellSpitzer commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757857480 @atifiu Pushdown cannot happen if there are row-level deletes, as indicated in that log line. Row-level deletes mean the file statistics are not accurate, so they cannot be used for

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757848584 @huaxingao Thanks for your response. Even in the case of max on a non-filter column, aggregate pushdown is not working. In the explain plan below, the partition is defined on initial_page_v

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-10 Thread via GitHub
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1756724970 If you have filters on the aggregated columns, e.g. SELECT MAX(col) FROM table WHERE col > 1 AND col < 10, then pushdown is not supported.

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-10 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1755857764 @huaxingao I was executing a max/count query on an Iceberg 1.3.0 table with Spark 3.3.1, but I was unable to see aggregate pushdown, i.e. LocalTableScan. Cc: @RussellSpitzer `spark.s
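A minimal sketch of how that check is usually done; the session flag name below is an assumption (the aggregate pushdown setting introduced by this PR, enabled by default), and the table name is hypothetical:

```
-- Assumed flag name for the aggregate pushdown setting; enabled by default.
SET spark.sql.iceberg.aggregate-push-down.enabled = true;

-- When the aggregate is pushed down, the physical plan shows a
-- LocalTableScan built from metadata; otherwise it shows a BatchScan
-- followed by an aggregate over the data.
EXPLAIN SELECT MAX(event_ts), COUNT(*) FROM db.events;
```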