atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1766286164
@amogh-jahagirdar I think I know how these delete files are generated even
though copy-on-write is defined at the table level. I executed the delete from
Trino, and since it only supports
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764912976
@huaxingao So it is finally working, but without the `between` and `<=`
operators. Yes, I had to tweak my query to adjust for the timezone so that the
entire partition is picked up by the query.
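Why the bounds have to line up with partition boundaries can be sketched with a small model. It assumes, for illustration only, a day-granularity partition on the timestamp column with boundaries at UTC midnight and microsecond-precision timestamps; the partition spec is not shown in this thread, and `covers_whole_utc_days` is a hypothetical helper, not an Iceberg API. A range that selects only whole partitions can be answered from partition metadata alone; otherwise Spark must re-check rows.

```python
from datetime import datetime, timedelta, timezone


def covers_whole_utc_days(lower: datetime, upper: datetime, upper_inclusive: bool) -> bool:
    """Return True if the range selects only whole UTC days.

    Models a hypothetical day partition with boundaries at UTC midnight and
    microsecond-precision timestamps. The lower bound is treated as >=.
    """
    if lower.tzinfo is None or upper.tzinfo is None:
        raise ValueError("bounds must be timezone-aware")

    lower_utc = lower.astimezone(timezone.utc)
    upper_utc = upper.astimezone(timezone.utc)

    # Turn an inclusive upper bound (<=, BETWEEN) into the equivalent
    # exclusive bound: the next representable microsecond.
    if upper_inclusive:
        upper_utc += timedelta(microseconds=1)

    def at_midnight(ts: datetime) -> bool:
        return (ts.hour, ts.minute, ts.second, ts.microsecond) == (0, 0, 0, 0)

    return lower_utc < upper_utc and at_midnight(lower_utc) and at_midnight(upper_utc)
```

Under a UTC-6 session timezone (again an assumption), local midnight is 06:00 UTC, so a filter on a local day touches two partitions only partially. In this model an inclusive `<=` (or `between`) ending at midnight also reaches one microsecond into the next partition, which is one plausible reading of why those forms behaved differently.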
huaxingao commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764861269
@atifiu Based on the log, only `IsNotNull(initial_page_view_dtm)` is
completely evaluated on the Iceberg side. Both `(initial_page_view_dtm#3 >=
2023-06-02 06:00:00)` and `initial_page_view
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764246045
@huaxingao Based on your suggestion, I have narrowed the filter criteria so
that, even considering the timezone problem, we don't filter on more than two
partitions, so that the filter can be pus
huaxingao commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1763425359
@atifiu That's right. If the filters are not completely evaluated on the
Iceberg side, then Spark has to evaluate them, and the Min/Max/Count values
might change after filtering. Therefor
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1762436892
@huaxingao So, unless we get the message "Evaluating completely on
Iceberg", which means full filter pushdown is done, filter pushdown is either
partial or not happening at all. In my case I
huaxingao commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1762149585
@atifiu I suspect somehow your partition filter isn't completely pushed
down. In this [PR](https://github.com/apache/iceberg/pull/6524), we will
discard filters that can be completely e
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761916077
It's not working with `between`, `>`, or `<`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go
huaxingao commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761909655
If filters are on partitioned columns, aggregate pushdown should work.
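The conditions discussed across this thread can be summarised as a small decision function. This is a hypothetical model, not Iceberg's actual planning code, and `ScanSummary` is an illustrative stand-in: it assumes aggregate pushdown is possible only when the snapshot has no delete files and every filter is completely evaluated from partition metadata, leaving no residual predicate for Spark.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScanSummary:
    """Hypothetical summary of a planned scan (not a real Iceberg class)."""

    delete_file_count: int
    # Predicates that could not be fully answered from partition metadata;
    # Spark must re-evaluate these against every row.
    residual_filters: List[str] = field(default_factory=list)


def aggregate_pushdown_possible(scan: ScanSummary) -> bool:
    # Row-level deletes make per-file statistics stale.
    if scan.delete_file_count > 0:
        return False
    # A residual filter changes which rows are aggregated, so file-level
    # min/max/count would no longer describe the query result.
    return len(scan.residual_filters) == 0
```

In this model a filter on a partition column helps only when it leaves no residual, which matches the point above that pushdown works when filters are completely evaluated on the Iceberg side.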
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761688659
@huaxingao So do you mean that with filters, whether on partitioned or
non-partitioned column(s), aggregate pushdown will not work?
huaxingao commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761680652
@atifiu File statistics are no longer accurate and can't be used if you
use filters.
For example, say you have a table (col int) where the max of col is 100 and the
min is 0, so the sta
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1760825445
@huaxingao What could be the possible reasons for aggregate pushdown not
working when using filters? If you can give me a hint, I will look into it
further.
amogh-jahagirdar commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1759179813
My mistake, yes you can have format version 2 and have copy on write. The
remaining issue is why you are even seeing delete files if CoW is set. That
seems to be the fundamental
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1758873680
@amogh-jahagirdar The format version is 2, and I have explicitly defined
copy-on-write for delete, update, and merge. I have deleted some partitions and
have noticed that in the snapshot
amogh-jahagirdar commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1758034473
Huh, that is strange. If you are hitting this log line and have CoW defined, I
don't see how that is possible. @atifiu do you mind creating a new issue
and include these detail
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1758010815
I am pretty sure that I don't have any delete files, because I have defined
copy-on-write for update, merge, and delete.
RussellSpitzer commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757995661
You would need to remove all delete files from the snapshot. I think this
currently requires running both a rewrite of the data files and a rewrite of
the delete files.
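Why delete files block the pushdown can be illustrated with a toy data file: its min/max/count statistics are recorded when the file is written, and a later position delete hides rows without rewriting the file, so the stored statistics stop matching the live rows until the data is rewritten. `ToyDataFile` and its fields are hypothetical stand-ins, not Iceberg's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyDataFile:
    """Hypothetical data file whose statistics are fixed when it is written."""

    rows: List[int]
    deleted_positions: Set[int] = field(default_factory=set)
    stats: Dict[str, int] = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once, at write time, like file-level metrics.
        self.stats = {"min": min(self.rows), "max": max(self.rows), "count": len(self.rows)}

    def live_rows(self) -> List[int]:
        # A reader applies position deletes on the fly (merge-on-read).
        return [v for i, v in enumerate(self.rows) if i not in self.deleted_positions]

    def rewrite(self) -> "ToyDataFile":
        # Rewriting materialises the deletes and yields fresh statistics.
        return ToyDataFile(self.live_rows())
```

For a file with rows `[0, 42, 100]` and the row at position 2 deleted, the stored max is still 100 while the live max is 42; only the rewritten file reports 42.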
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757993486
@RussellSpitzer What can be done to resolve it? Will rewriting the data files
resolve it?
RussellSpitzer commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757857480
@atifiu Pushdown cannot happen if there are row-level deletes, as indicated
in that log line. Row-level deletes mean the file statistics are not accurate,
so they cannot be used for
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757848584
@huaxingao Thanks for your response. Even for max on a non-filter
column, aggregate pushdown is not working.
In the explain plan below, the partition is defined on initial_page_v
huaxingao commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1756724970
If you have filters on the aggregated columns, e.g. `SELECT MAX(col) FROM
table WHERE col > 1 AND col < 10`, then pushdown is not supported.
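A toy version of that example, assuming a single data file whose statistics were collected over hypothetical rows `[0, 5, 9, 50, 100]`: a pushed-down `MAX` would be read straight from the metadata, but once the query filters on the aggregated column the correct answer depends on which rows pass the filter, which the file-level statistics cannot tell. Both function names are illustrative only.

```python
from typing import Dict, List, Optional


def max_from_stats(file_stats: Dict[str, int]) -> int:
    # What a pushed-down MAX would return: read straight from metadata.
    return file_stats["max"]


def max_after_filter(rows: List[int], lo: int, hi: int) -> Optional[int]:
    # What Spark must compute for: SELECT MAX(col) ... WHERE col > lo AND col < hi
    matching = [v for v in rows if lo < v < hi]
    return max(matching) if matching else None
```

Here the metadata answer is 100 while the filtered answer for `col > 1 AND col < 10` is 9, so the query has to fall back to a real scan.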
On Tue, Oct 10, 2023 at 9:52 AM Atif *
atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1755857764
@huaxingao I was executing max/count queries on an Iceberg table (Iceberg
1.3.0, Spark 3.3.1) but was unable to see aggregate pushdown, i.e. LocalTableScan
Cc: @RussellSpitzer
`spark.s