bk-mz opened a new issue, #10108: URL: https://github.com/apache/iceberg/issues/10108
### Apache Iceberg version

1.4.2

### Query engine

Spark

### Please describe the bug 🐞

The table is defined with `partitioned by (hour(data_load_ts))`, where `data_load_ts` is a timestamp column. So Iceberg places a record with `data_load_ts = timestamp '2024-04-04 01:15:53'` into the `data_load_ts_hour=2024-04-04-01` partition.

Now, imagine I need to upsert a batch into the `data_load_ts_hour=2024-04-04-01` partition. I need to write a query like this:

```
merge into table as t
using batch as b
on b.id = t.id
  and system.hour(t.data_load_ts) = system.hour(timestamp '2024-04-04 01:00:00')
when matched then update set *
when not matched then insert *
```

The physical plan shows that no filter is pushed down in this case, so Iceberg scans the whole table:

```
+- BatchScan glue.prod.table[id#1938, data_load_ts#1959, _file#1962, _pos#1963L, _spec_id#1960, _partition#1961] glue.prod.table (branch=null) [filters=, groupedBy=] RuntimeFilters: []
```

Is there any way to tell Iceberg not to scan the whole table?
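To make the hour arithmetic in the report concrete, here is a minimal Python sketch of how a timestamp maps to its hour bucket and which half-open timestamp range that bucket covers. It assumes timestamps are interpreted in UTC, and the helper names (`hour_ordinal`, `hour_bounds`, `range_predicate`) are hypothetical and not part of Iceberg or Spark. The generated predicate is only an equivalent way to express the same hour on the raw column; whether it gets pushed down in a `merge into` plan is not confirmed here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: all timestamps are timezone-aware and expressed in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_ordinal(ts: datetime) -> int:
    """Number of whole hours between the Unix epoch and ts."""
    return (ts - EPOCH) // timedelta(hours=1)


def hour_bounds(ts: datetime) -> tuple[datetime, datetime]:
    """Half-open [start, end) range of the hour that contains ts."""
    start = ts.replace(minute=0, second=0, microsecond=0)
    return start, start + timedelta(hours=1)


def range_predicate(column: str, ts: datetime) -> str:
    """Build a SQL range predicate on the raw column covering ts's hour."""
    start, end = hour_bounds(ts)
    return (
        f"{column} >= timestamp '{start:%Y-%m-%d %H:%M:%S}' "
        f"and {column} < timestamp '{end:%Y-%m-%d %H:%M:%S}'"
    )


if __name__ == "__main__":
    ts = datetime(2024, 4, 4, 1, 15, 53, tzinfo=timezone.utc)
    print(hour_ordinal(ts))
    print(range_predicate("t.data_load_ts", ts))
```

The range form avoids wrapping the partition source column in a function call, which is the part of the original condition that the planner has to understand in order to prune.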