bk-mz opened a new issue, #10108:
URL: https://github.com/apache/iceberg/issues/10108

   ### Apache Iceberg version
   
   1.4.2
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   The table is defined with `partitioned by (hour(data_load_ts))`, where 
data_load_ts is a timestamp column.
   
   So Iceberg puts a record with `data_load_ts = timestamp '2024-04-04 01:15:53'` 
into the `data_load_ts_hour=2024-04-04-01` partition.
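
For reference, the Iceberg spec defines the hour transform as the number of whole hours since 1970-01-01 00:00:00, so every timestamp within the same hour maps to the same partition value. Below is a minimal Python sketch of that mapping, assuming the timestamp is interpreted as UTC. The helper names `hour_ordinal`, `hour_bounds`, and `partition_label` are hypothetical and not part of any Iceberg API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are interpreted as UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_ordinal(ts: datetime) -> int:
    """Whole hours elapsed since the epoch (floor division)."""
    return (ts - EPOCH) // timedelta(hours=1)


def hour_bounds(ts: datetime) -> tuple[datetime, datetime]:
    """Inclusive start and exclusive end of the hour that contains ts."""
    start = EPOCH + timedelta(hours=hour_ordinal(ts))
    return start, start + timedelta(hours=1)


def partition_label(ts: datetime) -> str:
    """Human-readable form of the hour value, e.g. 2024-04-04-01."""
    start, _ = hour_bounds(ts)
    return start.strftime("%Y-%m-%d-%H")


record_ts = datetime(2024, 4, 4, 1, 15, 53, tzinfo=timezone.utc)
literal_ts = datetime(2024, 4, 4, 1, 0, 0, tzinfo=timezone.utc)

# Both timestamps fall in the same hour, so they share one partition value.
print(hour_ordinal(record_ts) == hour_ordinal(literal_ts))
print(partition_label(record_ts))
print(hour_bounds(record_ts))
```

The bounds returned by `hour_bounds` describe the range of `data_load_ts` values that belong to a single hourly partition.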
   
   Now, imagine I need to upsert a batch into the `data_load_ts_hour=2024-04-04-01` 
partition.
   
   I need to write a query like this:
   
   ```
   merge into table as t
   using batch as b
   on b.id = t.id and system.hour(t.data_load_ts) = system.hour(timestamp '2024-04-04 01:00:00')
   when matched then update set *
   when not matched then insert *
   ```
   
   The physical plan shows that no pushdown filter is applied in this case, so 
Iceberg scans the whole table.
   
   ```
   +- BatchScan glue.prod.table[id#1938, data_load_ts#1959, _file#1962, _pos#1963L, _spec_id#1960, _partition#1961] glue.prod.table (branch=null) [filters=, groupedBy=] RuntimeFilters: []
   ```
   
   Is there any way to ask Iceberg not to scan the whole table?



