bluzy opened a new issue, #9329: URL: https://github.com/apache/iceberg/issues/9329
### Apache Iceberg version

1.3.1

### Query engine

Hive

### Please describe the bug 🐞

I have a question about querying a partitioned table in Hive.

I have an hourly partitioned table with a Timestamp column. When I query the table, I get an OOM error.

```hql
SELECT count(1)
FROM pdd__db_pbp.iceberg__raw_place_business_detail_v2
WHERE pdp.partition_timestamp BETWEEN "2023-12-13 14:00:00" AND "2023-12-13 14:10:00";
```

```
java.lang.OutOfMemoryError: Java heap space
    at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:907)
    at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:902)
    at com.google.protobuf.ByteString.newCodedBuilder(ByteString.java:898)
    at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:357)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:316)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:281)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

The time range is small, and the data files in that range total only about 200 MB. So I suspect that partition pruning is not working and a full scan is occurring.

The Hive error log seems to be related:

```
2023-12-15 12:00:04,193 [WARN] [InputInitializer {Map 1} #0] |hive.HiveIcebergInputFormat|: Unable to create Iceberg filter, continuing without filter (will be applied by Hive later):
java.lang.UnsupportedOperationException: CONSTANT operator is not supported
    at org.apache.iceberg.mr.hive.HiveIcebergFilterFactory.translate(HiveIcebergFilterFactory.java:87)
    at org.apache.iceberg.mr.hive.HiveIcebergFilterFactory.generateFilterExpression(HiveIcebergFilterFactory.java:53)
    at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:90)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:281)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

And the query plan:

```
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Plan optimized by CBO. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) |
| |
| Stage-0 |
|   Fetch Operator |
|     limit:-1 |
|     Stage-1 |
|       Reducer 2 |
|       File Output Operator [FS_7] |
|         Group By Operator [GBY_5] (rows=1 width=1656) |
|           Output:["_col0"],aggregations:["count(VALUE._col0)"] |
|         <-Map 1 [CUSTOM_SIMPLE_EDGE] |
|           PARTITION_ONLY_SHUFFLE [RS_4] |
|             Group By Operator [GBY_3] (rows=1 width=1656) |
|               Output:["_col0"],aggregations:["count()"] |
|               Select Operator [SEL_2] (rows=932948 width=1565) |
|                 Filter Operator [FIL_8] (rows=932948 width=1565) |
|                   predicate:pdp.partition_timestamp BETWEEN TIMESTAMPLOCALTZ'2023-12-13 14:00:00.0 Asia/Seoul' AND TIMESTAMPLOCALTZ'2023-12-13 14:10:00.0 Asia/Seoul' |
|                   TableScan [TS_0] (rows=8396537 width=1565) |
|                     pdd__db_pbp@iceberg__raw_place_business_detail_v2,iceberg__raw_place_business_detail_v2,Tbl:COMPLETE,Col:NONE,Output:["pdp"] |
| |
+----------------------------------------------------+
```

I'm using these versions:

- Hadoop 3.1.2
- Hive 3.1.0
- Tez 0.9.1

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org