bluzy opened a new issue, #9329:
URL: https://github.com/apache/iceberg/issues/9329

   ### Apache Iceberg version
   
   1.3.1
   
   ### Query engine
   
   Hive
   
   ### Please describe the bug 🐞
   
   I have a question when querying a partitioned table in Hive.
   
   I have an hourly partitioned table with a Timestamp column. When I query the table, I get an OOM error.
   
   ```hql
   SELECT count(1) FROM pdd__db_pbp.iceberg__raw_place_business_detail_v2
   WHERE pdp.partition_timestamp BETWEEN "2023-12-13 14:00:00" AND "2023-12-13 14:10:00";
   ```
   
   ```
   java.lang.OutOfMemoryError: Java heap space
     at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:907)
     at com.google.protobuf.ByteString$CodedBuilder.<init>(ByteString.java:902)
     at com.google.protobuf.ByteString.newCodedBuilder(ByteString.java:898)
     at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:357)
     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:316)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:281)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272)
     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256)
     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)
   ```
   
   The range is small: the data files in it total only about 200 MB. So I suspect that partition pruning is not working and a full table scan is occurring.
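
To make that expectation concrete, here is a minimal Python sketch of the pruning arithmetic. It is not Iceberg's or Hive's actual code. It assumes the table is partitioned at hour granularity on `partition_timestamp`, and it uses a fixed UTC+9 offset to stand in for the Asia/Seoul zone that appears in the query plan. The helper names `hour_ordinal` and `hour_partitions_in_range` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+9 offset stands in for Asia/Seoul (no DST there).
KST = timezone(timedelta(hours=9))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hour_ordinal(ts: datetime) -> int:
    """Whole hours since the Unix epoch for a timezone-aware timestamp."""
    return (ts - EPOCH) // timedelta(hours=1)


def hour_partitions_in_range(lower: datetime, upper: datetime) -> list:
    """Hour ordinals that an inclusive [lower, upper] range can touch."""
    return list(range(hour_ordinal(lower), hour_ordinal(upper) + 1))


# Bounds of the BETWEEN predicate from the query, interpreted in UTC+9.
lower = datetime(2023, 12, 13, 14, 0, 0, tzinfo=KST)
upper = datetime(2023, 12, 13, 14, 10, 0, tzinfo=KST)

partitions = hour_partitions_in_range(lower, upper)
print(len(partitions))
```

Under these assumptions both bounds fall in the same hour, so a scan that honors the predicate would only need to plan files from a single hourly partition.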
   
   The following warning in the Hive log seems to be related:
   
   ```
   2023-12-15 12:00:04,193 [WARN] [InputInitializer {Map 1} #0] |hive.HiveIcebergInputFormat|: Unable to create Iceberg filter, continuing without filter (will be applied by Hive later): 
   java.lang.UnsupportedOperationException: CONSTANT operator is not supported
        at org.apache.iceberg.mr.hive.HiveIcebergFilterFactory.translate(HiveIcebergFilterFactory.java:87)
        at org.apache.iceberg.mr.hive.HiveIcebergFilterFactory.generateFilterExpression(HiveIcebergFilterFactory.java:53)
        at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:90)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:281)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
   ```
   
   And here is the query plan:
   
   ```
   +----------------------------------------------------+
   |                      Explain                       |
   +----------------------------------------------------+
   | Plan optimized by CBO.                             |
   |                                                    |
   | Vertex dependency in root stage                    |
   | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)            |
   |                                                    |
   | Stage-0                                            |
   |   Fetch Operator                                   |
   |     limit:-1                                       |
   |     Stage-1                                        |
   |       Reducer 2                                    |
   |       File Output Operator [FS_7]                  |
   |         Group By Operator [GBY_5] (rows=1 width=1656) |
   |           Output:["_col0"],aggregations:["count(VALUE._col0)"] |
   |         <-Map 1 [CUSTOM_SIMPLE_EDGE]               |
   |           PARTITION_ONLY_SHUFFLE [RS_4]            |
   |             Group By Operator [GBY_3] (rows=1 width=1656) |
   |               Output:["_col0"],aggregations:["count()"] |
   |               Select Operator [SEL_2] (rows=932948 width=1565) |
   |                 Filter Operator [FIL_8] (rows=932948 width=1565) |
   |                   predicate:pdp.partition_timestamp BETWEEN TIMESTAMPLOCALTZ'2023-12-13 14:00:00.0 Asia/Seoul' AND TIMESTAMPLOCALTZ'2023-12-13 14:10:00.0 Asia/Seoul' |
   |                   TableScan [TS_0] (rows=8396537 width=1565) |
   |                     pdd__db_pbp@iceberg__raw_place_business_detail_v2,iceberg__raw_place_business_detail_v2,Tbl:COMPLETE,Col:NONE,Output:["pdp"] |
   |                                                    |
   +----------------------------------------------------+
   ```
   
   
   I'm using these versions.
   
   Hadoop 3.1.2
   Hive 3.1.0
   Tez 0.9.1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


