0xffmeta opened a new issue, #6784:
URL: https://github.com/apache/iceberg/issues/6784

   ### Apache Iceberg version
   
   0.13.1
   
   ### Query engine
   
   Hive
   
   ### Please describe the bug 🐞
   
   After we upgraded from Iceberg v1 format to v2 with a Flink upsert job, we constantly hit a `java.lang.OutOfMemoryError: GC overhead limit exceeded` error in Tez when querying the table from Hive.
   The full stack trace is:
   
   > 2023-02-09 05:46:01,001 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.iceberg.types.Comparators$NullsFirst.thenComparing(Comparators.java:214)
        at org.apache.iceberg.types.Comparators$StructLikeComparator.lambda$new$0(Comparators.java:109)
        at org.apache.iceberg.types.Comparators$StructLikeComparator$$Lambda$112/223411728.apply(Unknown Source)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:546)
        at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
        at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:505)
        at org.apache.iceberg.types.Comparators$StructLikeComparator.<init>(Comparators.java:112)
        at org.apache.iceberg.types.Comparators$StructLikeComparator.<init>(Comparators.java:102)
        at org.apache.iceberg.types.Comparators.forType(Comparators.java:53)
        at org.apache.iceberg.util.StructLikeWrapper.<init>(StructLikeWrapper.java:43)
        at org.apache.iceberg.util.StructLikeWrapper.forType(StructLikeWrapper.java:34)
        at org.apache.iceberg.util.StructLikeSet.add(StructLikeSet.java:103)
        at org.apache.iceberg.util.StructLikeSet.add(StructLikeSet.java:33)
        at org.apache.iceberg.relocated.com.google.common.collect.Iterators.addAll(Iterators.java:356)
        at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:320)
        at org.apache.iceberg.deletes.Deletes.toEqualitySet(Deletes.java:79)
        at org.apache.iceberg.data.DeleteFilter.applyEqDeletes(DeleteFilter.java:156)
        at org.apache.iceberg.data.DeleteFilter.applyEqDeletes(DeleteFilter.java:185)
        at org.apache.iceberg.data.DeleteFilter.filter(DeleteFilter.java:126)
        at org.apache.iceberg.mr.mapreduce.IcebergInputFormat$IcebergRecordReader.open(IcebergInputFormat.java:312)
        at org.apache.iceberg.mr.mapreduce.IcebergInputFormat$IcebergRecordReader.initialize(IcebergInputFormat.java:231)
        at org.apache.iceberg.mr.mapred.AbstractMapredIcebergRecordReader.<init>(AbstractMapredIcebergRecordReader.java:40)
        at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat$MapredIcebergRecordReader.<init>(MapredIcebergInputFormat.java:89)
        at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat.getRecordReader(MapredIcebergInputFormat.java:79)
        at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getRecordReader(HiveIcebergInputFormat.java:120)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:376)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
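
   The trace shows `Deletes.toEqualitySet` draining the whole equality-delete iterable into an in-memory `StructLikeSet` (`Iterables.addAll` -> `StructLikeSet.add`), so heap usage grows with the total number of equality-delete rows, independent of container size. A minimal sketch of that pattern, assuming the Iceberg 0.13 `StructLikeSet` API (the class and method body below are illustrative, not the actual `Deletes` source):

   ```java
   import org.apache.iceberg.StructLike;
   import org.apache.iceberg.types.Types;
   import org.apache.iceberg.util.StructLikeSet;

   // Illustrative sketch of the pattern the stack trace points at: every
   // equality-delete row is wrapped (StructLikeWrapper) and added to one
   // in-memory set before the data file is filtered, so memory scales with
   // the number of delete rows, not with the Tez container size.
   public class EqDeleteSetSketch {
     static StructLikeSet toEqualitySet(Iterable<StructLike> eqDeletes,
                                        Types.StructType eqType) {
       StructLikeSet set = StructLikeSet.create(eqType);
       for (StructLike row : eqDeletes) {
         set.add(row); // each add wraps the row, as seen in the trace
       }
       return set;
     }
   }
   ```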
   
   This happened even when we set the containers to very large sizes, e.g.:
   
   > SET tez.am.resource.memory.mb=10096;
   > SET tez.container.size=30000;
   > SET hive.tez.container.size=30000;
   > SET hive.tez.java.opts=-Xmx29000m;
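
   One way to gauge how much delete data a reader must hold is the snapshot summary, which tracks running delete counters. A hedged sketch, assuming a Hadoop-path table (the location string and class name are placeholders; with a Hive catalog the table would be loaded differently):

   ```java
   import java.util.Map;
   import org.apache.iceberg.Table;
   import org.apache.iceberg.hadoop.HadoopTables;

   // Hypothetical check: print the snapshot summary counters that indicate
   // how many delete files/rows the DeleteFilter may need to materialize.
   public class DeleteCountCheck {
     public static void main(String[] args) {
       Table table = new HadoopTables().load("hdfs://path/to/table"); // placeholder location
       // currentSnapshot() can be null on an empty table
       Map<String, String> summary = table.currentSnapshot().summary();
       System.out.println("total-delete-files     = " + summary.get("total-delete-files"));
       System.out.println("total-equality-deletes = " + summary.get("total-equality-deletes"));
       System.out.println("total-position-deletes = " + summary.get("total-position-deletes"));
     }
   }
   ```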
   
   We have tried the `rewrite` action to reduce the number of equality delete files, but we still hit the problem occasionally (though less frequently). My understanding is that the `rewrite` action applies all delete files to the data files and leaves only data files behind, but I am not sure whether this is related to a bug in Hive on Tez, or whether there are other actions we can take to optimize query performance for the Iceberg v2 format (see the sketch below for the rewrite we mean).
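
   For reference, the compaction in question looks roughly like this with the Spark actions API available around Iceberg 0.13 (a sketch under that assumption; the class name is illustrative, a live SparkSession is required, and exact behavior around delete files varies by version; old delete files only disappear physically after snapshot expiry):

   ```java
   import org.apache.iceberg.Table;
   import org.apache.iceberg.actions.RewriteDataFiles;
   import org.apache.iceberg.spark.actions.SparkActions;

   // Sketch: rewrite (compact) data files so their equality deletes are
   // applied during the rewrite and no longer need to be loaded at read time.
   public class CompactTable {
     public static void compact(Table table) {
       RewriteDataFiles.Result result =
           SparkActions.get()          // uses the active SparkSession
               .rewriteDataFiles(table)
               .execute();
       System.out.println("rewritten data files: " + result.rewrittenDataFilesCount());
     }
   }
   ```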

