0xffmeta opened a new issue, #6784: URL: https://github.com/apache/iceberg/issues/6784
### Apache Iceberg version

0.13.1

### Query engine

Hive

### Please describe the bug 🐞

After we upgraded from Iceberg format v1 to v2 with a Flink upsert job, we constantly hit a ```java.lang.OutOfMemoryError: GC overhead limit exceeded``` error with Tez in Hive. The full stack trace is:

> 2023-02-09 05:46:01,001 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at org.apache.iceberg.types.Comparators$NullsFirst.thenComparing(Comparators.java:214)
>   at org.apache.iceberg.types.Comparators$StructLikeComparator.lambda$new$0(Comparators.java:109)
>   at org.apache.iceberg.types.Comparators$StructLikeComparator$$Lambda$112/223411728.apply(Unknown Source)
>   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:546)
>   at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
>   at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:505)
>   at org.apache.iceberg.types.Comparators$StructLikeComparator.<init>(Comparators.java:112)
>   at org.apache.iceberg.types.Comparators$StructLikeComparator.<init>(Comparators.java:102)
>   at org.apache.iceberg.types.Comparators.forType(Comparators.java:53)
>   at org.apache.iceberg.util.StructLikeWrapper.<init>(StructLikeWrapper.java:43)
>   at org.apache.iceberg.util.StructLikeWrapper.forType(StructLikeWrapper.java:34)
>   at org.apache.iceberg.util.StructLikeSet.add(StructLikeSet.java:103)
>   at org.apache.iceberg.util.StructLikeSet.add(StructLikeSet.java:33)
>   at org.apache.iceberg.relocated.com.google.common.collect.Iterators.addAll(Iterators.java:356)
>   at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:320)
>   at org.apache.iceberg.deletes.Deletes.toEqualitySet(Deletes.java:79)
>   at org.apache.iceberg.data.DeleteFilter.applyEqDeletes(DeleteFilter.java:156)
>   at org.apache.iceberg.data.DeleteFilter.applyEqDeletes(DeleteFilter.java:185)
>   at org.apache.iceberg.data.DeleteFilter.filter(DeleteFilter.java:126)
>   at org.apache.iceberg.mr.mapreduce.IcebergInputFormat$IcebergRecordReader.open(IcebergInputFormat.java:312)
>   at org.apache.iceberg.mr.mapreduce.IcebergInputFormat$IcebergRecordReader.initialize(IcebergInputFormat.java:231)
>   at org.apache.iceberg.mr.mapred.AbstractMapredIcebergRecordReader.<init>(AbstractMapredIcebergRecordReader.java:40)
>   at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat$MapredIcebergRecordReader.<init>(MapredIcebergInputFormat.java:89)
>   at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat.getRecordReader(MapredIcebergInputFormat.java:79)
>   at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getRecordReader(HiveIcebergInputFormat.java:120)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:376)
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)

This happened even when we set the container to a very large size, for example:

> SET tez.am.resource.memory.mb=10096;
> SET tez.container.size=30000;
> SET hive.tez.container.size=30000;
> SET hive.tez.java.opts=-Xmx29000m;

We have tried to use the `rewrite` action to reduce the number of equality delete files, but we still hit the problem (though less frequently). My understanding is that the `rewrite` action applies all delete files to the data files and leaves only data files, but I am not sure whether this is related to a bug in Hive on Tez. Or are there other actions we can use to optimize query performance for the Iceberg v2 format?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org