bhupixb opened a new issue, #9193: URL: https://github.com/apache/iceberg/issues/9193
### Apache Iceberg version 1.4.1 ### Query engine Flink ### Please describe the bug 🐞 ##### Background: We are using the flink iceberg sinks to write data to an iceberg table(using hive catalog, storage S3) in _**table-format v2, upsert enabled**_. We have written ~25 million records on a table (daily partitioned) and the Table sort order is on 3 fields(all string, having max 32 char length). Table Schema: ```java public class SideOutput { private String input_record; private String operator; private String error_message; private String error_code; private String job_id; private String tenant_id; private Integer unique_id; private String status; private Long processed_at; private String job_creation_date; private String record_id; private String parent_operator; } ``` We are in the POC phase only, and the data is written successfully(only 25m records in 1 day partition, no other data). We have checkpoint of 60s, and around ~350 files are written in the S3. Each file size range from ~10kb to ~3mb. #### Issue: When we are trying to read data from this table using trino, our trino workers occupy whole memory(4 workers, each 20gb) and Major GC trigger frequently and eventually they die. Earlier, I thought we might have a problem with trino infra. So I wrote a flink job to read this data simple query like (select a, b, c, count(1) from table group by a,b,c; This query will have max 5-6 rows). And surprisingly, we face the same **issue of heap OOM** in TM. Then I came across this doc, https://iceberg.apache.org/docs/latest/flink-actions/ to rewrite small files into larger one's. So I wrote another flink job with 4 parallelism, TM (memory 16 GB, 8 core CPU) to perform this re-write action. This job runs for some time and eventually, the TM dies. I can see TM CPU going 100%, heap also ~95%. We took a heap dump of worker and see the memory is occupied by HashMap: <img width="906" alt="image" src="https://github.com/apache/iceberg/assets/30554307/9394a4f8-5b3e-470d-8f88-b88e5f618c6e"> I went through a bunch of github issues like https://github.com/apache/iceberg/issues/6104 but not really able to figure out what is causing the issue. Flink Version: 1.16.0 Iceberg version: 1.4.1 We tried both batch and streaming mode for RewriteActions job, but no luck. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org