jkolash commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2977939838
I believe this happens because these would normally be separate tasks, but coalesce combines multiple partitions into one partition and so hides the individual tasks. The combined task cannot "complete" until every underlying partition has been read, so the completion callbacks are held much longer.

I also ran with the Parquet v2 code (https://github.com/apache/iceberg/issues/13297#issuecomment-2968557949), and I believe a similar fix needs to be applied here:
https://github.com/apache/spark/blob/59e6b5b7d350a1603502bc92e3c117311ab2cbb6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala#L312

<img width="1303" alt="Image" src="https://github.com/user-attachments/assets/c8375235-63a5-473b-97d9-50ae4654eed0" />

> is it for all wide iceberg tables, and coalesce just makes it more vulnerable?

This particular table is about 500 columns wide, with nesting. I can produce a synthetic dataset later, or as part of this issue, so that anyone can reproduce it.
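
To illustrate the callback-retention effect described in this comment, here is a toy model in plain Python. It is only a sketch of the suspected mechanism, not Spark or Iceberg code: `Task`, `open_reader`, and `run` are hypothetical names, and the model assumes that each reader registers a cleanup callback that fires only when its owning task completes.

```python
# Toy model of task-scoped cleanup callbacks. NOT Spark's implementation;
# all names here (Task, open_reader, run) are hypothetical.


class Task:
    """A task that releases registered resources only when it completes."""

    def __init__(self):
        self._callbacks = []
        self.peak_pending = 0  # most callbacks held at any one time

    def add_completion_listener(self, callback):
        self._callbacks.append(callback)
        self.peak_pending = max(self.peak_pending, len(self._callbacks))

    def complete(self):
        # Run cleanup callbacks in reverse registration order, then drop them.
        for callback in reversed(self._callbacks):
            callback()
        self._callbacks.clear()


def open_reader(task, released, partition_id):
    # Each reader ties its cleanup to the lifetime of the owning task.
    task.add_completion_listener(lambda: released.append(partition_id))


def run(partition_groups):
    """Run one task per group of partitions.

    Returns the peak number of callbacks held by any single task and the
    list of partition ids whose cleanup has run.
    """
    released = []
    peak = 0
    for group in partition_groups:
        task = Task()
        for partition_id in group:
            open_reader(task, released, partition_id)
        peak = max(peak, task.peak_pending)
        task.complete()
    return peak, released


partitions = list(range(8))

# One task per partition: each task holds a single callback at a time.
separate_peak, separate_released = run([[p] for p in partitions])

# All partitions merged into one task: every callback is held until the end.
coalesced_peak, coalesced_released = run([partitions])

print("separate peak:", separate_peak)
print("coalesced peak:", coalesced_peak)
```

In this model the number of callbacks held at once grows with the number of partitions merged into a single task, which is the behavior I suspect for the coalesced read. Whatever each callback keeps alive stays reachable for the same duration.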