johnnywalker opened a new issue, #10051: URL: https://github.com/apache/iceberg/issues/10051
### Feature Request / Improvement

While working with Iceberg on a local Spark cluster, I repeatedly encountered heap size errors when using CTAS/RTAS with `PARTITIONED BY`. These errors baffled me for a bit until I understood that Adaptive Query Execution (AQE) was drastically reducing the partition count, exhausting the memory pool. I pored over the documentation, but I only found the root cause after digging into the source code: Iceberg calculates and supplies an advisory partition size to Spark, and Spark prefers this value over the configured default.

[Current documentation](https://iceberg.apache.org/docs/1.5.0/spark-writes/#controlling-file-sizes) explains how Spark AQE will coalesce and split partitions according to the advisory partition size, configured by `spark.sql.adaptive.advisoryPartitionSizeInBytes`. However, the documentation mentions neither [Iceberg's advisory partition size configuration](https://github.com/apache/iceberg/blob/81b62c78e0c230516090becda7d6040ee03e6a91/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java#L688) nor the fact that Iceberg's value [overrides the Spark configuration](https://github.com/apache/spark/blob/8bcbf7701388a2da06369ae9317d7707624edba0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala#L129). By default, Iceberg raised the advisory partition size from 64 MB to 384 MB, which exhausted the 4 GB executor heap in my local cluster.
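To see why the advisory size matters so much, here is a rough sketch of the coalescing arithmetic AQE performs (plain Python, not Spark's actual code; the 8 GiB shuffle size is an illustrative assumption): the target partition count is roughly the total shuffle bytes divided by the advisory size, so a 384 MiB advisory value produces about 6x fewer, 6x larger partitions than Spark's 64 MiB default.

```python
# Rough sketch of AQE's post-shuffle coalescing arithmetic (illustrative only,
# not Spark's actual implementation).
def coalesced_partitions(total_shuffle_bytes: int, advisory_bytes: int) -> int:
    """Approximate number of post-shuffle partitions AQE aims for."""
    return max(1, -(-total_shuffle_bytes // advisory_bytes))  # ceiling division

total = 8 * 1024**3              # assume an 8 GiB shuffle stage
spark_default = 64 * 1024**2     # spark.sql.adaptive.advisoryPartitionSizeInBytes default
iceberg_default = 384 * 1024**2  # Iceberg's default advisory size

print(coalesced_partitions(total, spark_default))    # 128 partitions of ~64 MiB
print(coalesced_partitions(total, iceberg_default))  # 22 partitions of ~384 MiB
```

Fewer, larger partitions means each task holds far more data in memory at once, which is where the heap pressure comes from.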
As an example, I've added the following to my `spark-defaults.conf` to set the session configuration:

```
# reduce iceberg's default advisory partition size (384m) to prevent heap exhaustion
spark.sql.iceberg.advisory-partition-size=67108864
```

Alternatively, I've set the table property with success:

```sql
CREATE TABLE db.table
USING iceberg
PARTITIONED BY (days(trandate))
TBLPROPERTIES ('write.spark.advisory-partition-size-bytes'='33554432')
AS SELECT * FROM landing.table;
```

### Query engine

Spark
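A back-of-the-envelope estimate of the heap pressure (the 8 concurrent tasks per executor and the 2x in-memory expansion factor for deserialized/sorted row data are assumptions, not figures from the issue) shows why the 384 MiB default can sink a 4 GiB executor while the 64 MiB value stays comfortable:

```python
# Back-of-the-envelope heap estimate. Assumed: 8 concurrent tasks per executor
# and a 2x expansion factor for deserialized row data and sort buffers.
MIB = 1024**2
GIB = 1024**3

def inflight_bytes(advisory_bytes: int, concurrent_tasks: int, expansion: float = 2.0) -> int:
    """Rough heap consumed by partition data being processed at once."""
    return int(advisory_bytes * concurrent_tasks * expansion)

heap = 4 * GIB  # the 4 GiB executor heap from the issue
print(inflight_bytes(64 * MIB, 8) / heap)   # 0.25 -> a quarter of the heap
print(inflight_bytes(384 * MIB, 8) / heap)  # 1.5  -> exceeds the heap entirely
```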