psavalle opened a new issue, #13791: URL: https://github.com/apache/iceberg/issues/13791
### Feature Request / Improvement

When a table is unpartitioned, [`write.distribution-mode=hash` is ignored, and the distribution mode is always set to `none` internally](https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java#L304). However, it could be useful to always do a hash rebalance before writing, even when the table is not partitioned, so that Spark's AQE can coalesce partitions based on `spark.sql.adaptive.advisoryPartitionSizeInBytes`. This can help avoid writing small data files.

[The docs](https://iceberg.apache.org/docs/1.9.1/spark-writes/#controlling-file-sizes) mention that `write.target-file-size-bytes` should eventually be taken into account for this instead, but in the meantime, forcing a hash rebalance would be useful.

### Query engine

Spark

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
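
To illustrate the motivation, here is a rough Python model of what coalescing to an advisory size does to the number of output files. It is only a sketch: `coalesce` is a hypothetical helper that greedily merges adjacent partitions, not Spark's actual AQE implementation, and the partition sizes and the 128 MB advisory size are made-up values. It assumes each write task produces one data file.

```python
from typing import List

MB = 1024 * 1024


def coalesce(sizes: List[int], advisory: int) -> List[int]:
    """Greedily merge adjacent partitions up to roughly `advisory` bytes.

    Simplified, hypothetical model of AQE partition coalescing; Spark's real
    logic is more involved.
    """
    merged: List[int] = []
    current = 0
    for size in sizes:
        # Close the current group if adding this partition would exceed the target.
        if current > 0 and current + size > advisory:
            merged.append(current)
            current = 0
        current += size
    if current > 0:
        merged.append(current)
    return merged


# Made-up workload: 200 tasks that each hold 4 MB of data to write.
task_outputs = [4 * MB] * 200
advisory = 128 * MB  # illustrative value for the advisory partition size

coalesced = coalesce(task_outputs, advisory)

# Without a shuffle before the write there is nothing to coalesce,
# so in this model every task writes its own small file.
print(len(task_outputs), "files without coalescing")
print(len(coalesced), "files after coalescing")
```

In this model the 200 small outputs collapse into 7 write tasks, which is the effect a rebalance before the write would make possible for unpartitioned tables.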
