[I] Support write-distribution-mode=hash on unpartitioned tables [iceberg]

via GitHub Tue, 12 Aug 2025 07:12:59 -0700


psavalle opened a new issue, #13791:
URL: https://github.com/apache/iceberg/issues/13791


   ### Feature Request / Improvement
   
   When a table is unpartitioned, [`write.distribution-mode=hash` is ignored 
and the mode is always set internally to 
`none`](https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java#L304).
   
   However, a hash rebalance before writing would still be useful on an 
unpartitioned table: it introduces a shuffle, which lets Spark's adaptive query 
execution (AQE) coalesce shuffle partitions based on 
`spark.sql.adaptive.advisoryPartitionSizeInBytes`. This can help avoid writing 
small data files.
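
To make the expected benefit concrete, here is a rough sketch of how coalescing toward an advisory size reduces the number of write tasks (and so the number of small files). It is a simplified greedy model written for illustration, not Spark's actual implementation; the function name `coalesce_partitions`, the partition sizes, and the 64 MB advisory value are all assumptions.

```python
from typing import List

MB = 1024 * 1024


def coalesce_partitions(sizes: List[int], advisory_size: int) -> List[int]:
    """Greedily merge contiguous shuffle partitions up to advisory_size bytes.

    Simplified model for illustration only; Spark's AQE applies more rules.
    """
    if advisory_size <= 0:
        raise ValueError("advisory_size must be positive")
    merged: List[int] = []
    current = 0
    for size in sizes:
        # Close the current group if adding this partition would overshoot.
        if current > 0 and current + size > advisory_size:
            merged.append(current)
            current = 0
        current += size
    if current > 0:
        merged.append(current)
    return merged


# Hypothetical input: 200 shuffle partitions of 4 MB each after a rebalance.
shuffle_sizes = [4 * MB] * 200
# Illustrative advisory size; in Spark this would come from
# spark.sql.adaptive.advisoryPartitionSizeInBytes.
write_tasks = coalesce_partitions(shuffle_sizes, 64 * MB)
print(len(shuffle_sizes), "partitions ->", len(write_tasks), "write tasks")
```

With these assumed sizes, the 200 small partitions collapse into 13 write tasks. Without a shuffle before the write, there is nothing for AQE to coalesce, so each input task writes its own file regardless of size.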
   
   [The 
docs](https://iceberg.apache.org/docs/1.9.1/spark-writes/#controlling-file-sizes)
 mention that `write.target-file-size-bytes` should eventually be taken into 
account for this instead, but in the meantime, forcing a hash rebalance would 
be useful.
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [ ] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


