[PR] SparkSQLProperty for split-size [iceberg]

via GitHub Fri, 25 Jul 2025 11:24:50 -0700


sumedhsakdeo opened a new pull request, #13677:
URL: https://github.com/apache/iceberg/pull/13677


   In Hive, we relied on the session-level config
spark.sql.files.maxPartitionBytes to control file split sizes, and it was
widely adopted across many flows at LinkedIn. As we transition to Iceberg, the
equivalent setting is read.split.target-size, but it can't be set via session
configs. This creates a gap: our current options are either to update thousands
of jobs to pass this value through OPTIONS, or to set the table property
read.split.target-size directly on the tables, which applies the config to all
jobs that read them. Both options are suboptimal. Hence, IMO, it is important
to have this PR land, so that the split size can be set per session via
spark.sql.iceberg.split-size.
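
   To illustrate the gap, here is a minimal sketch of how a split size
could be resolved once a session-level key exists. It is not Iceberg's
actual code: the function name resolve_split_size is hypothetical, and the
precedence order (read option, then session config, then table property,
then default) is an assumption. The key spark.sql.iceberg.split-size is the
name used in this PR; split-size is assumed to be the read option passed
through OPTIONS.

```python
# Hypothetical sketch of split-size resolution; not Iceberg's implementation.

# Documented default of the table property read.split.target-size (128 MB).
DEFAULT_SPLIT_SIZE = 128 * 1024 * 1024


def resolve_split_size(read_options, session_conf, table_props):
    """Return the split size in bytes from the most specific source that sets it.

    Assumed precedence: per-read option, session config, table property, default.
    """
    candidates = (
        read_options.get("split-size"),                   # per-job read option
        session_conf.get("spark.sql.iceberg.split-size"),  # key proposed in this PR
        table_props.get("read.split.target-size"),        # table-level property
    )
    for value in candidates:
        if value is not None:
            return int(value)
    return DEFAULT_SPLIT_SIZE


# A job that sets only the session key, against a table with its own property.
session = {"spark.sql.iceberg.split-size": "268435456"}
table = {"read.split.target-size": "67108864"}
session_only = resolve_split_size({}, session, table)
```

With this ordering, a job could keep a session-wide value much as it did
with spark.sql.files.maxPartitionBytes, without touching table properties
or adding OPTIONS to every read.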
   
   




