sumedhsakdeo commented on PR #10336: URL: https://github.com/apache/iceberg/pull/10336#issuecomment-3119862055
In Hive, we relied on the session-level config **spark.sql.files.maxPartitionBytes** to control file split sizes, and this was widely adopted across many flows at LinkedIn. As we transition to Iceberg, the equivalent setting is **read.split.target-size**, but it can't be set via session configs.

This creates a gap: our current options are either to update thousands of jobs to pass this value through OPTIONS, or to set the split size as a table property directly on the tables, which then applies to every job reading them. Both of these are suboptimal. Hence, IMO, it is important to have this PR land. @szehon-ho @amogh-jahagirdar @nastra could you please help?
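For context, a minimal sketch of the two current workarounds, assuming a `SparkSession` named `spark` and a hypothetical table `db.events` (the table name and the 128 MB value are illustrative, not from the PR):

```scala
// Workaround 1: pass the split size per read via OPTIONS.
// This call has to be added to every individual job.
val df = spark.read
  .option("split-size", 134217728L) // Iceberg Spark read option, 128 MB
  .table("db.events")

// Workaround 2: set the split size as a table property.
// This changes the default for every reader of the table.
spark.sql(
  "ALTER TABLE db.events " +
    "SET TBLPROPERTIES ('read.split.target-size' = '134217728')")
```

Neither gives us what **spark.sql.files.maxPartitionBytes** gave us in Hive: a single session-level knob that one config change can apply across all jobs.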
