sumedhsakdeo commented on PR #10336: URL: https://github.com/apache/iceberg/pull/10336#issuecomment-3119862055
In Hive, we relied on the session-level config **spark.sql.files.maxPartitionBytes** to control file split sizes, and this was widely adopted across many flows at LinkedIn. As we transition to Iceberg, the equivalent setting is **read.split.target-size**, but it can't be set via session configs.

This creates a gap: our current options are either to update thousands of jobs to pass this value through OPTIONS, or to set the split size as a table property directly on the tables, which then applies to every job reading them. Both of these are suboptimal. Hence, IMO, it is important to have this PR land. @szehon-ho @amogh-jahagirdar @nastra could you please help?
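For context, a minimal sketch of the two current workarounds, assuming a `SparkSession` named `spark` and a hypothetical table `db.events` (the table name and the 128 MB value are illustrative, not from the PR):

```scala
// Workaround 1: pass the split size per read via OPTIONS.
// This call has to be added to every individual job.
val df = spark.read
  .option("split-size", 134217728L) // Iceberg Spark read option, 128 MB
  .table("db.events")

// Workaround 2: set the split size as a table property.
// This changes the default for every reader of the table.
spark.sql(
  "ALTER TABLE db.events " +
    "SET TBLPROPERTIES ('read.split.target-size' = '134217728')")
```

Neither gives us what **spark.sql.files.maxPartitionBytes** gave us in Hive: a single session-level knob that one config change can apply across all jobs.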
