sumedhsakdeo opened a new pull request, #13677: URL: https://github.com/apache/iceberg/pull/13677
In Hive, we relied on the session-level config spark.sql.files.maxPartitionBytes to control file split sizes, and this was widely adopted across many flows at LinkedIn. As we transition to Iceberg, the equivalent setting is read.split.target-size, but it can't be set via session configs. This creates a gap: our current options are either to update thousands of jobs to pass this value through OPTIONS, or to set the table property spark.sql.iceberg.split-size directly on the tables, which applies the config to all the jobs. Both options are suboptimal. Hence, IMO, it is important to have this PR land.
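To make the gap concrete, here is a minimal sketch in plain Python (not Spark or Iceberg code) of how a split size could be resolved once a session-level source is available. The helper `resolve_split_size`, the precedence order (per-read option, then session config, then table-level setting, then default), the key names, and the 128 MiB default are all assumptions made for illustration, not a description of the actual implementation.

```python
from typing import Mapping

# Assumed default target split size (128 MiB); illustrative only.
DEFAULT_SPLIT_SIZE = 128 * 1024 * 1024

# Key names below are assumptions for this sketch.
READ_OPTION_KEY = "split-size"                     # passed per job via OPTIONS
SESSION_CONF_KEY = "spark.sql.iceberg.split-size"  # assumed session-level key
TABLE_SETTING_KEY = "read.split.target-size"       # table-level setting


def resolve_split_size(
    read_options: Mapping[str, str],
    session_conf: Mapping[str, str],
    table_settings: Mapping[str, str],
) -> int:
    """Return the first split size found, checking the most specific source first."""
    sources = (
        (read_options, READ_OPTION_KEY),
        (session_conf, SESSION_CONF_KEY),
        (table_settings, TABLE_SETTING_KEY),
    )
    for source, key in sources:
        value = source.get(key)
        if value is not None:
            return int(value)
    return DEFAULT_SPLIT_SIZE


if __name__ == "__main__":
    # A job that only sets a session config still gets its own split size,
    # without touching the table or passing per-read options.
    size = resolve_split_size(
        read_options={},
        session_conf={SESSION_CONF_KEY: str(256 * 1024 * 1024)},
        table_settings={TABLE_SETTING_KEY: str(64 * 1024 * 1024)},
    )
    print(size)
```

With a session-level source in the chain, existing jobs could keep a single per-session setting, much as they did with spark.sql.files.maxPartitionBytes, instead of editing every read or changing shared tables.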
