wombatu-kun commented on PR #16154: URL: https://github.com/apache/iceberg/pull/16154#issuecomment-4628042882
Context for reviewers - this capability has prior history worth reconciling before a deep review. The same feature was proposed twice before and closed without merging (#10336 and its revival #13677). On #10336, @szehon-ho objected to a bare global `spark.sql.iceberg.split-size` key on design grounds: "The problem with this is that it does lead to some ambiguity as to what table the config is applying to (many queries read from several tables, for example)" (https://github.com/apache/iceberg/pull/10336#issuecomment-2182038718). He pointed at per-scan SQL OPTIONS on the Spark side as the alternative (https://github.com/apache/spark/pull/46707) and later noted it had become possible there (https://github.com/apache/iceberg/pull/10336#issuecomment-2219566750). This PR's table-scoped key (`spark.sql.iceberg.split-size.<table-name>`) directly answers that multi-table ambiguity. The global tier (`spark.sql.iceberg.split-size`) it also adds reintroduces the same ambiguity for queries reading several tables, so a maintainer call on whether the global tier should stay would help before deeper review. There is also a competing open PR #16185 proposing the same capability with different naming (`spark.sql.iceberg.split.target-size`, plus lookback / open-file-cost / adaptive knobs, mirroring the table-property names). It is currently stale. Worth deciding which naming and scope to converge on before either is reviewed in depth. Two related notes for whoever picks this up: - whether per-scan SQL OPTIONS (now available in Spark) already cover the original use case is worth re-confirming. - the resolution change here is generic - it adds the `.<table-name>` table-scoped suffix to every `spark.sql.iceberg.*` session conf, not only split-size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
