pratham76 commented on code in PR #16557:
URL: https://github.com/apache/iceberg/pull/16557#discussion_r3306998080
##########
docs/docs/spark-configuration.md:
##########
@@ -207,6 +207,8 @@ val spark = SparkSession.builder()
| spark.sql.iceberg.executor-cache.locality.enabled | false | Enables locality-aware executor cache usage |
| spark.sql.iceberg.merge-schema | false | Enables modifying the table schema to match the write schema. Only adds columns missing columns |
| spark.sql.iceberg.report-column-stats | true | Report Puffin Table Statistics if available to Spark's Cost Based Optimizer. CBO must be enabled for this to be effective |
+| spark.sql.iceberg.read.adaptive-split-size.enabled | Table default | Enables adaptive split sizing for read operations. When enabled, split size is automatically adjusted based on scan size and parallelism |
+| spark.sql.iceberg.read.adaptive-split-size.parallelism | max(spark.default.parallelism, spark.sql.shuffle.partitions) | Overrides the parallelism used for adaptive split sizing. Must be greater than 0 |
Review Comment:
On second thought, the default value does not exactly correspond to Spark's
default parallelism in this case, since it is the maximum of
`spark.default.parallelism` and `spark.sql.shuffle.partitions`, so I thought it
worth documenting explicitly. Please share your thoughts on this. Thanks!
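
To make the documented default concrete, here is a minimal sketch of the resolution as the new table row describes it. It is not taken from the Iceberg code base: the class name, method name, and the use of `null` for "no override set" are assumptions made purely for illustration.

```java
public class AdaptiveSplitParallelism {

    /**
     * Hypothetical helper (not an Iceberg API). Resolves the parallelism used
     * for adaptive split sizing as described in the docs row:
     * an explicit override wins, otherwise fall back to
     * max(spark.default.parallelism, spark.sql.shuffle.partitions).
     */
    static int effectiveParallelism(Integer override, int defaultParallelism, int shufflePartitions) {
        if (override != null) {
            // The docs row states the override must be greater than 0.
            if (override <= 0) {
                throw new IllegalArgumentException("parallelism must be greater than 0: " + override);
            }
            return override;
        }
        // No override: take the larger of the two Spark settings.
        return Math.max(defaultParallelism, shufflePartitions);
    }

    public static void main(String[] args) {
        // Illustrative values only; real values come from the Spark session.
        System.out.println(effectiveParallelism(null, 8, 200));
        System.out.println(effectiveParallelism(16, 8, 200));
    }
}
```

With no override and the illustrative inputs 8 and 200, the larger value is used, which is why describing the default as just "Spark's default parallelism" would be imprecise.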
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
For additional commands, e-mail: [email protected]