Re: [PR] Docs: Document adaptive split sizing configurations [iceberg]

via GitHub Tue, 26 May 2026 14:35:37 -0700


pratham76 commented on code in PR #16557:
URL: https://github.com/apache/iceberg/pull/16557#discussion_r3306998080



##########
docs/docs/spark-configuration.md:
##########
@@ -207,6 +207,8 @@ val spark = SparkSession.builder()
 | spark.sql.iceberg.executor-cache.locality.enabled      | false | Enables locality-aware executor cache usage |
 | spark.sql.iceberg.merge-schema                         | false | Enables modifying the table schema to match the write schema. Only adds columns missing columns |
 | spark.sql.iceberg.report-column-stats                  | true | Report Puffin Table Statistics if available to Spark's Cost Based Optimizer. CBO must be enabled for this to be effective |
+| spark.sql.iceberg.read.adaptive-split-size.enabled     | Table default | Enables adaptive split sizing for read operations. When enabled, split size is automatically adjusted based on scan size and parallelism |
+| spark.sql.iceberg.read.adaptive-split-size.parallelism | max(spark.default.parallelism, spark.sql.shuffle.partitions) | Overrides the parallelism used for adaptive split sizing. Must be greater than 0 |

Review Comment:
   On second thought, the default value here does not exactly correspond to 
Spark's default parallelism: it is the maximum of 
`spark.default.parallelism` and `spark.sql.shuffle.partitions`, so I thought it 
worth documenting explicitly. Please share your thoughts on this. Thanks!
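
To make the documented default concrete, here is a minimal sketch of how the value would resolve, assuming only what the table row above states. The function name `adaptive_split_parallelism` and its parameters are hypothetical and used for illustration; this is not Iceberg's actual implementation, and the inputs are plain integers rather than values read from a Spark session.

```python
from typing import Optional


def adaptive_split_parallelism(
    default_parallelism: int,
    shuffle_partitions: int,
    override: Optional[int] = None,
) -> int:
    """Hypothetical helper mirroring the documented behavior.

    default_parallelism stands in for spark.default.parallelism,
    shuffle_partitions for spark.sql.shuffle.partitions, and override for
    spark.sql.iceberg.read.adaptive-split-size.parallelism.
    """
    if override is not None:
        # The docs state the override must be greater than 0.
        if override <= 0:
            raise ValueError("parallelism must be greater than 0")
        return override
    # No override: fall back to the larger of the two Spark settings.
    return max(default_parallelism, shuffle_partitions)


print(adaptive_split_parallelism(8, 200))               # 200
print(adaptive_split_parallelism(8, 200, override=64))  # 64
```

With 8 for `spark.default.parallelism` and 200 for `spark.sql.shuffle.partitions`, the default resolves to 200 rather than 8, which is why describing it as just Spark's default parallelism would be misleading.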



