guykhazma commented on code in PR #12931:
URL: https://github.com/apache/iceberg/pull/12931#discussion_r2067431470
##########
docs/docs/spark-configuration.md:
##########
@@ -145,6 +145,61 @@ Using those SQL commands requires adding Iceberg extensions to your Spark enviro
 
 ## Runtime configuration
 
+### Precedence of Configuration Settings
+
+Iceberg allows configuration to be specified at several levels. The effective configuration for a read or write operation is determined by the following order of precedence:
+
+1. DataSource API Read/Write Options – passed explicitly to `.option(...)` on a read or write operation.
+
+2. Spark Session Configuration – set globally via `spark.conf.set(...)`, `spark-defaults.conf`, or `--conf` in `spark-submit`.
+
+3. Table Properties – defined on the Iceberg table via `ALTER TABLE ... SET TBLPROPERTIES`.
+
+4. Default Value – the built-in default, used when none of the above is set.
+
+If a setting is not defined at a higher-precedence level, the next level is used as a fallback. This allows per-operation flexibility while still supporting global defaults.
+
+### Spark SQL Options
+
+Iceberg supports setting various global behaviors using Spark SQL configuration options. These can be set via `spark.conf`, `SparkSession` builder settings, or `spark-submit` arguments.
+For example:
+
+```scala
+// disabling vectorization
+val spark = SparkSession.builder()
+  .appName("IcebergExample")
+  .master("local[*]")
+  .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
+  .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
+  .config("spark.sql.iceberg.vectorization.enabled", "false")
+  .getOrCreate()
+```
+
+| Spark option                                       | Default       | Description                                                              |
+|----------------------------------------------------|---------------|--------------------------------------------------------------------------|
+| spark.sql.iceberg.vectorization.enabled            | Table default | Enables vectorized reads of data files                                   |
+| spark.sql.iceberg.parquet.reader-type              | ICEBERG       | Sets the Parquet reader implementation (`ICEBERG`, `COMET`)              |
+| spark.sql.iceberg.check-nullability                | true          | Whether to perform the nullability check during writes                   |
+| spark.sql.iceberg.check-ordering                   | true          | Whether to check the order of fields during writes                       |
+| spark.sql.iceberg.planning.preserve-data-grouping  | false         | Whether to preserve the existing grouping of data while planning splits  |

Review Comment:
   How about this?
   ```
   When true, co-locates scan tasks for the same partition in the same read split; used in Storage Partitioned Joins.
   ```
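To make the precedence order described in the diff concrete, here is a minimal sketch, not part of the PR itself. The catalog `my_catalog`, the table `db.events`, and the table property `read.parquet.vectorization.enabled` are illustrative assumptions, and `vectorization-enabled` is assumed to be the per-read counterpart of the session conf shown in the table above (passing options through `DataFrameReader.table` requires Spark 3.1+). For the final read, the per-operation option wins over both the session conf and the table property:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local setup: a Hadoop catalog named `my_catalog` with a
// temporary warehouse path; adjust to your environment.
val spark = SparkSession.builder()
  .appName("IcebergPrecedenceSketch")
  .master("local[*]")
  .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.my_catalog.type", "hadoop")
  .config("spark.sql.catalog.my_catalog.warehouse", "/tmp/iceberg-warehouse")
  .getOrCreate()

// Level 3 – table property: the table-wide default (assumed property name,
// and assumes the table `db.events` already exists in the catalog).
spark.sql(
  """ALTER TABLE my_catalog.db.events
    |SET TBLPROPERTIES ('read.parquet.vectorization.enabled' = 'true')""".stripMargin)

// Level 2 – session conf: overrides the table property for this session.
spark.conf.set("spark.sql.iceberg.vectorization.enabled", "false")

// Level 1 – per-read option: overrides both, for this read only.
val df = spark.read
  .option("vectorization-enabled", "true")
  .table("my_catalog.db.events")
```

Removing the `.option(...)` line would make the read fall back to the session conf, and unsetting that would fall back to the table property, matching the fallback chain described in the diff.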
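As background for the suggested wording, Storage Partitioned Joins also depend on Spark-side bucketing support, not only on `spark.sql.iceberg.planning.preserve-data-grouping`. The sketch below shows the pair of session confs typically enabled together; treat the exact set as version-dependent rather than a definitive recipe:

```scala
// Hedged sketch: session confs commonly set together to enable
// storage-partitioned joins (SPJ) over Iceberg tables that share a
// partition layout. Exact requirements vary by Spark/Iceberg version.
spark.conf.set("spark.sql.sources.v2.bucketing.enabled", "true")            // Spark's SPJ support
spark.conf.set("spark.sql.iceberg.planning.preserve-data-grouping", "true") // keep one partition per read split
```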