szehon-ho commented on code in PR #7732:
URL: https://github.com/apache/iceberg/pull/7732#discussion_r1852951219
########## docs/docs/spark-configuration.md: ##########
@@ -154,43 +154,51 @@ spark.read
     .table("catalog.db.table")
 ```
 
-| Spark option          | Default               | Description                                                                                                        |
-| --------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------ |
-| snapshot-id           | (latest)              | Snapshot ID of the table snapshot to read                                                                          |
-| as-of-timestamp       | (latest)              | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time.                          |
-| split-size            | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size                                  |
-| lookback              | As per table property | Overrides this table's read.split.planning-lookback                                                                |
-| file-open-cost        | As per table property | Overrides this table's read.split.open-file-cost                                                                   |
-| vectorization-enabled | As per table property | Overrides this table's read.parquet.vectorization.enabled                                                          |
-| batch-size            | As per table property | Overrides this table's read.parquet.vectorization.batch-size                                                       |
-| stream-from-timestamp | (none)                | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used  |
+Iceberg 1.8.0 and later support setting read options via the Spark session configuration `spark.datasource.iceberg.<key>=<value>`.

Review Comment:
   This could go in its own section, like "Session-level configuration"?

########## docs/docs/spark-configuration.md: ##########
@@ -154,6 +154,10 @@ spark.read
     .table("catalog.db.table")
 ```
 
+Iceberg 1.8.0 and later support setting read options via the Spark session configuration `spark.datasource.iceberg.<key>=<value>`.

Review Comment:
   I still think we need a new section, like "Configuration Priority", where we can explain the order of precedence:
   
   Writes:
   - explicit DataFrameWriter option
   - session configuration default
   - if the table exists, the explicitly set table property
   - if the table exists, the table property default
   
   Reads:
   - explicit DataFrameReader option
   - session configuration default
   - if the table exists, the explicitly set table property
   - if the table exists, the table property default

########## docs/docs/spark-configuration.md: ##########
@@ -167,16 +171,20 @@ spark.read
 
 ### Write options
 
-Spark write options are passed when configuring the DataFrameWriter, like this:
+Spark write options are passed when configuring the DataFrameWriterV2, like this:
 
 ```scala
 // write with Avro instead of Parquet
-df.write
+df.writeTo("catalog.db.table")
     .option("write-format", "avro")
     .option("snapshot-property.key", "value")
-    .insertInto("catalog.db.table")
+    .append()
 ```
 
+Iceberg 1.8.0 and later support setting write options via the Spark session configuration `spark.datasource.iceberg.<key>=<value>`.

Review Comment:
   If we extract this into its own section, there is no need to repeat it here?
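For context on the hunks above, the per-operation read options in the removed table are passed on the DataFrameReader. A minimal sketch, assuming a SparkSession named `spark` is in scope as in the snippets in the diff (the snapshot ID is a placeholder):

```scala
// Time-travel read pinned to one snapshot; "snapshot-id" is one of the
// read options listed in the table above.
spark.read
  .option("snapshot-id", 10963874102873L) // placeholder snapshot ID
  .table("catalog.db.table")
```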
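The session-level mechanism the diff adds could then look like the sketch below, assuming the `spark.datasource.iceberg.<key>` prefix maps onto the same keys as the per-operation options (as the added doc lines describe) and that a DataFrame `df` is in scope:

```scala
// Session-wide default: every Iceberg write in this session uses Avro
// unless a more specific setting overrides it.
spark.conf.set("spark.datasource.iceberg.write-format", "avro")

// No per-write option needed here; the session default applies.
df.writeTo("catalog.db.table").append()
```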
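And a sketch of the write-side precedence proposed in the "Configuration Priority" comment above. The ordering shown is the reviewer's proposal, not confirmed against the implementation; `write.format.default` is the Iceberg table property that `write-format` overrides:

```scala
// Lower precedence: session default.
spark.conf.set("spark.datasource.iceberg.write-format", "orc")

// Higher precedence: explicit writer option -- this write uses Avro,
// overriding the ORC session default and any table-level setting.
df.writeTo("catalog.db.table")
  .option("write-format", "avro")
  .append()

// Table-level setting, consulted only when neither an explicit option
// nor a session default is set.
spark.sql(
  "ALTER TABLE catalog.db.table " +
  "SET TBLPROPERTIES ('write.format.default' = 'parquet')")
```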