szehon-ho commented on code in PR #7732:
URL: https://github.com/apache/iceberg/pull/7732#discussion_r1850809250
##########
docs/docs/spark-configuration.md:
##########
@@ -154,43 +154,51 @@ spark.read
     .table("catalog.db.table")
 ```
 
-| Spark option | Default | Description |
-| --------------- | --------------------- | ----------------------------------------------------------------------------------------- |
-| snapshot-id | (latest) | Snapshot ID of the table snapshot to read |
-| as-of-timestamp | (latest) | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
-| split-size | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size |
-| lookback | As per table property | Overrides this table's read.split.planning-lookback |
-| file-open-cost | As per table property | Overrides this table's read.split.open-file-cost |
-| vectorization-enabled | As per table property | Overrides this table's read.parquet.vectorization.enabled |
-| batch-size | As per table property | Overrides this table's read.parquet.vectorization.batch-size |
-| stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
+Iceberg 1.8.0 and later support setting read options by Spark session configuration `spark.datasource.iceberg.<key>=<value>`

Review Comment:
   I think this is good, but I was also thinking of adding a section for priority as well, as mentioned.

##########
docs/docs/spark-configuration.md:
##########
@@ -154,43 +154,51 @@ spark.read
     .table("catalog.db.table")
 ```
 
-| Spark option | Default | Description |
-| --------------- | --------------------- | ----------------------------------------------------------------------------------------- |
-| snapshot-id | (latest) | Snapshot ID of the table snapshot to read |
-| as-of-timestamp | (latest) | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
-| split-size | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size |
-| lookback | As per table property | Overrides this table's read.split.planning-lookback |
-| file-open-cost | As per table property | Overrides this table's read.split.open-file-cost |
-| vectorization-enabled | As per table property | Overrides this table's read.parquet.vectorization.enabled |
-| batch-size | As per table property | Overrides this table's read.parquet.vectorization.batch-size |
-| stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
+Iceberg 1.8.0 and later support setting read options by Spark session configuration `spark.datasource.iceberg.<key>=<value>`
+when using DataFrame to read Iceberg tables, for example: `spark.datasource.iceberg.split-size=512m`, it has lower priority
+than options explicitly passed to DataFrameReader.
+
+| Spark option | Default | Description |

Review Comment:
   I think we can revert the change to this table?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
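A minimal Scala sketch of the precedence behavior the added docs paragraph describes, assuming Iceberg 1.8.0 or later, an existing `SparkSession` named `spark`, and a hypothetical table `catalog.db.table`; byte values are used here rather than the `512m` shorthand shown in the diff:

```scala
// Session-level default applied to subsequent Iceberg DataFrame reads
// (Iceberg 1.8.0 and later): read option keys are prefixed with
// "spark.datasource.iceberg.". Here, a 512 MB target split size.
spark.conf.set("spark.datasource.iceberg.split-size", (512L * 1024 * 1024).toString)

// An option passed explicitly to the DataFrameReader has higher priority
// than the session configuration, so this read plans 128 MB splits instead.
val df = spark.read
  .option("split-size", (128L * 1024 * 1024).toString)
  .table("catalog.db.table")
```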