szehon-ho commented on code in PR #7732:
URL: https://github.com/apache/iceberg/pull/7732#discussion_r1852951219
########## docs/docs/spark-configuration.md: ##########
@@ -154,43 +154,51 @@ spark.read
     .table("catalog.db.table")
 ```
 
-| Spark option          | Default               | Description                                                                                                        |
-| --------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------ |
-| snapshot-id           | (latest)              | Snapshot ID of the table snapshot to read                                                                          |
-| as-of-timestamp       | (latest)              | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time.                          |
-| split-size            | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size                                  |
-| lookback              | As per table property | Overrides this table's read.split.planning-lookback                                                                |
-| file-open-cost        | As per table property | Overrides this table's read.split.open-file-cost                                                                   |
-| vectorization-enabled | As per table property | Overrides this table's read.parquet.vectorization.enabled                                                          |
-| batch-size            | As per table property | Overrides this table's read.parquet.vectorization.batch-size                                                       |
-| stream-from-timestamp | (none)                | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used  |
+Iceberg 1.8.0 and later support setting read options via the Spark session configuration `spark.datasource.iceberg.<key>=<value>`.

Review Comment:
   This could go in its own section, like "Session-level configuration"?

########## docs/docs/spark-configuration.md: ##########
@@ -154,6 +154,10 @@ spark.read
     .table("catalog.db.table")
 ```
 
+Iceberg 1.8.0 and later support setting read options via the Spark session configuration `spark.datasource.iceberg.<key>=<value>`.

Review Comment:
   I still think we need a new section, like "Configuration Priority", where we can explain the order of precedence:
   
   Writes:
   - explicit DataFrameWriter option
   - session configuration default
   - if the table exists, the explicitly set table property
   - if the table exists, the table property default
   
   Reads:
   - explicit DataFrameReader option
   - session configuration default
   - if the table exists, the explicitly set table property
   - if the table exists, the table property default

########## docs/docs/spark-configuration.md: ##########
@@ -167,16 +171,20 @@ spark.read
 
 ### Write options
 
-Spark write options are passed when configuring the DataFrameWriter, like this:
+Spark write options are passed when configuring the DataFrameWriterV2, like this:
 
 ```scala
 // write with Avro instead of Parquet
-df.write
+df.writeTo("catalog.db.table")
     .option("write-format", "avro")
     .option("snapshot-property.key", "value")
-    .insertInto("catalog.db.table")
+    .append()
 ```
 
+Iceberg 1.8.0 and later support setting write options via the Spark session configuration `spark.datasource.iceberg.<key>=<value>`.

Review Comment:
   If we extract this into its own section, there is no need to repeat it here?
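For context on the hunks above, the per-operation read options in the removed table are passed on the DataFrameReader. A minimal sketch, assuming a SparkSession named `spark` is in scope as in the snippets in the diff (the snapshot ID is a placeholder):

```scala
// Time-travel read pinned to one snapshot; "snapshot-id" is one of the
// read options listed in the table above.
spark.read
  .option("snapshot-id", 10963874102873L) // placeholder snapshot ID
  .table("catalog.db.table")
```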
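The session-level mechanism the diff adds could then look like the sketch below, assuming the `spark.datasource.iceberg.<key>` prefix maps onto the same keys as the per-operation options (as the added doc lines describe) and that a DataFrame `df` is in scope:

```scala
// Session-wide default: every Iceberg write in this session uses Avro
// unless a more specific setting overrides it.
spark.conf.set("spark.datasource.iceberg.write-format", "avro")

// No per-write option needed here; the session default applies.
df.writeTo("catalog.db.table").append()
```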
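And a sketch of the write-side precedence proposed in the "Configuration Priority" comment above. The ordering shown is the reviewer's proposal, not confirmed against the implementation; `write.format.default` is the Iceberg table property that `write-format` overrides:

```scala
// Lower precedence: session default.
spark.conf.set("spark.datasource.iceberg.write-format", "orc")

// Higher precedence: explicit writer option -- this write uses Avro,
// overriding the ORC session default and any table-level setting.
df.writeTo("catalog.db.table")
  .option("write-format", "avro")
  .append()

// Table-level setting, consulted only when neither an explicit option
// nor a session default is set.
spark.sql(
  "ALTER TABLE catalog.db.table " +
  "SET TBLPROPERTIES ('write.format.default' = 'parquet')")
```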