szehon-ho commented on code in PR #7732:
URL: https://github.com/apache/iceberg/pull/7732#discussion_r1850809250
##########
docs/docs/spark-configuration.md:
##########
@@ -154,43 +154,51 @@ spark.read
     .table("catalog.db.table")
 ```
 
-| Spark option | Default | Description |
-| --------------- | --------------------- | ----------------------------------------------------------------------------------------- |
-| snapshot-id | (latest) | Snapshot ID of the table snapshot to read |
-| as-of-timestamp | (latest) | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
-| split-size | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size |
-| lookback | As per table property | Overrides this table's read.split.planning-lookback |
-| file-open-cost | As per table property | Overrides this table's read.split.open-file-cost |
-| vectorization-enabled | As per table property | Overrides this table's read.parquet.vectorization.enabled |
-| batch-size | As per table property | Overrides this table's read.parquet.vectorization.batch-size |
-| stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
+Iceberg 1.8.0 and later support setting read options by Spark session configuration `spark.datasource.iceberg.<key>=<value>`

Review Comment:
   I think this is good, but I was also thinking of adding a section for priority as well, as mentioned.

##########
docs/docs/spark-configuration.md:
##########
@@ -154,43 +154,51 @@ spark.read
     .table("catalog.db.table")
 ```
 
-| Spark option | Default | Description |
-| --------------- | --------------------- | ----------------------------------------------------------------------------------------- |
-| snapshot-id | (latest) | Snapshot ID of the table snapshot to read |
-| as-of-timestamp | (latest) | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
-| split-size | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size |
-| lookback | As per table property | Overrides this table's read.split.planning-lookback |
-| file-open-cost | As per table property | Overrides this table's read.split.open-file-cost |
-| vectorization-enabled | As per table property | Overrides this table's read.parquet.vectorization.enabled |
-| batch-size | As per table property | Overrides this table's read.parquet.vectorization.batch-size |
-| stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
+Iceberg 1.8.0 and later support setting read options by Spark session configuration `spark.datasource.iceberg.<key>=<value>`
+when using DataFrame to read Iceberg tables, for example: `spark.datasource.iceberg.split-size=512m`, it has lower priority
+than options explicitly passed to DataFrameReader.
+
+| Spark option | Default | Description |

Review Comment:
   I think we can revert the change to this table?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
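A minimal Scala sketch of the precedence behavior the added docs paragraph describes, assuming Iceberg 1.8.0 or later, an existing `SparkSession` named `spark`, and a hypothetical table `catalog.db.table`; byte values are used here rather than the `512m` shorthand shown in the diff:

```scala
// Session-level default applied to subsequent Iceberg DataFrame reads
// (Iceberg 1.8.0 and later): read option keys are prefixed with
// "spark.datasource.iceberg.". Here, a 512 MB target split size.
spark.conf.set("spark.datasource.iceberg.split-size", (512L * 1024 * 1024).toString)

// An option passed explicitly to the DataFrameReader has higher priority
// than the session configuration, so this read plans 128 MB splits instead.
val df = spark.read
  .option("split-size", (128L * 1024 * 1024).toString)
  .table("catalog.db.table")
```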