stevenzwu commented on code in PR #5967:
URL: https://github.com/apache/iceberg/pull/5967#discussion_r1040196996
##########
docs/flink-getting-started.md:
##########

@@ -683,7 +683,58 @@ env.execute("Test Iceberg DataStream");
 
 OVERWRITE and UPSERT can't be set together. In UPSERT mode, if the table is partitioned, the partition fields should be included in equality fields.
 {{< /hint >}}
 
-## Write options
+## Options
+### Read options
+
+Flink read options are passed when configuring the Flink IcebergSource, like this:
+
+```
+IcebergSource.forRowData()
+    .tableLoader(TableLoader.fromCatalog(...))
+    .assignerFactory(new SimpleSplitAssignerFactory())
+    .streaming(true)
+    .streamingStartingStrategy(StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT)
+    .startSnapshotId(3821550127947089987L)
+    .monitorInterval(Duration.ofMillis(10L)) // or .set("monitor-interval", "10s")
+    .build()
+```
+For Flink SQL, read options can be passed in via SQL hints like this:
+```
+SELECT * FROM tableName /*+ OPTIONS('monitor-interval'='10s') */
+...
+```
+
+Options can be passed in via Flink configuration, which will be applied to the current session. Note that not all options support this mode.
+
+```
+env.getConfig()
+    .getConfiguration()
+    .set(FlinkReadOptions.SPLIT_FILE_OPEN_COST, 1000L);
+...
+```
+
+`Read option` has the highest priority, followed by `Flink configuration` and then `Table property`.
+
+| Read option                          | Flink configuration                  | Table property         | Default                          | Description |
+| ------------------------------------ | ------------------------------------ | ---------------------- | -------------------------------- | ----------- |
+| snapshot-id                          | N/A                                  | N/A                    | N/A                              | For time travel in batch mode. Read data from the specified snapshot-id. |
+| case-sensitive                       | case-sensitive                       | N/A                    | false                            | If true, match column name in a case sensitive way. |
+| as-of-timestamp                      | N/A                                  | N/A                    | N/A                              | For time travel in batch mode. Read data from the most recent snapshot as of the given time in milliseconds. |
+| connector.iceberg.starting-strategy  | connector.iceberg.starting-strategy  | N/A                    | INCREMENTAL_FROM_LATEST_SNAPSHOT | Starting strategy for streaming execution. TABLE_SCAN_THEN_INCREMENTAL: Do a regular table scan then switch to the incremental mode. The incremental mode starts from the current snapshot exclusive. INCREMENTAL_FROM_LATEST_SNAPSHOT: Start incremental mode from the latest snapshot inclusive. If it is an empty map, all future append snapshots should be discovered. INCREMENTAL_FROM_EARLIEST_SNAPSHOT: Start incremental mode from the earliest snapshot inclusive. If it is an empty map, all future append snapshots should be discovered. INCREMENTAL_FROM_SNAPSHOT_ID: Start incremental mode from a snapshot with a specific id inclusive. INCREMENTAL_FROM_SNAPSHOT_TIMESTAMP: Start incremental mode from a snapshot with a specific timestamp inclusive. If the timestamp is between two snapshots, it should start from the snapshot after the timestamp. Just for FIP27 Source. |
+| start-snapshot-timestamp             | N/A                                  | N/A                    | N/A                              | Start to read data from the most recent snapshot as of the given time in milliseconds. |
+| start-snapshot-id                    | N/A                                  | N/A                    | N/A                              | Start to read data from the specified snapshot-id. |
+| end-snapshot-id                      | N/A                                  | N/A                    | The latest snapshot id           | Specifies the end snapshot. |
+| connector.iceberg.split-size         | connector.iceberg.split-size         | read.split.target-size | Table read.split.target-size     | Target size when combining data input splits. |
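For context, a minimal sketch of how the two mechanisms in the quoted section combine; the class name, table path, and job name below are placeholders, not anything from this PR. Read options set on the source builder take precedence over the session-level Flink configuration, which in turn takes precedence over table properties:

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.FlinkReadOptions;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.source.IcebergSource;
import org.apache.iceberg.flink.source.StreamingStartingStrategy;
import org.apache.iceberg.flink.source.assigner.SimpleSplitAssignerFactory;

public class ReadOptionPrecedenceSketch {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Session-wide default set through the Flink configuration, as in the quoted docs;
    // it applies to the Iceberg sources in this session unless a source overrides it.
    env.getConfig()
        .getConfiguration()
        .set(FlinkReadOptions.SPLIT_FILE_OPEN_COST, 1000L);

    // Options set directly on the source builder have the highest priority, ahead of
    // the Flink configuration and table properties.
    IcebergSource<RowData> source =
        IcebergSource.forRowData()
            // Placeholder table location; use your own catalog or Hadoop table path.
            .tableLoader(TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/db/table"))
            .assignerFactory(new SimpleSplitAssignerFactory())
            .streaming(true)
            .streamingStartingStrategy(StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT)
            .monitorInterval(Duration.ofSeconds(10))
            .build();

    DataStream<RowData> stream =
        env.fromSource(
            source,
            WatermarkStrategy.noWatermarks(),
            "iceberg-read-options-sketch",
            TypeInformation.of(RowData.class));

    stream.print();
    env.execute("Read option precedence sketch");
  }
}
```

Here the split-file-open-cost default is session-wide, while the monitor interval is set only on this particular source.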
Review Comment:
   The hint option shouldn't need the prefix, for consistency, as there is no naming-collision concern. The default should be the default value from `ScanContext`, not `Table read.split.target-size`.
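As an illustration of the suggestion, a SQL hint issued from Java might then look like the sketch below. This is hypothetical: the un-prefixed hint key `split-size`, the table name, and the 128 MB value are assumptions, not what the PR currently documents.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;

public class SqlHintSketch {

  public static void main(String[] args) {
    TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

    // Assumes an Iceberg catalog and a table named `db.tableName` are already
    // registered in this environment (setup omitted here).
    // Hypothetical hint key: `split-size` without the `connector.iceberg.` prefix,
    // as the review suggests; 134217728 is an arbitrary 128 MB split target.
    TableResult result =
        tEnv.executeSql(
            "SELECT * FROM db.tableName /*+ OPTIONS('split-size'='134217728') */");
    result.print();
  }
}
```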