pvary commented on code in PR #9179: URL: https://github.com/apache/iceberg/pull/9179#discussion_r1410606706
########## docs/flink-queries.md: ########## @@ -277,6 +277,58 @@ DataStream<Row> stream = env.fromSource(source, WatermarkStrategy.noWatermarks() "Iceberg Source as Avro GenericRecord", new GenericRecordAvroTypeInfo(avroSchema)); ``` +### Emitting watermarks +Emitting watermarks from the source itself could be beneficial for several purposes, like harnessing the +[Flink Watermark Alignment](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment) +feature to prevent runaway readers, or providing triggers for [Flink windowing](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/operators/windows/). Review Comment: I think it is very important to understand, that windowing and watermark generation based on records could cause surprising results - especially with batch reads, or in backfill situations. Without this feature there is not guarantee on the order of the files are read. Window triggering will only become reliable when the source controls the emitted watermarks. I am not sure how detailed the description should be, but I think it is important to be noted here, so I am open for suggestions, if you think we should add more detail here. ########## docs/flink-queries.md: ########## @@ -277,6 +277,58 @@ DataStream<Row> stream = env.fromSource(source, WatermarkStrategy.noWatermarks() "Iceberg Source as Avro GenericRecord", new GenericRecordAvroTypeInfo(avroSchema)); ``` +### Emitting watermarks +Emitting watermarks from the source itself could be beneficial for several purposes, like harnessing the +[Flink Watermark Alignment](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment) +feature to prevent runaway readers, or providing triggers for [Flink windowing](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/operators/windows/). + +Enable watermark generation for an `IcebergSource` by setting the `watermarkColumn`. +The supported column types are `timestamp`, `timestamptz` and `long`. +Timestamp columns are automatically converted to milliseconds since the Java epoch of +1970-01-01T00:00:00Z. Use `watermarkTimeUnit` to configure the conversion for long columns. Review Comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org