shanzi opened a new issue, #15867: URL: https://github.com/apache/iceberg/issues/15867
### Apache Iceberg version

1.10.1 (latest release)

### Query engine

Flink

### Please describe the bug 🐞

In the [watermark emitter](https://github.com/apache/iceberg/blob/3e22f850bd21f3f8f4ecb340a5d4ffe468fb1ced/flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/source/reader/WatermarkExtractorRecordEmitter.java#L57), it seems we emit, as the watermark for each split, the minimum timestamp extracted by the [ColumnStatsWatermarkExtractor](https://github.com/apache/iceberg/blob/3e22f850bd21f3f8f4ecb340a5d4ffe468fb1ced/flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/source/reader/ColumnStatsWatermarkExtractor.java#L39).

However, in Flink a watermark means that all events with a timestamp less than or *equal to* its value have already been processed. So if a split contains many records whose timestamp is exactly the split's minimum timestamp, and a downstream stateful operator drops late records, those records will be treated as late.

Thus, we should emit the minimum timestamp from the column metrics minus one for every split.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
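
To illustrate the off-by-one described in the bug report, here is a minimal sketch in plain Java with no Flink or Iceberg dependencies. The class `WatermarkOffByOneSketch` and the helpers `isLate` and `countLate` are hypothetical names made up for this example. The lateness check is a simplified assumption (a record is late when its timestamp is less than or equal to the current watermark), not the actual logic of any specific Flink operator.

```java
import java.util.List;

public class WatermarkOffByOneSketch {

    // Simplified assumption: a watermark W asserts that no more elements with
    // timestamp <= W will arrive, so a downstream operator that drops late
    // data would discard any record whose timestamp is <= W.
    public static boolean isLate(long recordTimestamp, long currentWatermark) {
        return recordTimestamp <= currentWatermark;
    }

    // Counts how many records in a split would be considered late
    // under the given watermark.
    public static long countLate(List<Long> timestamps, long watermark) {
        return timestamps.stream().filter(ts -> isLate(ts, watermark)).count();
    }

    public static void main(String[] args) {
        // Hypothetical split: three records share the minimum timestamp.
        List<Long> split = List.of(1000L, 1000L, 1000L, 1500L, 2000L);
        long min = split.stream().mapToLong(Long::longValue).min().getAsLong();

        // Emitting the split minimum as the watermark marks the
        // minimum-timestamp records as late.
        System.out.println("late with watermark = min:     " + countLate(split, min));

        // Emitting min - 1 keeps every record in the split on time.
        System.out.println("late with watermark = min - 1: " + countLate(split, min - 1));
    }
}
```

Under this simplified model, using the split minimum as the watermark flags the three records at timestamp 1000 as late, while using the minimum minus one flags none.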
