stevenzwu commented on PR #7638: URL: https://github.com/apache/iceberg/pull/7638#issuecomment-1553950220
> the value of the watermark may decrease due to flink job restart or data replay

This can be fixed by checkpointing the watermark written to the snapshot summary. Alternatively, during restore, the committer can retrieve the latest committed watermark.

> This makes it impossible for downstream application to directly determine which partitions are visible, and they need to calculate it themselves based on the watermark and the each partition time.

I agree that a little bit of logic is needed to determine which partitions have complete data based on the watermark published in the snapshot summary.

> This PR maintains high compatibility with the Flink ecosystem. It uses the _SUCCESS file as a marker to indicate partition commit (https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/filesystem/#partition-commit),

I am not sure this is a fair comparison. The Flink filesystem connector stores files directly on a distributed file system (like S3). There is no table format abstraction, hence a success file is the only option.

> sink.partition-commit.success-file.name

Where are those success files stored? How do downstream consumers find them? How does this work with entropy enabled? https://iceberg.apache.org/docs/latest/aws/#object-store-file-layout
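
To make the "little bit of logic" mentioned above concrete, here is a minimal sketch of how a downstream consumer might derive complete partitions from a published watermark. It is only an illustration under assumptions: the class `WatermarkPartitions` and both method names are hypothetical, the watermark is assumed to have already been read from the snapshot summary as epoch milliseconds, and partitions are assumed to be fixed-size time windows identified by their start time. It does not use any Iceberg or Flink API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Iceberg or Flink.
public class WatermarkPartitions {

    // Keep the published watermark from moving backwards after a job restart
    // or data replay: never publish a value lower than the last committed one.
    static long monotonicWatermark(long lastCommittedMillis, long candidateMillis) {
        return Math.max(lastCommittedMillis, candidateMillis);
    }

    // Return the start times of partitions whose time window [start, start + size)
    // ends at or before the watermark. This is a conservative check: a partition
    // is reported only once the watermark has reached its end.
    static List<Instant> completePartitions(
            List<Instant> partitionStarts, Duration partitionSize, long watermarkMillis) {
        Instant watermark = Instant.ofEpochMilli(watermarkMillis);
        List<Instant> complete = new ArrayList<>();
        for (Instant start : partitionStarts) {
            Instant end = start.plus(partitionSize);
            if (!end.isAfter(watermark)) {
                complete.add(start);
            }
        }
        return complete;
    }

    public static void main(String[] args) {
        // Assumed inputs: two hourly partitions and a watermark at 01:00 UTC.
        List<Instant> starts = List.of(
                Instant.parse("2023-05-19T00:00:00Z"),
                Instant.parse("2023-05-19T01:00:00Z"));
        long committed = Instant.parse("2023-05-19T01:00:00Z").toEpochMilli();
        long replayed = Instant.parse("2023-05-19T00:30:00Z").toEpochMilli();

        // The replayed (lower) watermark is ignored in favor of the committed one.
        long watermark = monotonicWatermark(committed, replayed);
        System.out.println(completePartitions(starts, Duration.ofHours(1), watermark));
    }
}
```

The `Math.max` guard corresponds to restoring the latest committed watermark on restart, so consumers never see a partition flip from complete back to incomplete.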
