stevenzwu commented on PR #7638:
URL: https://github.com/apache/iceberg/pull/7638#issuecomment-1553950220

   > the value of the watermark may decrease due to flink job restart or data 
replay
   
   This can be fixed by checkpointing the watermark that is written to the 
snapshot summary. Alternatively, during restore the committer can retrieve the 
latest committed watermark.
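   
   A minimal sketch of the restore path, to make the idea concrete. It models 
snapshots as plain dicts rather than using the Iceberg API, and the summary key 
`flink.watermark` is a hypothetical name chosen for illustration, not a 
property this PR defines. The point is only that the published watermark never 
moves backwards after a restart or replay.

```python
from typing import Dict, List, Optional

# Hypothetical summary key, used only for illustration.
WATERMARK_KEY = "flink.watermark"


def latest_committed_watermark(snapshots: List[Dict]) -> Optional[int]:
    """Walk snapshots from newest to oldest and return the first watermark found.

    `snapshots` is assumed to be ordered oldest-first, each with a "summary" dict.
    """
    for snapshot in reversed(snapshots):
        value = snapshot.get("summary", {}).get(WATERMARK_KEY)
        if value is not None:
            return int(value)
    return None


def next_watermark(committed: Optional[int], observed: int) -> int:
    """Never publish a watermark lower than the one already committed."""
    if committed is None:
        return observed
    return max(committed, observed)


# After a restart the job replays data and observes an older watermark.
history = [
    {"summary": {WATERMARK_KEY: "1000"}},
    {"summary": {}},  # e.g. a compaction snapshot without a watermark
]
restored = latest_committed_watermark(history)
published = next_watermark(restored, observed=800)
```

   Here the replayed value 800 is ignored and the committer keeps publishing 
1000 until the job catches up.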
   
   > This makes it impossible for downstream application to directly determine 
which partitions are visible, and they need to calculate it themselves based on 
the watermark and the each partition time.
   
   I agree that a little bit of logic is needed to determine which partitions 
have complete data based on the watermark published in the snapshot summary.
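   
   That logic could be as small as the sketch below. It assumes time-based 
partitions of fixed length (hourly here) identified by their start time in 
epoch milliseconds; the function and parameter names are made up for 
illustration.

```python
from typing import List

HOUR_MS = 60 * 60 * 1000


def complete_partitions(
    partition_starts_ms: List[int],
    watermark_ms: int,
    partition_length_ms: int = HOUR_MS,
) -> List[int]:
    """Return the partitions whose whole time range is covered by the watermark.

    A partition covers [start, start + length). It is treated as complete once
    its end is at or below the watermark, which is a conservative check.
    """
    return [
        start
        for start in partition_starts_ms
        if start + partition_length_ms <= watermark_ms
    ]


# Three hourly partitions; the watermark sits exactly at the end of the second.
visible = complete_partitions([0, HOUR_MS, 2 * HOUR_MS], watermark_ms=2 * HOUR_MS)
```

   With the watermark at the two-hour mark, the first two partitions are 
reported as complete and the third is still open.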
   
   > This PR maintains high compatibility with the Flink ecosystem. It uses the 
_SUCCESS file as a marker to indicate partition commit 
(https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/filesystem/#partition-commit),
   
   I am not sure this is a fair comparison. The Flink filesystem connector 
stores files directly on a distributed file system (like S3). There is no table 
format abstraction, hence a success file is the only option.
   
   > sink.partition-commit.success-file.name
   
   Where are those success files stored? How do downstream consumers find them? 
   
   How does this work with entropy enabled?
   https://iceberg.apache.org/docs/latest/aws/#object-store-file-layout
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
