stevenzwu commented on PR #6253:
URL: https://github.com/apache/iceberg/pull/6253#issuecomment-1329430785

   > If our community accepts this PR's solution, I would like to do one more 
thing: support time-based partition commit. In some scenarios, when 
a new partition is written, the downstream application usually needs to be 
notified. For example, when all the data for a partition has been written, 
commit that partition to Iceberg, just as Flink does for 
[Hive/filesystem](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/filesystem/#partition-commit).
 I once participated in developing this part of Flink, and I hope to 
introduce it to the Iceberg sink. Because of Iceberg's snapshot management 
feature, we may be able to do better than Hive/filesystem.
   
   > The current Iceberg Flink sink can only commit based on checkpoints. Once 
time-based commit is implemented, it will provide a partition commit feature 
that allows configuring custom policies. Commit actions are based on a 
combination of triggers and policies.
   
   @hililiwei Time-based partition commit seems quite complicated, and I am 
trying to understand its value. With watermark info marking data completeness, 
downstream consumers can decide which partition (hourly or daily) has complete 
data, at which point it is OK to trigger processing of the completed hour or day. 
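
A minimal sketch of the watermark-based check described above, under the assumption that the downstream consumer can already read the current watermark as a timestamp (how the watermark is published is not specified in this comment). The function names and the half-open partition interval `[start, start + granularity)` are hypothetical choices for illustration, not part of Iceberg or Flink.

```python
from datetime import datetime, timedelta, timezone


def is_partition_complete(partition_start, granularity, watermark):
    """Return True once the watermark has reached the end of the partition.

    Assumption: the partition covers [partition_start, partition_start + granularity)
    and the watermark means no earlier-timestamped data is still expected.
    """
    return watermark >= partition_start + granularity


def complete_partitions(partition_starts, granularity, watermark):
    """Filter the partitions that are safe for downstream processing."""
    return [
        start
        for start in partition_starts
        if is_partition_complete(start, granularity, watermark)
    ]


if __name__ == "__main__":
    hour = timedelta(hours=1)
    p10 = datetime(2022, 11, 28, 10, tzinfo=timezone.utc)
    p11 = datetime(2022, 11, 28, 11, tzinfo=timezone.utc)
    watermark = datetime(2022, 11, 28, 11, 5, tzinfo=timezone.utc)

    # Only the 10:00 hour lies entirely behind the watermark.
    for start in complete_partitions([p10, p11], hour, watermark):
        print("ready:", start.isoformat())
```

With this kind of check the decision stays on the consumer side: the sink keeps committing per checkpoint, and an hourly or daily job is triggered only when the watermark has passed the end of that hour or day.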


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

