hililiwei commented on PR #6253:
URL: https://github.com/apache/iceberg/pull/6253#issuecomment-1330024532

   Hi @stevenzwu 
   
   > What do you mean round robin?
   
   I mean, downstream applications need to get the watermark of the table at 
intervals to determine whether to start processing. For example, get the latest 
watermark every five minutes.
   
   > How could downstream consumers know when a new partition show up?
   
   The writer determines whether to commit new partitions to the table based on 
a combination of triggers and policies. Once a new partition is committed, it 
means that the writer considers the new partition's data ready. For details: 
    
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/filesystem/#partition-commit.
   
   Downstream applications read tables using incremental read scan. When a new 
snapshot (including a new partition and data) is commited by the upstream, the 
downstream can get it.
   
   > can you elaborate the differences btw watermark and new complete partition 
for downstream consumers?
   
   Take a scenario in our current production as an example. When my table is 
partitioned by hour, if the data in [12:00,13:00) is not completely written, I 
do not want consumers to see it when querying the table. The query action may 
be an ad hoc query or a statistics task. I want to commit the snapshot after 
all the data in [12:00, 13:00) is written instead of commting the snapshot at 
each checkpoint.
   
   Thx
   Liwei


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to