mudit-97 commented on PR #9694:
URL: https://github.com/apache/iceberg/pull/9694#issuecomment-1969366948

   Sure @pvary, we wanted a single-phase commit solution for the following 
reasons:
   
   1. We are writing a PubSub source operator that acks messages in 
notifyCheckpointComplete.
   2. If 2PC is used, notifyCheckpointComplete is called on the operators in 
parallel, so there is no guarantee that messages acked in PubSub have actually 
been written to Iceberg; they might still be in the checkpoint directory.
   3. If the job goes down at any time, we always have to manage the 
checkpoints and resume the job from a checkpoint. If the checkpoints are 
corrupted, we will have to seek the PubSub operator back.
   4. Apart from all of this, PubSub metrics (or any source operator metrics) 
will never give a consistent view, because acked messages can still be sitting 
in the checkpoint directory instead of in the sink.
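   
   A minimal sketch of the gap described in point 2, in plain Java with no 
Flink or Iceberg dependencies. AckingSource, TwoPhaseSink and 
ackedButNotCommitted are hypothetical stand-ins written only for illustration; 
the one name taken from Flink is the notifyCheckpointComplete callback. It 
assumes the source is notified before the sink, which is the ordering we are 
worried about:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class AckOnCheckpointSketch {

    /** Hypothetical source: acks messages only once their checkpoint completes. */
    static class AckingSource {
        private final TreeMap<Long, List<String>> pendingByCheckpoint = new TreeMap<>();
        private List<String> inFlight = new ArrayList<>();
        final List<String> acked = new ArrayList<>();

        void emit(String messageId) {
            inFlight.add(messageId);
        }

        void snapshotState(long checkpointId) {
            // Everything emitted since the last snapshot belongs to this checkpoint.
            pendingByCheckpoint.put(checkpointId, inFlight);
            inFlight = new ArrayList<>();
        }

        void notifyCheckpointComplete(long checkpointId) {
            // Ack all messages belonging to checkpoints up to and including this one.
            NavigableMap<Long, List<String>> done = pendingByCheckpoint.headMap(checkpointId, true);
            done.values().forEach(acked::addAll);
            done.clear();
        }
    }

    /** Hypothetical 2PC sink: stages data at snapshot time, commits on notification. */
    static class TwoPhaseSink {
        private final TreeMap<Long, List<String>> staged = new TreeMap<>();
        private List<String> buffer = new ArrayList<>();
        final List<String> committed = new ArrayList<>();

        void write(String messageId) {
            buffer.add(messageId);
        }

        void snapshotState(long checkpointId) {
            // Phase 1: data is only staged (it lives in checkpoint state, not in the table).
            staged.put(checkpointId, buffer);
            buffer = new ArrayList<>();
        }

        void notifyCheckpointComplete(long checkpointId) {
            // Phase 2: staged data becomes visible in the table.
            NavigableMap<Long, List<String>> ready = staged.headMap(checkpointId, true);
            ready.values().forEach(committed::addAll);
            ready.clear();
        }
    }

    /** Messages that were acked at the source but are not yet committed by the sink. */
    static List<String> ackedButNotCommitted(AckingSource source, TwoPhaseSink sink) {
        List<String> gap = new ArrayList<>(source.acked);
        gap.removeAll(sink.committed);
        return gap;
    }

    public static void main(String[] args) {
        AckingSource source = new AckingSource();
        TwoPhaseSink sink = new TwoPhaseSink();

        for (String id : List.of("m1", "m2")) {
            source.emit(id);
            sink.write(id);
        }
        source.snapshotState(1L);
        sink.snapshotState(1L);

        // Assumed ordering: the source hears about checkpoint 1 before the sink does.
        source.notifyCheckpointComplete(1L);
        System.out.println(ackedButNotCommitted(source, sink)); // prints [m1, m2]

        sink.notifyCheckpointComplete(1L);
        System.out.println(ackedButNotCommitted(source, sink)); // prints []
    }
}
```

   Between the two notifications, m1 and m2 are acked in the source but exist 
only in staged (checkpoint) state, which is the window points 2 to 4 refer to.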
   
   We understand that messages can be duplicated in this case, but for some 
use cases we believe duplication is acceptable compared with managing 
checkpoints, handling their corruption, and keeping metrics consistent along 
the way, especially metrics like watermarks.
   
   That's why we wanted to keep this behavior behind a flag, so that consumers 
can opt in if needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

