stevenzwu commented on PR #9694: URL: https://github.com/apache/iceberg/pull/9694#issuecomment-1977646231
In general, I feel uneasy about changing the Flink Iceberg sink behavior from 2PC to 1PC. We should at least get broader community input before deciding that this option is good to add. It might be good to bring this up in a future community sync meeting.

> @stevenzwu, the Pubsub operator will ack the messages in `notifyCheckpointComplete()`

This is not guaranteed, so we may still end up with inconsistency. I'm not sure whether this scenario is a problem for you. From https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/common/state/CheckpointListener.html:

```
These notifications are "best effort", meaning they can sometimes be skipped.
```

> and, no handling of checkpoint corruption needed

When do you encounter checkpoint corruption? Have you brought it up with the Flink community?

> Keeping metrics consistent, whatever shows as acked, is actually in data

I feel it is better to decouple the source and sink behavior. When a Flink checkpoint completes, the Iceberg sink only guarantees that the processed records are bookmarked/committed (not lost). The reverse does not hold: when data are in the sink, the source may not have acked them yet. Is that inconsistency a problem? If not, why is the other direction a problem?
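Because `notifyCheckpointComplete()` is best effort, a source that acks there has to treat each notification as covering every earlier checkpoint, not just the one named. Below is a minimal sketch of that idea. It assumes a hypothetical `CheckpointCallback` interface that mirrors only the `notifyCheckpointComplete(long)` method of Flink's `CheckpointListener` (so it runs without Flink on the classpath); `PendingAcks`, `snapshot`, and the string message IDs are likewise illustrative names, not the Pub/Sub connector's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CumulativeAckSketch {

    /** Hypothetical stand-in for Flink's CheckpointListener callback (no Flink dependency). */
    interface CheckpointCallback {
        void notifyCheckpointComplete(long checkpointId);
    }

    /** Hypothetical source-side acker: message IDs awaiting ack, grouped by checkpoint ID. */
    static class PendingAcks implements CheckpointCallback {
        private final NavigableMap<Long, List<String>> pendingByCheckpoint = new TreeMap<>();
        private final List<String> acked = new ArrayList<>();

        /** Called when a checkpoint snapshot is taken: remember which messages it covers. */
        void snapshot(long checkpointId, List<String> messageIds) {
            pendingByCheckpoint.put(checkpointId, new ArrayList<>(messageIds));
        }

        /**
         * Notifications may be skipped, so ack everything up to and including
         * checkpointId instead of only the entry for checkpointId itself.
         */
        @Override
        public void notifyCheckpointComplete(long checkpointId) {
            NavigableMap<Long, List<String>> done = pendingByCheckpoint.headMap(checkpointId, true);
            for (List<String> ids : done.values()) {
                acked.addAll(ids); // a real source would send the acks to Pub/Sub here
            }
            done.clear(); // headMap returns a view, so this also removes the entries from pendingByCheckpoint
        }

        List<String> acked() {
            return acked;
        }

        int pendingCheckpoints() {
            return pendingByCheckpoint.size();
        }
    }

    public static void main(String[] args) {
        PendingAcks acks = new PendingAcks();
        acks.snapshot(1, List.of("m1", "m2"));
        acks.snapshot(2, List.of("m3"));
        acks.snapshot(3, List.of("m4"));
        // Suppose the notifications for checkpoints 1 and 2 were skipped
        // and only checkpoint 3's notification arrives.
        acks.notifyCheckpointComplete(3);
        System.out.println(acks.acked());              // [m1, m2, m3, m4]
        System.out.println(acks.pendingCheckpoints()); // 0
    }
}
```

Even with cumulative acks, messages stay un-acked in this sketch until some later notification arrives. If the job fails first, the sink may already hold data the source never acked, which is the inconsistency described above.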