laskoviymishka commented on PR #16237: URL: https://github.com/apache/iceberg/pull/16237#issuecomment-4468228356
that's quite a swing @yadavay-amzn, and the new strategy changed enough that I need to flip to request some changes / questions. The new behavior is basically: > “If a commit fails with an Iceberg-shaped exception, WARN and hope the next cycle fixes it. Only fail on unexpected exceptions.” I don’t think that’s safe. The previous direction was closer to correct: > “If the real commit fails, kill the coordinator. If cleanup/partial commit fails, swallow it.” Main issues: 1. `CommitStateUnknownException` is now swallowed. That one explicitly says retrying can create duplicates. Since the same files can still be retried from `commitBuffer`, this is exactly the bad case the exception warns about. 2. `CleanableFailure` is the wrong boundary. It means “metadata can be cleaned up,” not “safe to retry.” It also includes permanent errors like 400/401/403. With this change, bad credentials can loop forever at WARN, which is pretty close to the original bug. 3. `CommitFailedException` is still swallowed. That’s the exact exception from #15878. The “retry next cycle” story depends on a pretty fragile buffer/offset ordering assumption, and if a rebalance happens after offsets are committed, the in-memory state is gone and the failure is invisible. I’d rather go back to the old shape: rethrow on full commit failure, swallow only the timed-out partial commit cleanup failure. If we really want retries here, I think we need to explicitly list retryable exception types, bound the retries, and document the buffer lifecycle. Also the PR description drifted from the code again, so that needs an update too. @AnatolyPopov — still interested in your read on the worker/coordinator offset ordering, since that’s the key assumption behind “retry next cycle.” -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
