adutra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1392574413
########## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ########## @@ -540,4 +585,64 @@ public void close() { api.close(); } } + + private void commitRetry(String message, Operation... ops) + throws BaseNessieClientServerException { + commitRetry(message, false, builder -> builder.operations(Arrays.asList(ops))); + } + + private void commitRetry(String message, boolean retryConflicts, CommitEnhancer commitEnhancer) + throws BaseNessieClientServerException { + // Retry all errors except for NessieNotFoundException and also NessieConflictException, unless + // retryConflicts is set to true. + Predicate<Exception> shouldRetry = + e -> + !(e instanceof NessieNotFoundException) + && (!(e instanceof NessieConflictException) || retryConflicts); + Tasks.range(1) + .retry(5) + .shouldRetryTest(shouldRetry) + .throwFailureWhenFinished() + .onFailure((o, exception) -> refresh()) // FIXME is this necessary? Review Comment: We would like to address it in a separate PR if that's OK with you. There are two aspects that I would like to investigate deeper: 1. Refreshing does not make all problems go away by magic: for example, it may make sense to refresh when the update of a namespace's properties fails due to conflicts, because the update has a high chance of success when retried after the refresh. This is what we do at line 619 below. Conversely, refreshing on _any_ error, like here, is likely useless. 2. OK but you could say: refreshing the client periodically cannot do any harm per se, apart from wasting a few REST calls. In fact, I think it can be harmful: e.g. if a `NessieTableOperations` instance is holding a reference to a `NessieIcebergClient`, and that client is refreshed independently from the operations instance, the two will get out of sync. IOW, the client has a different HEAD then the operations, and the operations table might now be invalid. This situation would manifest as a `NessieBadRequestException` when the operation is committed. In https://github.com/apache/iceberg/pull/8909 @ajantha-bhat is adding a few safety nets around `doCommit` to prevent the situation outlined above. That's another reason why I'd like to tackle that later. But in fact I think that a more robust solution would be to create a mechanism to force the two components to always stay in sync, e.g. introduce a `RefreshListener` interface, and have the client notify all registered listeners when a refresh has been performed and the tip of the branch has been updated. Does all that make sense? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org