adutra commented on code in PR #8857:
URL: https://github.com/apache/iceberg/pull/8857#discussion_r1392574413
##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -540,4 +585,64 @@ public void close() {
api.close();
}
}
+
+ private void commitRetry(String message, Operation... ops)
+ throws BaseNessieClientServerException {
+ commitRetry(message, false, builder ->
builder.operations(Arrays.asList(ops)));
+ }
+
+ private void commitRetry(String message, boolean retryConflicts,
CommitEnhancer commitEnhancer)
+ throws BaseNessieClientServerException {
+ // Retry all errors except for NessieNotFoundException and also
NessieConflictException, unless
+ // retryConflicts is set to true.
+ Predicate<Exception> shouldRetry =
+ e ->
+ !(e instanceof NessieNotFoundException)
+ && (!(e instanceof NessieConflictException) || retryConflicts);
+ Tasks.range(1)
+ .retry(5)
+ .shouldRetryTest(shouldRetry)
+ .throwFailureWhenFinished()
+ .onFailure((o, exception) -> refresh()) // FIXME is this necessary?
Review Comment:
We would like to address it in a separate PR if that's OK with you.
There are two aspects that I would like to investigate deeper:
1. Refreshing does not make all problems go away by magic: for example, it
may make sense to refresh when the update of a namespace's properties fails due
to conflicts, because the update has a high chance of success when retried
after the refresh. This is what we do at line 619 below. Conversely, refreshing
on _any_ error, like here, is likely useless.
2. OK but you could say: refreshing the client periodically cannot do any
harm per se, apart from wasting a few REST calls. In fact, I think it can be
harmful: e.g. if a `NessieTableOperations` instance is holding a reference to a
`NessieIcebergClient`, and that client is refreshed independently from the
operations instance, the two will get out of sync. IOW, the client has a
different HEAD then the operations, and the operations table might now be
invalid. This situation would manifest as a `NessieBadRequestException` when
the operation is committed.
In https://github.com/apache/iceberg/pull/8909 @ajantha-bhat is adding a few
safety nets around `doCommit` to prevent the situation outlined above. That's
another reason why I'd like to tackle that later.
But in fact I think that a more robust solution would be to create a
mechanism to force the two components to always stay in sync, e.g. introduce a
`RefreshListener` interface, and have the client notify all registered
listeners when a refresh has been performed and the tip of the branch has been
updated.
Does all that make sense?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]