adutra commented on code in PR #8857:
URL: https://github.com/apache/iceberg/pull/8857#discussion_r1392574413


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##########
@@ -540,4 +585,64 @@ public void close() {
       api.close();
     }
   }
+
+  private void commitRetry(String message, Operation... ops)
+      throws BaseNessieClientServerException {
+    commitRetry(message, false, builder -> 
builder.operations(Arrays.asList(ops)));
+  }
+
+  private void commitRetry(String message, boolean retryConflicts, 
CommitEnhancer commitEnhancer)
+      throws BaseNessieClientServerException {
+    // Retry all errors except for NessieNotFoundException and also 
NessieConflictException, unless
+    // retryConflicts is set to true.
+    Predicate<Exception> shouldRetry =
+        e ->
+            !(e instanceof NessieNotFoundException)
+                && (!(e instanceof NessieConflictException) || retryConflicts);
+    Tasks.range(1)
+        .retry(5)
+        .shouldRetryTest(shouldRetry)
+        .throwFailureWhenFinished()
+        .onFailure((o, exception) -> refresh()) // FIXME is this necessary?

Review Comment:
   We would like to address it in a separate PR if that's OK with you. 
   
   There are two aspects that I would like to investigate deeper:
   
   1. Refreshing does not make all problems go away by magic: for example, it 
may make sense to refresh when the update of a namespace's properties fails due 
to conflicts, because the update has a high chance of success when retried 
after the refresh. This is what we do at line 619 below. Conversely, refreshing 
on _any_ error, like here, is likely useless.
   2. OK but you could say: refreshing the client periodically cannot do any 
harm per se, apart from wasting a few REST calls. In fact, I think it can be 
harmful: e.g. if a `NessieTableOperations` instance is holding a reference to a 
`NessieIcebergClient`, and that client is refreshed independently from the 
operations instance, the two will get out of sync. IOW, the client has a 
different HEAD then the operations, and the operations table might now be 
invalid. This situation would manifest as a `NessieBadRequestException` when 
the operation is committed.
   
   In https://github.com/apache/iceberg/pull/8909 @ajantha-bhat is adding a few 
safety nets around `doCommit` to prevent the situation outlined above. That's 
another reason why I'd like to tackle that later.
   
   But in fact I think that a more robust solution would be to create a 
mechanism to force the two components to always stay in sync, e.g. introduce a 
`RefreshListener` interface, and have the client notify all registered 
listeners when a refresh has been performed and the tip of the branch has been 
updated.
   
   Does all that make sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to