AlJohri commented on code in PR #2455:
URL: https://github.com/apache/iceberg-rust/pull/2455#discussion_r3324841457


##########
crates/storage/opendal/src/lib.rs:
##########
@@ -326,9 +326,17 @@ impl OpenDalStorage {
             }
         };
 
-        // Transient errors are common for object stores; however there's no
-        // harm in retrying temporary failures for other storage backends as 
well.
-        let operator = operator.layer(RetryLayer::new());
+        // Apply observability/resilience layers. TimeoutLayer must be
+        // inside RetryLayer so each retry attempt is independently
+        // bounded — without a per-attempt timeout, a future parked on a
+        // silently dropped TCP connection never produces an `Err` and
+        // RetryLayer cannot retry, leaving the caller hung indefinitely.
+        // See: 
https://opendal.apache.org/docs/rust/opendal/layers/struct.TimeoutLayer.html
+        //
+        // Transient errors are common for object stores; we retry temporary
+        // failures with exponential backoff. The retry behavior also
+        // benefits non-object-store backends.
+        let operator = 
operator.layer(TimeoutLayer::new()).layer(RetryLayer::new());

Review Comment:
   In this PR I haven't made it configurable at all and just used the OpenDAL 
defaults but I can definitely update the PR to make it configurable if that's 
desirable.
   
   I was originally thinking it may be worth just adding the TimeoutLayer in 
place with the existing defaults similar to the RetryLayer and doing 
configurability changes in a follow up, but happy to do it now as well.
   
   I didn't see any knobs for configuring the object level retries and timeouts 
in the table behavior properties link, but looking at other implementations, I 
could add:
   
   - `s3.connect-timeout` ([matches 
pyiceberg](https://github.com/apache/iceberg-python/blob/5da8186d0a77456e2da2a0286bc016af94c043ed/pyiceberg/io/__init__.py#L60)
 and 
[pyarrow](https://github.com/apache/iceberg-python/blob/5da8186d0a77456e2da2a0286bc016af94c043ed/pyiceberg/io/pyarrow.py#L454-L455))
   - `s3.request-timeout` ([matches 
pyiceberg](https://github.com/apache/iceberg-python/blob/5da8186d0a77456e2da2a0286bc016af94c043ed/pyiceberg/io/__init__.py#L61)
 and 
[pyarrow](https://github.com/apache/iceberg-python/blob/5da8186d0a77456e2da2a0286bc016af94c043ed/pyiceberg/io/pyarrow.py#L457-L458))
   - `s3.retry.num-retries` ([matches 
java](https://github.com/apache/iceberg/blob/0052942699cd7b5e098d54a958827a911d28ac94/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L475-L478))
   - `s3.retry.min-wait-ms` ([matches 
java](https://github.com/apache/iceberg/blob/0052942699cd7b5e098d54a958827a911d28ac94/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L480-L483))
   - `s3.retry.max-wait-ms` ([matches 
java](https://github.com/apache/iceberg/blob/0052942699cd7b5e098d54a958827a911d28ac94/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L485-L488))



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to