AlJohri opened a new pull request, #2455:
URL: https://github.com/apache/iceberg-rust/pull/2455

   ## Which issue does this PR close?
   
   - Closes #1288 (the prior user report; auto-closed by stale bot in Nov 2025 
without engagement)
   
   ## What changes are included in this PR?
   
   `iceberg-storage-opendal::OpenDalStorage::create_operator` currently wraps 
the built operator with `RetryLayer::new()` and no `TimeoutLayer`. Per 
[opendal's 
docs](https://opendal.apache.org/docs/rust/opendal/layers/struct.TimeoutLayer.html):
   
   > While using `TimeoutLayer` with `RetryLayer` at the same time, please make 
sure timeout layer showed up before retry layer. Since timeout layer will drop 
future, leaving retry layer in a bad state.
   
   The inverse — `RetryLayer` without `TimeoutLayer` — is exactly the broken 
composition this crate ships today. `RetryLayer` only retries when its inner 
future returns `Err`. A future parked indefinitely on a silently dropped TCP 
connection never produces any error, so `RetryLayer` never fires and the caller 
hangs forever.
   
   This PR adds `TimeoutLayer::new()` inside `RetryLayer`. opendal's defaults 
(60 s for non-IO ops like `stat`/`list`/`delete`, 10 s per IO chunk for 
`read`/`write`) are used; each retry attempt is now independently bounded, 
transient hangs surface as a clean error to `RetryLayer` which retries with 
backoff, and unrecoverable hangs propagate a real error to the caller in 
seconds rather than the inner future parking forever.
   
   The diff is two lines in `crates/storage/opendal/src/lib.rs` (the import and 
the layer composition) plus an updated comment explaining the ordering 
invariant.
   
   ### How we hit this
   
   In production: a Rust application using 
`iceberg-storage-opendal::OpenDalStorageFactory::S3` to read iceberg tables on 
AWS hung for 24 hours when iceberg `try_next()` returned a `Pending` future 
whose underlying opendal Range-GET against S3 never completed. Core-dump 
analysis showed:
   
   - Two in-flight HTTP/1.1 Range-GETs in heap (one for ~723 KB, one for ~367 
KB), both signed with valid temporary credentials.
   - No active TCP connection to any S3 IP at the time of the dump 
(`/proc/<pid>/net/tcp` had only the OTel collector socket).
   - gdb backtraces of all 35 threads showed the tokio runtime fully idle: 
workers parked in `Condvar::wait_until_internal`, main thread in 
`Runtime::block_on`.
   
   So the response future was permanently `Pending` after the TCP connection 
silently died, with no error to propagate. The `RetryLayer` was in the chain 
but dormant because there was no error to react to. Adding `TimeoutLayer` would 
have produced a `MinimumThroughputBelowThresholdError`/timeout `Err` within 
seconds, `RetryLayer` would have retried with backoff, and the operation would 
have surfaced cleanly within ~90 s instead of hanging until the pod's 
`activeDeadlineSeconds` killed it 24 h later.
   
   ### Context on the original composition
   
   `RetryLayer::new()` was added in #788 (Dec 2024) to bound transient 
`"connection closed before message completed"` errors. That PR's description 
explicitly noted that configurability could be a follow-up — and didn't include 
a `TimeoutLayer`. The result is the failure mode above: retry works only for 
fast-failing errors and is dead code for the silent-hang case, which is the 
much harder bug to debug.
   
   A user filed #1288 in May 2025 asking for IO-operation timeout support; it 
received no maintainer engagement and was auto-closed by the stale bot. This PR 
closes that issue with the minimal change needed to restore the composition 
opendal recommends.
   
   ## Are these changes tested?
   
   The existing test suite passes locally (`cargo check -p 
iceberg-storage-opendal`, `cargo clippy -p iceberg-storage-opendal --no-deps`, 
`cargo +nightly fmt -p iceberg-storage-opendal --check`).
   
   There is no integration test in this crate that exercises a hung-connection 
scenario (would require a mock S3 that accepts a connection and never 
responds). Happy to add one if reviewers want — a `tokio::net::TcpListener` 
that accepts and parks would be straightforward.
   
   ## Notes
   
   - No API change. No new builder methods, no new fields, no breaking changes 
for current users.
   - The same one-line composition runs for all storage variants (Memory, Fs, 
S3, Gcs, Oss, Azdls). `TimeoutLayer` applies uniformly. For in-memory/fs 
backends the timeout is effectively never hit; the cost is negligible. For 
network backends it is the actual fix.
   - If a user genuinely needs longer-than-default bounds (e.g. fetching very 
large files over a slow link), the post-#1885 approach is to implement 
`iceberg::Storage` themselves and inject their preferred layer stack — but the 
default path should not hang silently, which is what this PR addresses. Happy 
to expand into a configurable form (e.g. accept `TimeoutLayer` on the 
`OpenDalStorageFactory::S3` variant) in review if maintainers prefer.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to