jbbqqf opened a new pull request, #16266:
URL: https://github.com/apache/iceberg/pull/16266
## Summary
`OSSInputStream` (aliyun) and `EcsSeekableInputStream` (dell) had the same
EOF-handling bug fixed in #16055 for `GCSInputStream` / `S3InputStream` /
`ADLSInputStream`: when the underlying `read()` returns `-1`, the wrapper still
advanced `pos` (and `next`, in OSS's case) and incremented the `readBytes` /
`readOperations` metric counters by `-1`.
This PR applies the same one-shape fix the maintainers picked in #16055 to
the two modules left out of that PR, plus regression tests covering both
overloads.
Resolves #16062 for `aliyun` and `dell`. The remaining pieces (`GCS`, `S3`,
`ADLS`) are still tracked by the in-flight #16055.
## Context
The original issue body for #16062 explicitly enumerates the four affected
files; #16055 covers `GCS`/`S3`/`ADLS` and the issue comment confirms
`OSSInputStream` and `EcsSeekableInputStream` are still pending.
In Iceberg's hot path, files are read via known-offset range reads rather
than sequential reads-to-EOF, so the bug is *latent* in production today. It
still matters because:
1. Any future caller that does loop until `read() == -1` would silently
corrupt `pos`/`next` and the metric counters.
2. The behavior diverges from `InputStream`'s contract — every other Iceberg
`*InputStream` already returns `-1` cleanly after #16055.
3. The metric drift is observable today via `FileIOMetricsContext` snapshots
that overshoot the file size.
## Changes
- `aliyun/src/main/java/org/apache/iceberg/aliyun/oss/OSSInputStream.java`
- `read()` and `read(byte[], int, int)` now check the underlying return
value and, if `-1`, return immediately without mutating `pos` / `next` or
incrementing `readBytes` / `readOperations`. Same pattern as #16055's
`S3InputStream` change.
- Inline comments explain the why and reference both #16055 (the canonical
fix) and #16062 (this issue), so a reviewer reading the diff cold doesn't have
to re-derive the reasoning.
-
`dell/src/main/java/org/apache/iceberg/dell/ecs/EcsSeekableInputStream.java` —
same shape.
-
`aliyun/src/test/java/org/apache/iceberg/aliyun/oss/TestOSSInputStream.java` —
adds `testReadSingleByteAtEof` and `testReadBufferAtEof`. Both run against the
in-process `AliyunOSSMockExtension` (no live OSS access required).
-
`dell/src/test/java/org/apache/iceberg/dell/ecs/TestEcsSeekableInputStream.java`
— same two tests against the existing `EcsS3MockRule`.
All four new tests fail on `origin/main` and pass with this change.
## Reproduce BEFORE/AFTER yourself (copy-paste)
```bash
# --- one-time setup ---
git clone https://github.com/apache/iceberg.git /tmp/repro && cd /tmp/repro
# --- BEFORE (origin/main) ---
git checkout origin/main
# Pull in only the new test methods so the regression is visible against
# the unmodified production code in main:
git fetch https://github.com/jbbqqf/iceberg.git
feat/16062-fix-eof-handling-oss-ecs-streams
git checkout FETCH_HEAD -- \
aliyun/src/test/java/org/apache/iceberg/aliyun/oss/TestOSSInputStream.java
\
dell/src/test/java/org/apache/iceberg/dell/ecs/TestEcsSeekableInputStream.java
./gradlew :iceberg-dell:test --tests
"org.apache.iceberg.dell.ecs.TestEcsSeekableInputStream"
./gradlew :iceberg-aliyun:test --tests
"org.apache.iceberg.aliyun.oss.TestOSSInputStream"
# Expected: 2 failures in each test class —
# testReadSingleByteAtEof (pos / readBytes shifted by -1 at EOF)
# testReadBufferAtEof (same, for the buffered overload)
# --- AFTER (this PR) ---
git checkout FETCH_HEAD
./gradlew :iceberg-dell:test --tests
"org.apache.iceberg.dell.ecs.TestEcsSeekableInputStream"
./gradlew :iceberg-aliyun:test --tests
"org.apache.iceberg.aliyun.oss.TestOSSInputStream"
# Expected: all tests green in both modules.
```
The only thing that changes between BEFORE and AFTER is the checked-out ref
of the production InputStream classes; the test files are the same in both runs.
## What I ran locally
- `./gradlew :iceberg-dell:test --tests "...TestEcsSeekableInputStream"` on
`origin/main` with the new tests imported → **6 tests, 2 failures** (regression
visible).
- Same on this branch → **6/6 pass**.
- `./gradlew :iceberg-aliyun:test --tests "...TestOSSInputStream"` on
`origin/main` with the new tests imported → **5 tests, 2 failures**.
- Same on this branch → **5/5 pass**.
- `./gradlew :iceberg-aliyun:spotlessCheck :iceberg-dell:spotlessCheck` →
**PASS**.
## Edge cases tested
| # | Module | Scenario | Expected | Verified by |
|---|--------|----------|----------|-------------|
| 1 | dell | Single-byte `read()` past the last byte of a 4-byte object |
returns `-1`; `getPos()` stays at file size; idempotent on a second call |
`TestEcsSeekableInputStream#testReadSingleByteAtEof` |
| 2 | dell | Buffered `read(byte[], 0, len)` past EOF | returns `-1`;
`getPos()` stays at file size |
`TestEcsSeekableInputStream#testReadBufferAtEof` |
| 3 | aliyun | Single-byte `read()` past EOF on an 8-byte object | returns
`-1`; `getPos()` stays; idempotent |
`TestOSSInputStream#testReadSingleByteAtEof` |
| 4 | aliyun | Buffered `read(byte[], 0, dataSize+1)` followed by another
EOF buffered read | first read returns `dataSize`, second returns `-1`;
`getPos()` stays at `dataSize` | `TestOSSInputStream#testReadBufferAtEof` |
## Risk / blast radius
- Two-line guard in two `*InputStream` classes, mirroring the GCS/S3/ADLS
shape that was already chosen by maintainers in #16055.
- No public API surface modified.
- Behavioral change is strictly *more* contract-compliant:
`InputStream.read()` is documented to return `-1` at EOF without side effects,
and that is now what these wrappers do.
- Metrics: `readBytes`/`readOperations` no longer drift on EOF reads.
Pre-fix snapshots overcounted reads; post-fix they don't. This is observable
but should be considered a fix, not a regression.
## Release note
```release-note
Fix OSS (aliyun) and ECS (dell) InputStreams to return -1 at EOF without
advancing the position counter or skewing read metrics, matching the
GCS/S3/ADLS fix in #16055.
```
---
*PR drafted with assistance from Claude Code. The change was reviewed
manually against `apache/iceberg`'s source on `main` and against the
corresponding fix shape in #16055. The reproducer block above was used during
development; it is the same one a reviewer can paste verbatim.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]