rahulsmahadev opened a new pull request, #16787: URL: https://github.com/apache/iceberg/pull/16787
## Summary

Follow-up to #16762, recommended by @szehon-ho in his review there.

Unlike manifests, manifest-list files do not track their length in table metadata, so the `InputFile` handed to `ContentCache.tryCache` resolves its length with a remote `getFileStatus`/HEAD call. Since `tryCache` gates on `input.getLength() <= maxContentLength` before wrapping, every `BaseSnapshot.cacheManifests` call pays that round-trip, including cache hits, which is exactly the table-refresh/streaming-poll case that manifest-list caching targets.

This change probes the cache for the location first and only consults `getLength()` for uncached locations, making cache hits round-trip-free:

- The probe uses `cache.asMap().containsKey(...)` rather than `getIfPresent(...)` so it does not record cache statistics: a `getIfPresent` probe would count a hit before any read happens and skew stats consumers.
- An entry can only be present if it previously passed the `maxContentLength` gate, so skipping the length check for cached locations does not change which files are cacheable.
- If the entry is evicted between the probe and the read, `CachingInputFile` falls back to loading through the regular path, same as today.

The change is independent of #16762 (`tryCache` is used for manifest caching on main today, where lengths are known locally and the gate is free), but it matters once manifest lists flow through the cache.

## Test plan

New `TestContentCache` covering:

- a cached location is wrapped without calling `getLength()` on the input (the test input throws if its length is resolved), and reads are served from the cache
- the probe records no hit/miss stats; the subsequent read records exactly one hit and no miss
- uncached locations still resolve length, and oversized files are still returned unwrapped and uncached

Also ran `TestManifestCaching` (cache stats/size expectations unchanged) plus spotless locally.
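
For readers who want the ordering spelled out, here is a minimal sketch of the probe-before-length idea described in the summary. It is not the code in this PR and not Iceberg's `ContentCache`: a plain `ConcurrentHashMap` stands in for the real cache, and `SketchInput`, `CountingInput`, `ProbeFirstCache`, and the `s3://` location are hypothetical names introduced only for illustration. The counter on `getLength()` plays the role of the remote `getFileStatus`/HEAD call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an input file whose length is not known locally.
interface SketchInput {
  String location();

  long getLength(); // imagine a remote HEAD / getFileStatus round-trip here

  byte[] readAll();
}

// Test double that counts how often the "remote" length lookup happens.
final class CountingInput implements SketchInput {
  final AtomicInteger lengthCalls = new AtomicInteger();
  private final String location;
  private final byte[] content;

  CountingInput(String location, byte[] content) {
    this.location = location;
    this.content = content;
  }

  @Override
  public String location() {
    return location;
  }

  @Override
  public long getLength() {
    lengthCalls.incrementAndGet();
    return content.length;
  }

  @Override
  public byte[] readAll() {
    return content.clone();
  }
}

// Simplified model of the gate: a plain map replaces the real cache (assumption).
final class ProbeFirstCache {
  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
  private final long maxContentLength;

  ProbeFirstCache(long maxContentLength) {
    this.maxContentLength = maxContentLength;
  }

  // Probe by location first; only uncached locations pay for getLength().
  boolean shouldCache(SketchInput input) {
    if (cache.containsKey(input.location())) {
      return true; // entry already passed the size gate when it was loaded
    }
    return input.getLength() <= maxContentLength;
  }

  byte[] read(SketchInput input) {
    if (!shouldCache(input)) {
      return input.readAll(); // oversized: bypass the cache entirely
    }
    // If the entry vanished between probe and read, this simply reloads it.
    return cache.computeIfAbsent(input.location(), loc -> input.readAll());
  }

  boolean isCached(String location) {
    return cache.containsKey(location);
  }
}

final class ProbeFirstCacheSketch {
  public static void main(String[] args) {
    ProbeFirstCache cache = new ProbeFirstCache(8);
    CountingInput list = new CountingInput("s3://bucket/snap-1.avro", new byte[4]);
    cache.read(list); // miss: resolves length once, then loads
    cache.read(list); // hit: no length lookup
    System.out.println("length lookups after two reads: " + list.lengthCalls.get());
  }
}
```

In this model the second read of the same location never touches `getLength()`, which mirrors the round-trip-free cache hit the PR aims for. The sketch does not model cache statistics, so it says nothing about the `containsKey` versus `getIfPresent` distinction.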
This pull request and its description were written by Isaac, an AI coding assistant (Claude Code), on behalf of @rahulsmahadev.
