rahulsmahadev opened a new pull request, #16787:
URL: https://github.com/apache/iceberg/pull/16787

   ## Summary
   
   Follow-up to #16762, recommended by @szehon-ho in his review there.
   
   Unlike manifests, manifest-list files do not track their length in table 
metadata, so the `InputFile` handed to `ContentCache.tryCache` resolves its 
length with a remote `getFileStatus`/HEAD call. Since `tryCache` gates on 
`input.getLength() <= maxContentLength` before wrapping, every 
`BaseSnapshot.cacheManifests` call pays that round-trip — including cache hits, 
which is exactly the table-refresh/streaming-poll case that manifest-list 
caching targets.
   
   This change probes the cache for the location first and only consults 
`getLength()` for uncached locations, making cache hits round-trip-free:
   
   - The probe uses `cache.asMap().containsKey(...)` rather than 
`getIfPresent(...)` so it does not record cache statistics — a `getIfPresent` 
probe would count a hit before any read happens and skew stats consumers.
   - An entry can only be present if it previously passed the 
`maxContentLength` gate, so skipping the length check for cached locations does 
not change which files are cacheable.
   - If the entry is evicted between the probe and the read, `CachingInputFile` 
falls back to loading through the regular path, same as today.
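
As a rough illustration of the probe-then-gate order described above, here is a minimal, self-contained sketch. It is not the Iceberg implementation: `ProbeFirstCacheSketch`, `Input`, `CountingInput`, `CachedInput`, and `evict` are hypothetical stand-ins, and a plain `ConcurrentHashMap` takes the place of the real cache so the example runs with only the JDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration; these are not the Iceberg classes.
class ProbeFirstCacheSketch {
  interface Input {
    String location();
    long getLength(); // in the real setting this may cost a remote getFileStatus/HEAD call
    byte[] read();
  }

  /** In-memory input that counts how often its length and content are requested. */
  static final class CountingInput implements Input {
    final AtomicInteger lengthCalls = new AtomicInteger();
    final AtomicInteger readCalls = new AtomicInteger();
    private final String location;
    private final byte[] content;

    CountingInput(String location, byte[] content) {
      this.location = location;
      this.content = content;
    }

    @Override public String location() { return location; }
    @Override public long getLength() { lengthCalls.incrementAndGet(); return content.length; }
    @Override public byte[] read() { readCalls.incrementAndGet(); return content; }
  }

  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
  private final long maxContentLength;

  ProbeFirstCacheSketch(long maxContentLength) { this.maxContentLength = maxContentLength; }

  Input tryCache(Input input) {
    // Probe first: a cached location already passed the size gate once.
    if (cache.containsKey(input.location())) {
      return new CachedInput(input);
    }
    // Only uncached locations pay for the length lookup.
    if (input.getLength() <= maxContentLength) {
      return new CachedInput(input);
    }
    return input; // oversized: returned unwrapped, never cached
  }

  /** Simulates an eviction, for demonstration only. */
  void evict(String location) { cache.remove(location); }

  private final class CachedInput implements Input {
    private final Input delegate;

    CachedInput(Input delegate) { this.delegate = delegate; }

    @Override public String location() { return delegate.location(); }
    @Override public long getLength() { return delegate.getLength(); }

    @Override public byte[] read() {
      // If the entry vanished after the probe, this loads through the delegate again.
      return cache.computeIfAbsent(delegate.location(), key -> delegate.read());
    }
  }

  public static void main(String[] args) {
    ProbeFirstCacheSketch cache = new ProbeFirstCacheSketch(10);
    CountingInput input = new CountingInput("snap-1.avro", new byte[] {1, 2, 3});
    cache.tryCache(input).read(); // uncached: resolves the length, then loads
    cache.tryCache(input).read(); // cached: no length lookup, no reload
    System.out.println("length lookups: " + input.lengthCalls.get());
  }
}
```

In this sketch the second `tryCache` call never touches `getLength()`, and a wrapper created before `evict` still reads correctly because `computeIfAbsent` reloads from the delegate.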
   
   The change is independent of #16762 (`tryCache` is used for manifest caching 
on main today, where lengths are known locally and the gate is free), but it 
matters once manifest lists flow through the cache.
   
   ## Test plan
   
   New `TestContentCache` covering:
   - a cached location is wrapped without calling `getLength()` on the input 
(the test input throws if its length is resolved), and reads are served from 
the cache
   - the probe records no hit/miss stats; the subsequent read records exactly 
one hit and no miss
   - uncached locations still resolve length, and oversized files are still 
returned unwrapped and uncached
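
The three scenarios above could be exercised roughly as follows. This is only a sketch under stated assumptions, not the contents of `TestContentCache`: `ContentCacheTestSketch`, `StatsCache`, `LengthThrowingInput`, `SizedInput`, and `shouldWrap` are hypothetical names, and the hit/miss counters are hand-rolled rather than taken from a real cache library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the test scenarios; none of these types are Iceberg's.
class ContentCacheTestSketch {
  interface Input {
    String location();
    long getLength();
    byte[] read();
  }

  /** Test double that fails if its length or content is ever requested. */
  static final class LengthThrowingInput implements Input {
    private final String location;
    LengthThrowingInput(String location) { this.location = location; }
    @Override public String location() { return location; }
    @Override public long getLength() { throw new IllegalStateException("length resolved"); }
    @Override public byte[] read() { throw new IllegalStateException("expected a cache hit"); }
  }

  static final class SizedInput implements Input {
    private final String location;
    private final byte[] content;
    SizedInput(String location, byte[] content) { this.location = location; this.content = content; }
    @Override public String location() { return location; }
    @Override public long getLength() { return content.length; }
    @Override public byte[] read() { return content; }
  }

  /** Tiny cache with hand-rolled stats: reads are counted, probes are not. */
  static final class StatsCache {
    private final Map<String, byte[]> entries = new HashMap<>();
    private final long maxContentLength;
    long hits;
    long misses;

    StatsCache(long maxContentLength) { this.maxContentLength = maxContentLength; }

    boolean contains(String location) { return entries.containsKey(location); } // no stats

    boolean shouldWrap(Input input) {
      return contains(input.location()) || input.getLength() <= maxContentLength;
    }

    byte[] read(Input input) {
      byte[] cached = entries.get(input.location());
      if (cached != null) { hits++; return cached; }
      misses++;
      byte[] loaded = input.read();
      entries.put(input.location(), loaded);
      return loaded;
    }
  }

  static void check(boolean condition, String message) {
    if (!condition) throw new AssertionError(message);
  }

  public static void main(String[] args) {
    StatsCache cache = new StatsCache(4);
    cache.read(new SizedInput("loc-a", new byte[] {1, 2})); // populate: one miss

    // 1. Cached location is wrapped without resolving the length.
    Input throwing = new LengthThrowingInput("loc-a");
    long hits = cache.hits;
    long misses = cache.misses;
    check(cache.shouldWrap(throwing), "cached location should be wrapped");

    // 2. The probe records nothing; the read records exactly one hit and no miss.
    check(cache.hits == hits && cache.misses == misses, "probe must not record stats");
    check(cache.read(throwing).length == 2, "read should be served from cache");
    check(cache.hits == hits + 1 && cache.misses == misses, "one hit, no miss");

    // 3. Oversized, uncached input still resolves its length and stays uncached.
    SizedInput big = new SizedInput("loc-b", new byte[8]);
    check(!cache.shouldWrap(big), "oversized input should not be wrapped");
    check(!cache.contains("loc-b"), "oversized input should not be cached");
  }
}
```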
   
   Also ran `TestManifestCaching` (cache stats/size expectations unchanged) 
plus spotless locally.
   
   This pull request and its description were written by Isaac, an AI coding 
assistant (Claude Code), on behalf of @rahulsmahadev.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

