rahulsmahadev opened a new pull request, #16762:
URL: https://github.com/apache/iceberg/pull/16762

   ## Summary
   
   When `io.manifest.cache-enabled` is set to `true`, manifest files are served 
through `ContentCache`, but manifest **list** files are not: 
`BaseSnapshot.cacheManifests` reads the list with a raw `FileIO.newInputFile` 
call. Every freshly loaded `Snapshot` (table refresh, new table handle, 
streaming poll) therefore re-fetches the same immutable manifest-list file from 
object storage, even when the content cache is enabled and warm.
   
   This change routes the manifest-list read through the same content cache 
used for manifests:
   
   - `ManifestFiles.newInputFile(FileIO, ManifestListFile)`: a package-private 
counterpart of the existing `newInputFile(FileIO, ManifestFile)` helper that 
wraps the input with `ContentCache.tryCache` when caching is enabled for the 
`FileIO`.
   - `BaseSnapshot.cacheManifests` uses it for the manifest-list read.
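
The effect of routing the read through a location-keyed cache can be sketched in plain Java. This is an illustration only, not the code in this PR: `SketchStorage`, `SketchContentCache`, and `ManifestListReadSketch` are hypothetical stand-ins for object storage, `ContentCache`, and the new helper, and the path is made up. The sketch assumes that content at a given location never changes, so an entry never needs invalidation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for object storage that counts remote reads. */
final class SketchStorage {
  private final Map<String, byte[]> objects = new HashMap<>();
  private int remoteReads = 0;

  void put(String location, byte[] content) {
    objects.put(location, content);
  }

  byte[] read(String location) {
    remoteReads++;
    return objects.get(location);
  }

  int remoteReads() {
    return remoteReads;
  }
}

/** Hypothetical location-keyed cache; a simplification, not Iceberg's ContentCache. */
final class SketchContentCache {
  private final Map<String, byte[]> entries = new HashMap<>();
  private int hits = 0;
  private int misses = 0;

  /** Returns the cached bytes for a location, loading them once on a miss. */
  byte[] getOrLoad(String location, Function<String, byte[]> loader) {
    byte[] cached = entries.get(location);
    if (cached != null) {
      hits++;
      return cached;
    }
    misses++;
    byte[] loaded = loader.apply(location);
    entries.put(location, loaded);
    return loaded;
  }

  int hits() {
    return hits;
  }

  int misses() {
    return misses;
  }

  int size() {
    return entries.size();
  }
}

final class ManifestListReadSketch {
  /** Reads a manifest list, going through the cache only when caching is enabled. */
  static byte[] readManifestList(
      SketchStorage storage, SketchContentCache cache, boolean cachingEnabled, String location) {
    if (cachingEnabled) {
      return cache.getOrLoad(location, storage::read);
    }
    return storage.read(location);
  }

  /** Simulates two fresh snapshot instances reading the same list; returns remote reads. */
  static int remoteReadsForTwoSnapshots(boolean cachingEnabled) {
    SketchStorage storage = new SketchStorage();
    String location = "metadata/snap-1.avro"; // hypothetical path
    storage.put(location, new byte[] {1, 2, 3});
    SketchContentCache cache = new SketchContentCache();
    readManifestList(storage, cache, cachingEnabled, location);
    readManifestList(storage, cache, cachingEnabled, location);
    return storage.remoteReads();
  }
}
```

In this model, two reads of the same list cost two remote fetches with caching disabled and one with caching enabled, which is the saving described in the summary.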
   
   Manifest lists are immutable and location-unique like manifests, so the 
existing cache keying and invalidation semantics apply unchanged. Encrypted 
manifest lists follow the same contract as manifests today: the `FileIO` (e.g. 
`EncryptingFileIO`) controls decryption, and caching behavior is identical to 
the manifest path.
   
   ## Test plan
   
   - New `TestManifestCaching.testManifestListCaching`: a freshly parsed 
snapshot loads the manifest list through the cache (miss + cache-size 
increase), and a second snapshot instance reading the same list is served from 
the cache (no new miss, hit count increases).
   - Updated `testPlanWithCache` expectations: with the change, each append 
commit also caches one manifest list (parent lists are read while committing, 
and the current snapshot's list while planning), so the cache holds 
`numFiles * 2` entries.
   - Verified locally: `TestManifestCaching` (6/6), `TestManifestListVersions`, 
`TestSnapshot`, `TestCommitReporting`, `TestManifestListEncryption`, plus 
spotless and checkstyle.
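
The `numFiles * 2` expectation in `testPlanWithCache` can be checked with a small counting model. This is a sketch under stated assumptions, not the test itself: it assumes each append commit writes exactly one new manifest and one new manifest list, that no manifests are merged, and that the cache is keyed by location. The class name and paths are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class CacheEntryCountSketch {
  /** Returns the distinct cached locations after numFiles appends followed by one scan. */
  static Set<String> cachedLocations(int numFiles) {
    // A set models location-keyed caching: re-reading a location adds no entry.
    Set<String> cache = new LinkedHashSet<>();
    if (numFiles < 1) {
      return cache;
    }
    for (int commit = 2; commit <= numFiles; commit++) {
      // Committing snapshot N reads the parent snapshot's manifest list (N - 1).
      cache.add("metadata/snap-" + (commit - 1) + ".avro");
    }
    // Planning reads the current snapshot's manifest list...
    cache.add("metadata/snap-" + numFiles + ".avro");
    // ...and every manifest it references (assumed: one per append, none merged).
    for (int manifest = 1; manifest <= numFiles; manifest++) {
      cache.add("metadata/manifest-" + manifest + ".avro");
    }
    return cache;
  }
}
```

Under these assumptions the model yields `numFiles` manifest lists plus `numFiles` manifests, so `cachedLocations(n).size()` equals `2 * n` for any `n >= 1`.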
   


