Xuanwo opened a new issue, #1226: URL: https://github.com/apache/iceberg-rust/issues/1226
### What's the feature you are trying to implement?

Caching is an essential component of an Iceberg table, and different kinds of cache are needed at various levels. For example, for table metadata we need a `Manifest` cache so that we don't have to read and deserialize the same manifest files repeatedly. For Parquet files, we need a `FileMetadata` cache to avoid parsing the metadata from the Parquet files each time. We could even implement a raw data cache that stores portions of data files, eliminating the need to download them from S3 again.

As the foundation for various query engines, iceberg-rust should simplify integration while still allowing each engine to fully optimize performance. This applies whether an engine uses iceberg-rust on a single machine or within a distributed cluster.

I plan to add a set of cache APIs to meet all these needs. My current plan is:

- `ObjectCache`: an object cache trait that can hold objects like `Manifest` or `FileMetadata`.
- `BytesCache`: a bytes cache that can hold the raw content of files, such as `table_metadata.json` files.
- An in-FileIO cache, similar to OpenDAL's `CacheLayer`; its API is not decided yet.

## Tasks

- ObjectCache
  - [ ] https://github.com/apache/iceberg-rust/pull/1222
  - [ ] https://github.com/apache/iceberg-rust/pull/1225
- BytesCache
- OpenDAL CacheLayer (TBD)

### Willingness to contribute

I can contribute to this feature independently.
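To make the proposed `ObjectCache` idea concrete, here is a minimal std-only sketch. It is not the API from the linked PRs. The trait shape, `InMemoryObjectCache`, `get_or_load`, `ParsedManifest`, and the S3 path are all hypothetical, chosen only for illustration. Objects are stored behind `Arc`, so a cache hit costs a pointer clone instead of a re-read and re-deserialization of the manifest.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a deserialized manifest (NOT iceberg-rust's `Manifest` type).
#[derive(Debug)]
pub struct ParsedManifest {
    pub entries: Vec<String>,
}

/// Hypothetical object cache trait: holds already-deserialized objects behind `Arc`,
/// so a hit returns a cheap clone of the pointer instead of re-parsing the file.
pub trait ObjectCache<K, V>: Send + Sync {
    fn get(&self, key: &K) -> Option<Arc<V>>;
    fn insert(&self, key: K, value: Arc<V>);
}

/// Naive in-memory implementation with no size limit or eviction (illustration only).
pub struct InMemoryObjectCache<K, V> {
    inner: Mutex<HashMap<K, Arc<V>>>,
}

impl<K, V> InMemoryObjectCache<K, V> {
    pub fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }
}

impl<K, V> ObjectCache<K, V> for InMemoryObjectCache<K, V>
where
    K: Eq + Hash + Send,
    V: Send + Sync,
{
    fn get(&self, key: &K) -> Option<Arc<V>> {
        self.inner.lock().unwrap().get(key).cloned()
    }

    fn insert(&self, key: K, value: Arc<V>) {
        self.inner.lock().unwrap().insert(key, value);
    }
}

/// Returns the cached object for `key`; on a miss, runs `load`
/// (e.g. read + deserialize a manifest file) and caches the result.
pub fn get_or_load<K: Clone, V>(
    cache: &dyn ObjectCache<K, V>,
    key: &K,
    load: impl FnOnce() -> V,
) -> Arc<V> {
    if let Some(hit) = cache.get(key) {
        return hit;
    }
    let value = Arc::new(load());
    cache.insert(key.clone(), Arc::clone(&value));
    value
}

fn main() {
    let cache = InMemoryObjectCache::<String, ParsedManifest>::new();
    // Hypothetical manifest location, used only as a cache key.
    let path = "s3://bucket/metadata/manifest-1.avro".to_string();
    let mut loads = 0;

    // Three lookups for the same manifest: only the first one runs the loader.
    for _ in 0..3 {
        let manifest = get_or_load(&cache, &path, || {
            loads += 1;
            ParsedManifest { entries: vec!["data/file-1.parquet".to_string()] }
        });
        assert_eq!(manifest.entries.len(), 1);
    }
    println!("loads = {loads}");
}
```

In this sketch a `BytesCache` could be modeled as `ObjectCache<String, Vec<u8>>` keyed by file path. A real implementation would also need a size bound and an eviction policy, which this example deliberately leaves out.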