mothukur opened a new issue, #11776: URL: https://github.com/apache/iceberg/issues/11776
### Apache Iceberg version

1.7.1 (latest release)

### Query engine

None

### Please describe the bug 🐞

The current `GlueCatalog` implementation does not allow the `FileIO` object to be reused, which leads to inefficient use of the manifest cache implemented in the `ManifestFiles` class.

**Problematic Code**

The `GlueTableOperations` class creates a new `FileIO` object for each instance:
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java#L113

```
public FileIO io() {
  if (fileIO == null) {
    fileIO = initializeFileIO(this.tableCatalogProperties, this.hadoopConf);
  }
  return fileIO;
}
```

This prevents the `ManifestFiles` class from using the cache effectively. The content cache is looked up by `FileIO` object, so every new `FileIO` gets its own `ContentCache` instead of sharing an existing one:
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/ManifestFiles.java#L75

```
static ContentCache contentCache(FileIO io) {
  return CONTENT_CACHES.get(
      io,
      fileIO ->
          new ContentCache(
              cacheDurationMs(fileIO), cacheTotalBytes(fileIO), cacheMaxContentLength(fileIO)));
}
```

**Proposed Solution**

Add a constructor or method to the `GlueCatalog` class that accepts a `FileIO` object, or a function that builds a `FileIO` object, similar to `JdbcCatalog`:
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java#L99

```
public JdbcCatalog(
    Function<Map<String, String>, FileIO> ioBuilder,
    Function<Map<String, String>, JdbcClientPool> clientPoolBuilder,
    boolean initializeCatalogTables) {
  this.ioBuilder = ioBuilder;
  this.clientPoolBuilder = clientPoolBuilder;
  this.initializeCatalogTables = initializeCatalogTables;
}
```

### Willingness to contribute

- [X] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
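To illustrate the behaviour described in this issue, here is a minimal, self-contained sketch that does not depend on Iceberg. All class names (`StubFileIO`, `StubContentCache`, `StubManifestCaches`, `StubTableOperations`) are hypothetical stand-ins, not Iceberg APIs. It assumes the manifest cache is looked up per `FileIO` instance, which is modelled here with an `IdentityHashMap`; the real `CONTENT_CACHES` may differ in detail.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class FileIoReuseSketch {

  // Hypothetical stand-in for a FileIO implementation; not the real interface.
  static final class StubFileIO {}

  // Hypothetical stand-in for the content cache created per FileIO.
  static final class StubContentCache {}

  // Stand-in for the registry of content caches.
  // Assumption: one cache per FileIO *instance*, modelled with identity keys.
  static final class StubManifestCaches {
    private final Map<StubFileIO, StubContentCache> caches = new IdentityHashMap<>();

    StubContentCache contentCache(StubFileIO io) {
      return caches.computeIfAbsent(io, key -> new StubContentCache());
    }

    int size() {
      return caches.size();
    }
  }

  // Stand-in for a table operations object that lazily obtains its FileIO,
  // mirroring the io() pattern quoted in the issue.
  static final class StubTableOperations {
    private final Supplier<StubFileIO> ioSupplier;
    private StubFileIO fileIO;

    StubTableOperations(Supplier<StubFileIO> ioSupplier) {
      this.ioSupplier = ioSupplier;
    }

    StubFileIO io() {
      if (fileIO == null) {
        fileIO = ioSupplier.get();
      }
      return fileIO;
    }
  }

  // Simulates loading a table several times and counts the distinct
  // content caches that end up being created.
  static int cachesAfterLoads(Supplier<StubFileIO> ioSupplier, int tableLoads) {
    StubManifestCaches manifestCaches = new StubManifestCaches();
    for (int i = 0; i < tableLoads; i++) {
      StubTableOperations ops = new StubTableOperations(ioSupplier);
      manifestCaches.contentCache(ops.io());
    }
    return manifestCaches.size();
  }

  public static void main(String[] args) {
    int loads = 3;

    // Each table operations object builds its own FileIO: one cache per load.
    int perInstance = cachesAfterLoads(StubFileIO::new, loads);

    // The caller supplies a shared FileIO: a single cache is reused.
    StubFileIO shared = new StubFileIO();
    int reused = cachesAfterLoads(() -> shared, loads);

    System.out.println("caches with a new FileIO per load: " + perInstance); // 3
    System.out.println("caches with one shared FileIO: " + reused); // 1
  }
}
```

In this sketch, a supplier that returns the same `StubFileIO` plays the role of the `ioBuilder`-style hook proposed above. Whether `GlueCatalog` should accept a `FileIO` directly or a builder function is left open.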