nastra commented on code in PR #12868: URL: https://github.com/apache/iceberg/pull/12868#discussion_r2079896311
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SerializableTableWithSize.java:
##########
@@ -33,8 +33,9 @@
  *
  * <p>This class also implements AutoCloseable to avoid leaking resources upon broadcasting.
  * Broadcast variables are destroyed and cleaned up on the driver and executors once they are
- * garbage collected on the driver. The implementation ensures only resources used by copies of the
- * main table are released.
+ * garbage collected on the driver. The implementation should avoid closing deserialized copies of
+ * shared resources like FileIO, as they may use a shared connection pool. Shutting down the pool

Review Comment:
   I believe the issue is specific to `S3FileIO`. I thought that each `S3FileIO` instance uses its own dedicated `ApacheHttpClient`, because we set it up as described in https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/http-configuration-apache.html. If so, a new `S3Client` instance, and therefore a new `ApacheHttpClient`, would be created after a Table is broadcast. Or is that not the case?
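
To make the question above concrete, here is a minimal sketch of the two ownership models being contrasted: a dedicated pool per deserialized copy versus one pool shared by all copies. `Pool` and `SketchFileIO` are hypothetical stand-ins written for illustration only; they are not Iceberg or AWS SDK classes, and the sketch makes no claim about how `S3FileIO` actually wires its HTTP client.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FileIoOwnershipSketch {

  /** Hypothetical stand-in for an HTTP connection pool. */
  static final class Pool implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    boolean isClosed() {
      return closed.get();
    }

    @Override
    public void close() {
      closed.set(true);
    }
  }

  /** Hypothetical stand-in for a FileIO that closes whatever pool it holds. */
  static final class SketchFileIO implements AutoCloseable {
    private final Pool pool;

    SketchFileIO(Pool pool) {
      this.pool = pool;
    }

    boolean usable() {
      return !pool.isClosed();
    }

    @Override
    public void close() {
      pool.close();
    }
  }

  /** Dedicated model: each copy builds its own pool, so closing a copy is safe. */
  public static boolean originalSurvivesWithDedicatedPools() {
    SketchFileIO original = new SketchFileIO(new Pool());
    SketchFileIO copy = new SketchFileIO(new Pool());
    copy.close();
    return original.usable();
  }

  /** Shared model: both instances hold the same pool, so closing a copy breaks the original. */
  public static boolean originalSurvivesWithSharedPool() {
    Pool shared = new Pool();
    SketchFileIO original = new SketchFileIO(shared);
    SketchFileIO copy = new SketchFileIO(shared);
    copy.close();
    return original.usable();
  }

  public static void main(String[] args) {
    System.out.println("dedicated pools, original usable: " + originalSurvivesWithDedicatedPools());
    System.out.println("shared pool, original usable: " + originalSurvivesWithSharedPool());
  }
}
```

If the dedicated model holds, closing a broadcast copy cannot affect other instances. The Javadoc change in the diff only matters under the shared model.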