stevenzwu commented on PR #10926: URL: https://github.com/apache/iceberg/pull/10926#issuecomment-2379987754
> I don't think that we should change how this works. A Hadoop Configuration MUST be provided externally.

This makes sense. We already have some consensus that the current PR needs to be updated. We just don't have a consensus on how to fix the NPE problem where a deserialized `HadoopFileIO` has a null Hadoop Configuration object.

> FileIO serialization is not intended to send the entire Hadoop Configuration and should remain separate.

This is not super clear to me. What if users loaded the `FileIO` with customized/overridden properties in the Hadoop Configuration object? Loading a default Hadoop Configuration object on the receiving host won't contain those overrides.

```
CatalogUtil.loadFileIO(impl, properties, conf)
```

@pvary had a related concern about the size of the Hadoop Configuration if we serialize all of its entries (most of them are defaults loaded from the host). What if we just serialize the overridden properties (assuming the sender and receiver sides have the same host-level Hadoop conf XML)? See the sketch below.

Anyway, the current NPE problem with a deserialized `HadoopFileIO` needs to be fixed somehow.
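For illustration only, here is a minimal sketch of the "serialize only the overrides" idea. `SerializableOverrides` and its methods are hypothetical names, not existing Iceberg or Hadoop APIs, and the sketch assumes the sender and receiver load the same host-level `*-site.xml` files:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

/**
 * Hypothetical sketch: capture only the entries of a Hadoop Configuration that
 * differ from the host defaults, so the serialized payload stays small.
 * Assumes sender and receiver load the same host-level conf XML files.
 */
public class SerializableOverrides implements Serializable {
  private final Map<String, String> overrides = new HashMap<>();

  public SerializableOverrides(Configuration conf) {
    // Compare against a freshly loaded Configuration, which contains only the
    // defaults plus the host-level conf XML files.
    Configuration defaults = new Configuration();
    for (Map.Entry<String, String> entry : conf) {
      String key = entry.getKey();
      String value = conf.get(key); // resolves variable substitution
      if (value != null && !value.equals(defaults.get(key))) {
        overrides.put(key, value);
      }
    }
  }

  /** Rebuild a Configuration on the receiving host: host defaults plus overrides. */
  public Configuration restore() {
    Configuration conf = new Configuration();
    overrides.forEach(conf::set);
    return conf;
  }
}
```

That would keep the serialized state down to the handful of user-supplied overrides while still letting the receiving host rebuild a usable Configuration, but it only works if the host-level conf really is identical on both sides.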