stevenzwu commented on code in PR #10926: URL: https://github.com/apache/iceberg/pull/10926#discussion_r1717669065
########## core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java: ########## @@ -63,7 +63,11 @@ public class HadoopFileIO implements HadoopConfigurable, DelegateFileIO { * <p>{@link Configuration Hadoop configuration} must be set through {@link * HadoopFileIO#setConf(Configuration)} */ - public HadoopFileIO() {} + public HadoopFileIO() { + // Create a default hadoopConf as it is required for the object to be valid. + // E.g. newInputFile would throw NPE with hadoopConf.get() otherwise. + this.hadoopConf = new SerializableConfiguration(new Configuration())::get; Review Comment: if the `FileIO` is `Configurable`, create a default `Configuration` object and call `setConf()`. is that what you mean? that will add Hadoop class dependency to the `FileIOParser` though. I have also thought about implementing the serialization of Hadoop `Configuration` in the `FileIOParser`. but it will add some complexity to the `FileIOParser` and we likely won't achieve 100% fidelity with the original object. E.g., we may only serialize the key-value string pairs from the configuration as a JSON object. I also thought about use this method from `SerializationUtil`? But it is Java serialization (not JSON serialization). ``` /** * Serialize an object to bytes. If the object implements {@link HadoopConfigurable}, its Hadoop * configuration will be serialized into a {@link SerializableConfiguration}. * * @param obj object to serialize * @return serialized bytes */ public static byte[] serializeToBytes(Object obj) { return serializeToBytes(obj, conf -> new SerializableConfiguration(conf)::get); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org