stevenzwu commented on code in PR #10926:
URL: https://github.com/apache/iceberg/pull/10926#discussion_r1717669065


##########
core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java:
##########
@@ -63,7 +63,11 @@ public class HadoopFileIO implements HadoopConfigurable, 
DelegateFileIO {
    * <p>{@link Configuration Hadoop configuration} must be set through {@link
    * HadoopFileIO#setConf(Configuration)}
    */
-  public HadoopFileIO() {}
+  public HadoopFileIO() {
+    // Create a default hadoopConf as it is required for the object to be 
valid.
+    // E.g. newInputFile would throw NPE with hadoopConf.get() otherwise.
+    this.hadoopConf = new SerializableConfiguration(new Configuration())::get;

Review Comment:
    if the `FileIO` is `Configurable`, create a default `Configuration` object  
and call `setConf()`. is that what you mean? that will add Hadoop class 
dependency to the `FileIOParser` though.
   
   I have also thought about implementing the serialization of Hadoop 
`Configuration` in the `FileIOParser`. but it will add some complexity to the 
`FileIOParser` and we likely won't achieve 100% fidelity with the original 
object. E.g., we may only serialize the key-value string pairs from the 
configuration as a JSON object.  I will also have the problem of adding Hadoop 
class dependency to the `FileIOParser`.
   
   I also thought about use this method from `SerializationUtil`? But it is 
Java serialization (not JSON serialization).
   ```
     /**
      * Serialize an object to bytes. If the object implements {@link 
HadoopConfigurable}, its Hadoop
      * configuration will be serialized into a {@link 
SerializableConfiguration}.
      *
      * @param obj object to serialize
      * @return serialized bytes
      */
     public static byte[] serializeToBytes(Object obj) {
       return serializeToBytes(obj, conf -> new 
SerializableConfiguration(conf)::get);
     }
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to