stevenzwu commented on code in PR #10926:
URL: https://github.com/apache/iceberg/pull/10926#discussion_r1718535334


##########
core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java:
##########
@@ -63,7 +63,11 @@ public class HadoopFileIO implements HadoopConfigurable, DelegateFileIO {
    * <p>{@link Configuration Hadoop configuration} must be set through {@link
    * HadoopFileIO#setConf(Configuration)}
    */
-  public HadoopFileIO() {}
+  public HadoopFileIO() {
+    // Create a default hadoopConf as it is required for the object to be valid.
+    // E.g. newInputFile would throw NPE with hadoopConf.get() otherwise.
+    this.hadoopConf = new SerializableConfiguration(new Configuration())::get;

Review Comment:
   The `conf()` method is fine.
   
   `FileIOParser` doesn't faithfully serialize and deserialize a
   `HadoopFileIO`: the Hadoop configuration is not carried over. The
   deserialized `HadoopFileIO` may be missing important configs, which can be
   a problem.
   
   In `CatalogUtil`, three components define a `FileIO`: the impl, the
   properties, and the Hadoop conf. `FileIOParser` is missing the conf.
   ```
   FileIO loadFileIO(String impl, Map<String, String> properties, Object hadoopConf)
   ```
   
   I am wondering if we should change `FileIOParser` to serialize and
   deserialize the Hadoop `Configuration` when the `FileIO` is
   `HadoopConfigurable`. We could probably serialize just the key-value string
   pairs from the `Configuration` as a JSON object (effectively a read-only
   copy).
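   
   A rough sketch of that round trip, just to make the idea concrete. This is
   not `FileIOParser` code: the class `ConfJsonSketch` and its methods are
   hypothetical, a plain `Map` stands in for the Hadoop `Configuration` so the
   snippet has no Hadoop dependency, and the hand-rolled JSON handling only
   covers the compact output it writes itself. A real change would presumably
   go through the JSON library the parser already uses.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   /** Hypothetical sketch, not Iceberg code. A Map stands in for Hadoop's Configuration. */
   class ConfJsonSketch {

     // Write the key-value pairs as a flat JSON object of strings.
     // With Hadoop on the classpath, the entries could come from iterating a
     // Configuration, which is an Iterable of Map.Entry<String, String>.
     static String toJson(Map<String, String> conf) {
       StringBuilder sb = new StringBuilder("{");
       boolean first = true;
       for (Map.Entry<String, String> e : conf.entrySet()) {
         if (!first) {
           sb.append(',');
         }
         first = false;
         sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
       }
       return sb.append('}').toString();
     }

     // Read back the compact form produced by toJson (no whitespace handling).
     static Map<String, String> fromJson(String json) {
       Map<String, String> conf = new LinkedHashMap<>();
       int[] pos = {0};
       expect(json, pos, '{');
       if (json.charAt(pos[0]) == '}') {
         return conf;
       }
       while (true) {
         String key = readString(json, pos);
         expect(json, pos, ':');
         conf.put(key, readString(json, pos));
         char c = json.charAt(pos[0]++);
         if (c == '}') {
           return conf;
         }
         if (c != ',') {
           throw new IllegalArgumentException("Expected ',' or '}' at " + (pos[0] - 1));
         }
       }
     }

     // Minimal escaping: only double quotes and backslashes are handled.
     private static String quote(String s) {
       StringBuilder sb = new StringBuilder("\"");
       for (char c : s.toCharArray()) {
         if (c == '"' || c == '\\') {
           sb.append('\\');
         }
         sb.append(c);
       }
       return sb.append('"').toString();
     }

     private static String readString(String json, int[] pos) {
       expect(json, pos, '"');
       StringBuilder sb = new StringBuilder();
       while (true) {
         char c = json.charAt(pos[0]++);
         if (c == '"') {
           return sb.toString();
         }
         if (c == '\\') {
           c = json.charAt(pos[0]++); // take the escaped character literally
         }
         sb.append(c);
       }
     }

     private static void expect(String json, int[] pos, char expected) {
       if (json.charAt(pos[0]++) != expected) {
         throw new IllegalArgumentException("Expected '" + expected + "' at " + (pos[0] - 1));
       }
     }

     public static void main(String[] args) {
       Map<String, String> conf = new LinkedHashMap<>();
       conf.put("fs.defaultFS", "hdfs://namenode:8020");
       conf.put("io.file.buffer.size", "65536");
       String json = toJson(conf);
       System.out.println(json);
       System.out.println(fromJson(json).equals(conf));
     }
   }
   ```
   
   On the read side, the restored pairs would be copied into a fresh
   `Configuration` and handed to `setConf`, so the result is a snapshot of the
   writer's settings rather than a live view.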
   
   
   BTW, `ResolvingFileIO` and `HadoopConfigurable` also depend on Hadoop
   classes. There was a discussion about potentially moving `HadoopCatalog` to
   a separate `iceberg-hadoop` module. I guess we can't move `ResolvingFileIO`
   and `HadoopConfigurable` then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

