Fokko commented on code in PR #10926:
URL: https://github.com/apache/iceberg/pull/10926#discussion_r1734473138


##########
core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java:
##########
@@ -63,7 +63,11 @@ public class HadoopFileIO implements HadoopConfigurable, DelegateFileIO {
    * <p>{@link Configuration Hadoop configuration} must be set through {@link
    * HadoopFileIO#setConf(Configuration)}
    */
-  public HadoopFileIO() {}
+  public HadoopFileIO() {
+    // Create a default hadoopConf as it is required for the object to be valid.
+    // E.g. newInputFile would throw NPE with hadoopConf.get() otherwise.
+    this.hadoopConf = new SerializableConfiguration(new Configuration())::get;

Review Comment:
   Hey @stevenzwu, sorry for not replying here earlier; my mailbox is a bit swamped. Thanks for tagging me here.
   
   I would suggest taking a look at how this has been solved in Parquet-Java. There, [an additional layer called `ParquetConfiguration`](https://github.com/apache/parquet-java/blob/master/parquet-common/src/main/java/org/apache/parquet/conf/ParquetConfiguration.java) was added, which extends `Iterable<Map.Entry<String, String>>`. [Hadoop's `Configuration`](https://github.com/apache/hadoop/blob/e4ee3d560bddc27a495cc9a158278a9c18276dd0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L229) also implements `Iterable<Map.Entry<String, String>>`. In Parquet, the main goal was to decouple from Hadoop and to use [`PlainParquetConfiguration`](https://github.com/apache/parquet-java/blob/master/parquet-common/src/main/java/org/apache/parquet/conf/PlainParquetConfiguration.java) as an alternative way to set configuration properties. This would be a good route for Flink as well, and it would also help with [running without Hadoop](https://github.com/apache/iceberg/pull/7369), since Hadoop has been an optional dependency for Flink for quite some time. This way, we can serialize Hadoop's `Configuration` into a `Map<String, String>` and deserialize it into an equivalent of Parquet's `PlainParquetConfiguration`, which is probably much more lightweight.
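   The decoupling suggested above can be sketched in Java. This is a minimal sketch under stated assumptions, not Parquet's or Iceberg's actual API: `KeyValueConfiguration`, `MapConfiguration` and `ConfigurationSnapshots` are hypothetical names chosen for illustration, and the sketch assumes the only contract needed is String key/value lookup plus iteration. Because Hadoop's `Configuration` implements `Iterable<Map.Entry<String, String>>`, it could be passed to `toMap` as-is, while the receiving side only needs the plain map.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical engine-neutral configuration view, modeled on the idea behind
 * Parquet's ParquetConfiguration: String key/value pairs that can be iterated.
 */
interface KeyValueConfiguration extends Iterable<Map.Entry<String, String>> {
  String get(String key);

  String get(String key, String defaultValue);

  void set(String key, String value);
}

/** Hypothetical Hadoop-free implementation backed by a plain HashMap. */
final class MapConfiguration implements KeyValueConfiguration, Serializable {
  private static final long serialVersionUID = 1L;
  private final Map<String, String> props;

  /** Empty default configuration, so a freshly created object is still usable. */
  MapConfiguration() {
    this(Collections.emptyMap());
  }

  /** Defensive copy: later changes here do not leak back into the caller's map. */
  MapConfiguration(Map<String, String> initial) {
    this.props = new HashMap<>(initial);
  }

  @Override
  public String get(String key) {
    return props.get(key);
  }

  @Override
  public String get(String key, String defaultValue) {
    return props.getOrDefault(key, defaultValue);
  }

  @Override
  public void set(String key, String value) {
    props.put(key, value);
  }

  /** Read-only iteration over the current entries. */
  @Override
  public Iterator<Map.Entry<String, String>> iterator() {
    return Collections.unmodifiableMap(props).entrySet().iterator();
  }
}

/** Hypothetical helpers for shipping a configuration as a plain map. */
final class ConfigurationSnapshots {
  private ConfigurationSnapshots() {}

  /** Copies any Iterable of String entries (e.g. a Hadoop Configuration) into a HashMap. */
  static Map<String, String> toMap(Iterable<Map.Entry<String, String>> conf) {
    Map<String, String> copy = new HashMap<>();
    for (Map.Entry<String, String> entry : conf) {
      copy.put(entry.getKey(), entry.getValue());
    }
    return copy;
  }

  /** Rebuilds a lightweight configuration from a snapshot on the receiving side. */
  static KeyValueConfiguration fromMap(Map<String, String> snapshot) {
    return new MapConfiguration(snapshot);
  }
}
```

   With this shape, a Flink operator could ship the `Map<String, String>` snapshot (a `HashMap` is `Serializable`) and rebuild a `MapConfiguration` without Hadoop on the classpath; only the code that reads a Hadoop `Configuration` in the first place would need the Hadoop dependency.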



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
