pvary commented on code in PR #10926:
URL: https://github.com/apache/iceberg/pull/10926#discussion_r1750267503


##########
core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java:
##########
@@ -63,7 +63,11 @@ public class HadoopFileIO implements HadoopConfigurable, DelegateFileIO {
    * <p>{@link Configuration Hadoop configuration} must be set through {@link
    * HadoopFileIO#setConf(Configuration)}
    */
-  public HadoopFileIO() {}
+  public HadoopFileIO() {
+    // Create a default hadoopConf as it is required for the object to be valid.
+    // E.g. newInputFile would throw NPE with hadoopConf.get() otherwise.
+    this.hadoopConf = new SerializableConfiguration(new Configuration())::get;

Review Comment:
   @stevenzwu: 
   > if I understand correctly, your point is that FileIO probably shouldn't be 
part of the task state for BaseEntriesTable.ManifestReadTask, 
AllManifestsTable.ManifestListReadTask? [..] It seems like a large/challenging 
refactoring. Looking for other folks' take on this.
   
   Yes, this seems strange and problematic to me. I have not been able to find an easy solution yet. I was hoping that others who know this area better might have some ideas.
   
   > Regardless, the issue remains that FileIOParser doesn't serialize 
HadoopFileIO faithfully. I don't know if REST catalog has any need to use it to 
JSON serialize FileIO in the future.
   
   My point here is that, since we use `Configuration(false)` in some cases, and given how the current serialization works, we already "don't serialize HadoopFileIO faithfully". So if we don't find a way to get rid of the FileIO, we might as well write our own "unfaithful" serialization that mimics how the current serialization works.
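
To make the "unfaithful" serialization idea concrete, here is a minimal, self-contained sketch. It does not use the Iceberg or Hadoop classes: `SketchFileIO`, `roundTrip`, and the map standing in for a Hadoop `Configuration` are all hypothetical names chosen for illustration. It assumes that only the FileIO properties survive serialization and that the configuration is recreated with defaults on the receiving side.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class UnfaithfulFileIOSketch {

  /** Hypothetical stand-in for a FileIO that needs a configuration; not the Iceberg class. */
  static class SketchFileIO {
    // Stand-in for the Hadoop configuration supplier; stays null until one is provided.
    private Supplier<Map<String, String>> conf;
    private final Map<String, String> properties = new HashMap<>();

    // Mirrors the pre-change no-arg constructor: no configuration is set.
    SketchFileIO() {}

    SketchFileIO(Map<String, String> confValues) {
      Map<String, String> copy = new HashMap<>(confValues);
      this.conf = () -> copy;
    }

    void initialize(Map<String, String> props) {
      properties.putAll(props);
    }

    Map<String, String> properties() {
      return properties;
    }

    String confValue(String key) {
      // Throws NullPointerException when conf was never set.
      return conf.get().get(key);
    }
  }

  /**
   * "Unfaithful" round trip (an assumption for this sketch): only the properties are
   * carried over, and the configuration is recreated empty instead of being copied.
   */
  static SketchFileIO roundTrip(SketchFileIO original) {
    SketchFileIO copy = new SketchFileIO(new HashMap<>());
    copy.initialize(original.properties());
    return copy;
  }

  public static void main(String[] args) {
    SketchFileIO original = new SketchFileIO(Collections.singletonMap("fs.custom", "x"));
    original.initialize(Collections.singletonMap("io.prop", "v"));

    SketchFileIO copy = roundTrip(original);
    System.out.println("property kept: " + copy.properties().get("io.prop"));
    System.out.println("conf value after round trip: " + copy.confValue("fs.custom"));

    try {
      new SketchFileIO().confValue("fs.custom");
    } catch (NullPointerException e) {
      System.out.println("no-arg instance without a conf fails on first use");
    }
  }
}
```

In this sketch the copy keeps `io.prop` but loses `fs.custom`, which is the sense in which the round trip is "unfaithful". Whether that trade-off is acceptable for the real `HadoopFileIO` is the open question in this thread.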



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

