stevenzwu commented on PR #10926:
URL: https://github.com/apache/iceberg/pull/10926#issuecomment-2379987754

   > I don't think that we should change how this works. A Hadoop Configuration 
MUST be provided externally.
   
   This makes sense. We already have some consensus that the current PR needs 
to be updated. We just don't have consensus on how to fix the NPE problem, 
where a deserialized `HadoopFileIO` has a null Hadoop Configuration object.
   
   >  FileIO serialization is not intended to send the entire Hadoop 
Configuration and should remain separate.
   
   This is not super clear to me. What if users loaded the `FileIO` with 
customized/overridden properties in the Hadoop Configuration object? A 
default Hadoop Configuration object loaded on the receiving host won't 
contain those overrides.
   ```
   CatalogUtil.loadFileIO(impl, properties, conf)
   ```
   
   @pvary had a related concern about the size of the Hadoop Configuration 
entries if we serialize all the properties (most of them are default 
properties loaded from the host). What if we just serialize the overridden 
properties (assuming the sender and receiver sides have the same host-level 
Hadoop conf XML)?
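
   A rough sketch of the "serialize only the overrides" idea, assuming the 
Hadoop Configuration can be treated as a flat set of string key/value pairs. 
To keep it runnable without Hadoop on the classpath, the configuration is 
modeled as a plain `Map<String, String>`. `ConfOverrides`, `diff`, and 
`apply` are hypothetical names for illustration, not Iceberg or Hadoop APIs, 
and the property values are made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a Hadoop Configuration is modeled as a plain Map<String, String>.
// All names here are hypothetical and not part of Iceberg or Hadoop.
final class ConfOverrides {
  private ConfOverrides() {}

  // Sender side: keep only the entries whose value differs from the host defaults.
  static Map<String, String> diff(Map<String, String> defaults, Map<String, String> effective) {
    Map<String, String> overrides = new TreeMap<>();
    for (Map.Entry<String, String> e : effective.entrySet()) {
      if (!e.getValue().equals(defaults.get(e.getKey()))) {
        overrides.put(e.getKey(), e.getValue());
      }
    }
    return overrides;
  }

  // Receiver side: start from the local host defaults and re-apply the overrides.
  static Map<String, String> apply(Map<String, String> localDefaults, Map<String, String> overrides) {
    Map<String, String> rebuilt = new TreeMap<>(localDefaults);
    rebuilt.putAll(overrides);
    return rebuilt;
  }

  public static void main(String[] args) {
    // Made-up host defaults, assumed identical on sender and receiver.
    Map<String, String> defaults = Map.of("fs.defaultFS", "file:///", "io.file.buffer.size", "4096");

    // The user's customized configuration: one new key and one changed value.
    Map<String, String> effective = new TreeMap<>(defaults);
    effective.put("fs.s3a.endpoint", "http://localhost:9000");
    effective.put("io.file.buffer.size", "65536");

    // Only these two entries would need to travel with the serialized FileIO.
    Map<String, String> overrides = diff(defaults, effective);
    System.out.println(overrides);

    // The receiver rebuilds the sender's view from its own defaults plus the overrides.
    System.out.println(apply(defaults, overrides).equals(effective));
  }
}
```

   Note that this only round-trips if the host defaults really are the same 
on both sides, and a key that the sender removed from the defaults is not 
captured by the diff.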
   
   Anyway, the current NPE problem with deserialized `HadoopFileIO` needs to 
be fixed somehow.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

