Re: [PR] Fix IndexOutOfBounds exception in FileFormat#fromFileName [iceberg]

via GitHub Tue, 18 Feb 2025 08:20:59 -0800


pvary commented on PR #12301:
URL: https://github.com/apache/iceberg/pull/12301#issuecomment-2666203768


   I understand that this PR is just a fix for an existing method, but I have 
concerns about the original intention of the method. We are relying on the 
filename to deduce the actual file format. This seems brittle to me. For 
example, many of our tests generate Parquet files without extensions.
   
   I have faced a similar issue here: 
https://github.com/apache/iceberg/pull/11216#discussion_r1939040414
   
   The Iceberg specification has a `file_format` field for data files that 
specifies the actual file format. Shouldn't we rely on this field instead of 
trying to infer the format from the location of the file? If we want to 
allow metadata files to use different file formats, we might want to add a 
`file_format` field to the metadata descriptors too.
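   
   To illustrate the idea, here is a rough sketch of preferring the recorded 
format and treating the file name only as a fallback. This is not Iceberg 
code: the `Format` enum, the `FormatResolution` class and both method names 
are hypothetical stand-ins, and the fallback rule (match on the extension 
after the last dot) is an assumption made for the example.

```java
import java.util.Locale;
import java.util.Optional;

public class FormatResolution {
  // Hypothetical stand-in for a file format enum; not Iceberg's FileFormat.
  public enum Format { PARQUET, AVRO, ORC }

  // Best-effort guess from the file name. Returns empty when the last path
  // segment has no usable extension, instead of throwing.
  public static Optional<Format> guessFromName(String location) {
    int dot = location.lastIndexOf('.');
    int slash = location.lastIndexOf('/');
    // No dot, dot belongs to a parent directory, or the name ends with a dot.
    if (dot < 0 || dot < slash || dot == location.length() - 1) {
      return Optional.empty();
    }
    String ext = location.substring(dot + 1).toUpperCase(Locale.ROOT);
    for (Format format : Format.values()) {
      if (format.name().equals(ext)) {
        return Optional.of(format);
      }
    }
    return Optional.empty();
  }

  // Prefer the format recorded in metadata (assumed to be passed in as
  // recordedFormat, null when absent); fall back to the name only if needed.
  public static Format resolve(String location, Format recordedFormat) {
    if (recordedFormat != null) {
      return recordedFormat;
    }
    return guessFromName(location)
        .orElseThrow(() -> new IllegalArgumentException(
            "Cannot determine file format for: " + location));
  }
}
```

   With this shape, a Parquet file written without an extension still 
resolves correctly as long as its metadata entry records the format, and the 
name-based guess can no longer override what the metadata says.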


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

