pvary commented on PR #12301: URL: https://github.com/apache/iceberg/pull/12301#issuecomment-2666203768
I understand that this PR is just a fix for an existing method, but I have concerns about the original intention of the method. We are relying on the file name to deduce the actual file format, which seems brittle to me. For example, many of our tests generate Parquet files without extensions. I faced a similar issue here: https://github.com/apache/iceberg/pull/11216#discussion_r1939040414

The Iceberg specification has a `file_format` field for data files that specifies the actual file format. Shouldn't we rely on this field instead of trying to infer the format from the file's location? If we want to allow metadata files to use different file formats, we might want to add a `file_format` field to the metadata descriptors too.
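To make the brittleness concrete, here is a minimal, self-contained Java sketch. The `Format` enum, the `FileDescriptor` record, both helper methods, and the `s3://bucket/...` paths are hypothetical stand-ins, not Iceberg's `FileFormat` or `DataFile` API. The descriptor's `fileFormat` component only mimics the spec's `file_format` field. The sketch shows extension-based detection finding nothing for an extensionless Parquet file, while the format recorded in the metadata is still available.

```java
import java.util.Locale;
import java.util.Optional;

public class FormatDetection {
    // Hypothetical format enum (not Iceberg's FileFormat), each with a file extension.
    enum Format {
        PARQUET("parquet"), AVRO("avro"), ORC("orc");

        final String ext;

        Format(String ext) {
            this.ext = ext;
        }
    }

    // Hypothetical descriptor that records the format explicitly,
    // similar in spirit to the spec's file_format field for data files.
    record FileDescriptor(String location, Format fileFormat) {}

    // Brittle approach: guess the format from the file name suffix.
    static Optional<Format> formatFromLocation(String location) {
        String lower = location.toLowerCase(Locale.ROOT);
        for (Format f : Format.values()) {
            if (lower.endsWith("." + f.ext)) {
                return Optional.of(f);
            }
        }
        return Optional.empty(); // no recognizable extension
    }

    // Metadata-driven approach: trust the format recorded for the file.
    static Format formatFromMetadata(FileDescriptor file) {
        return file.fileFormat();
    }

    public static void main(String[] args) {
        FileDescriptor withExt = new FileDescriptor("s3://bucket/data/00000-0.parquet", Format.PARQUET);
        FileDescriptor noExt = new FileDescriptor("s3://bucket/data/00000-1", Format.PARQUET);

        System.out.println(formatFromLocation(withExt.location())); // Optional[PARQUET]
        System.out.println(formatFromLocation(noExt.location()));   // Optional.empty
        System.out.println(formatFromMetadata(noExt));              // PARQUET
    }
}
```

Both files are Parquet, but only the one whose name happens to end in `.parquet` is detected from its location. The explicit field answers correctly regardless of naming.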