HonahX commented on PR #790: URL: https://github.com/apache/iceberg-python/pull/790#issuecomment-2157590281
Hi @MehulBatra. Thanks for taking this! It looks like a great start.

> I believe we need to make a change in this write_file method to support ORC writes, as the link goes

Yes, I think this is the right place to add the ORC write logic. As you have noticed, the data file format is controlled by the table property `write.default.format`. PyIceberg does not support this property yet, because it currently assumes the format is Parquet. We can add the property in https://github.com/apache/iceberg-python/blob/c4feda5db83cfb230caefa124d7a8f2600d920f7/pyiceberg/table/__init__.py#L206-L211 and document it here: https://github.com/apache/iceberg-python/blob/94e8a9835995e3b61f07f0dfb48d8a22a1e1d1b0/mkdocs/docs/configuration.md?plain=1#L53-L63

In `write_file`, we can then check the `write.default.format` property and write the data file in the corresponding format.

For statistics, we may need a `data_file_statistics_from_orc` similar to https://github.com/apache/iceberg-python/blob/e61ef5770b4d73e683e2c78bebdd6c2165102a6b/pyiceberg/io/pyarrow.py#L1674-L1698 (statistics collection can be a follow-up feature, since most statistics fields are optional).
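
As a rough illustration of the `write.default.format` check described in this comment, here is a minimal, standalone sketch. It is not the PyIceberg implementation: `resolve_write_format`, `write_file_sketch`, the constant names, and the placeholder writers are all hypothetical, and the sketch assumes only Parquet and ORC are accepted, with Parquet as the fallback when the property is unset.

```python
from typing import Callable, Dict, Mapping

# Property name as given in the comment. The default value is an assumption
# that mirrors the current Parquet-only behaviour.
WRITE_DEFAULT_FORMAT = "write.default.format"
WRITE_DEFAULT_FORMAT_DEFAULT = "parquet"

# Formats this sketch accepts (an assumption for illustration only).
SUPPORTED_FORMATS = ("parquet", "orc")


def resolve_write_format(properties: Mapping[str, str]) -> str:
    """Return the normalized data file format requested by the table properties."""
    fmt = properties.get(WRITE_DEFAULT_FORMAT, WRITE_DEFAULT_FORMAT_DEFAULT).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported {WRITE_DEFAULT_FORMAT}: {fmt!r}")
    return fmt


# Hypothetical placeholder writers. A real implementation would write the
# Arrow table to `path` and return data file metadata instead of a string.
def _write_parquet(path: str) -> str:
    return f"parquet file at {path}"


def _write_orc(path: str) -> str:
    return f"orc file at {path}"


WRITERS: Dict[str, Callable[[str], str]] = {
    "parquet": _write_parquet,
    "orc": _write_orc,
}


def write_file_sketch(properties: Mapping[str, str], path: str) -> str:
    """Dispatch to the writer that matches the table's configured format."""
    return WRITERS[resolve_write_format(properties)](path)
```

Keeping the property lookup separate from the writers means the ORC path can be added without touching the existing Parquet logic, and statistics collection can be attached to each writer later.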
