HonahX commented on PR #790:
URL: https://github.com/apache/iceberg-python/pull/790#issuecomment-2157590281

   Hi @MehulBatra. Thanks for taking this! It looks like a great start. 
   
   > I believe we need to make a change in this write_file method to support 
ORC writes, as the link goes
   
   Yes, I think this is the right place to add the ORC write logic. As you have 
noticed, the data file format is controlled by the table property 
`write.format.default`. Currently we do not support this property in pyiceberg 
because we assume the format is always Parquet. 
   
   We can add the property in 
https://github.com/apache/iceberg-python/blob/c4feda5db83cfb230caefa124d7a8f2600d920f7/pyiceberg/table/__init__.py#L206-L211
   and document it here: 
https://github.com/apache/iceberg-python/blob/94e8a9835995e3b61f07f0dfb48d8a22a1e1d1b0/mkdocs/docs/configuration.md?plain=1#L53-L63
   
   In `write_file`, we can check the `write.format.default` property and write 
the data file in the corresponding format. For statistics, we may need a 
`data_file_statistics_from_orc` function similar to 
https://github.com/apache/iceberg-python/blob/e61ef5770b4d73e683e2c78bebdd6c2165102a6b/pyiceberg/io/pyarrow.py#L1674-L1698
   (statistics collection can be a follow-up feature, since most 
statistics fields are optional).
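
To illustrate the format check described above, here is a minimal sketch, not the actual pyiceberg implementation. It assumes the property is named `write.format.default` (as listed in the Iceberg table-properties documentation) with `parquet` as the default. The helper names `resolve_write_format` and `write_arrow_table` are hypothetical, and the real `write_file` would also need to handle schemas, file IO, and metrics.

```python
from typing import Any, Dict

# Assumed property name and default; verify against the Iceberg docs.
WRITE_FORMAT_PROPERTY = "write.format.default"
DEFAULT_WRITE_FORMAT = "parquet"
SUPPORTED_FORMATS = ("parquet", "orc")


def resolve_write_format(properties: Dict[str, str]) -> str:
    """Return the normalized data file format requested by the table properties."""
    fmt = properties.get(WRITE_FORMAT_PROPERTY, DEFAULT_WRITE_FORMAT).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported write format: {fmt!r}")
    return fmt


def write_arrow_table(table: Any, path: str, properties: Dict[str, str]) -> str:
    """Hypothetical helper: write a pyarrow.Table in the configured format."""
    fmt = resolve_write_format(properties)
    if fmt == "orc":
        # Imported lazily so the format check works without pyarrow installed.
        from pyarrow import orc

        orc.write_table(table, path)
    else:
        import pyarrow.parquet as pq

        pq.write_table(table, path)
    return fmt
```

Keeping the format resolution separate from the writer makes it easy to unit-test the property handling on its own and to add a statistics step per format later.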


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

