Re: [I] Support partitioned writes [iceberg-python]

via GitHub Fri, 26 Jan 2024 12:17:15 -0800


asheeshgarg commented on issue #208:
URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912636932


   @jqin61 I have also seen this behavior with pyarrow.dataset.write_dataset():
it removes the partition columns from the written-out Parquet files.
   @syun64 the approach above looks reasonable to me.
   
   Ideally, the partitioned write would be done directly with the Arrow
dataset API, combined with metadata-based hidden partitioning through the
PyIceberg API. But that would take a good amount of work, and I haven't seen
support for bucket partitioning.
   
   I think we can add the write directly using the PyArrow API, as suggested above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


