Re: [I] Support partitioned writes [iceberg-python]

via GitHub Mon, 29 Apr 2024 17:55:04 -0700


jqin61 commented on issue #208:
URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2083999610


   Updates for the monthly sync:
   1. Working on dynamic overwrite, which gets unblocked by partial deletes: 
https://github.com/apache/iceberg-python/pull/569
   2. For transform functions, we could convert the Arrow column to a Python 
list and feed that to the transform function to generate transformed PyArrow 
columns, then group partitions using the existing algorithm. But there are 
efficiency concerns, since the transform functions only accept Python data 
types, so we have to convert from Arrow to Python and back to Arrow.
    Also, the types in Arrow and Iceberg are quite different, and sometimes we 
need to call utility functions. For example, a timestamp is converted to a 
datetime in Python, and we have to call an existing utility function to convert 
it to micros (int) before feeding it into the transform functions. Another 
option is to create an Arrow UDF for the partition transforms, which might 
parallelize better.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


