jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2083999610
Updates for monthly sync:

1. Working on dynamic overwrite, which gets unblocked by partial deletes: https://github.com/apache/iceberg-python/pull/569
2. For transform functions, we could convert the Arrow column to a Python list and feed it to the transform function to generate transformed PyArrow columns, then group partitions using the existing algorithm. But there are efficiency concerns, since the transform function only takes Python data types and we have to convert from Arrow to Python and back to Arrow. Also, the types in Arrow and Iceberg are quite different, and sometimes we need to call utility functions. For example, a timestamp is converted to a datetime in Python, and we have to call an existing utility function to convert it to micros (int) before feeding it into the transform function. Another option is to create an Arrow UDF for the partition transforms, which might parallelize better.
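To make the round-trip concern in item 2 concrete, here is a minimal sketch using only the standard library. The plain Python list stands in for what `to_pylist()` would return on an Arrow timestamp column. The helper names (`datetime_to_micros`, `day_transform`, `partition_row_indices`) are hypothetical and are not PyIceberg's actual API, and the day transform is only one illustrative choice of partition transform.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MICROS_PER_DAY = 86_400_000_000


def datetime_to_micros(value: datetime) -> int:
    """Hypothetical utility: microseconds since the Unix epoch.

    Assumption: naive datetimes are interpreted as UTC.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    delta = value - EPOCH
    # Integer arithmetic avoids float rounding on large timestamps.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def day_transform(micros: Optional[int]) -> Optional[int]:
    """Illustrative day transform: days since epoch; nulls pass through."""
    return None if micros is None else micros // MICROS_PER_DAY


def partition_row_indices(
    column_values: List[Optional[datetime]],
) -> Dict[Optional[int], List[int]]:
    """Group row indices by transformed partition value.

    Every value goes through a per-row Python call, which is the
    efficiency concern raised in the comment.
    """
    groups: Dict[Optional[int], List[int]] = defaultdict(list)
    for row_index, value in enumerate(column_values):
        micros = None if value is None else datetime_to_micros(value)
        groups[day_transform(micros)].append(row_index)
    return dict(groups)


# Stand-in for the Python list produced from an Arrow timestamp column.
python_values = [
    datetime(2024, 4, 29, 23, 59, 59),
    datetime(2024, 4, 30, 0, 0, 0),
    None,
    datetime(2024, 4, 30, 12, 30, 0),
]
groups = partition_row_indices(python_values)
```

In a real pipeline the transformed values would then be converted back into an Arrow array before grouping, which adds a second conversion on top of the per-row loop shown here.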