Fokko commented on issue #208:
URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1910290987

Hey @jqin61 Thanks for the elaborate post, and sorry for my slow reply. I wanted to take the time to write a good answer.

The following statement probably needs another map step:

```python
partitions: list[dict] = pyarrow.compute.unique(arrow_table)
```

The above is true for an identity partition, but often we truncate the field, or take the month, day, or hour from it, and use that as the partition value. Another example is the bucket partition, where we hash the field to determine which bucket it falls into.

With regard to utilizing the Arrow primitives that are already there: I think that's a great idea, we just have to make sure they are flexible enough for Iceberg. A couple of questions pop into my mind:

- Can we support all of Iceberg's partition transforms, such as bucketing, truncating, etc.?
- Are we able to extract the metrics, similar to what we do for non-partitioned writes?

@asheeshgarg Thanks for giving it a try. Looking at the schema, there is a discrepancy: the test data that you generate has `value_1` as an int64, while the table expects a string. I think the error is correct here.
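
To make the "extra map step" concrete, here is a minimal sketch of what it could look like for a month transform, using only `pyarrow.compute` kernels. The column names and input data are made up, and this is not PyIceberg's actual write path; it just illustrates applying an Iceberg-style transform before taking the unique partition values and splitting the table.

```python
from datetime import datetime

import pyarrow as pa
import pyarrow.compute as pc

# Hypothetical input table; in the issue this would be the Arrow table
# passed to the partitioned write.
table = pa.table({
    "ts": pa.array(
        [datetime(2024, 1, 15), datetime(2024, 1, 20), datetime(2024, 2, 1)],
        type=pa.timestamp("us"),
    ),
    "value": [1, 2, 3],
})

# Iceberg's month transform is months since the Unix epoch (1970-01), so a
# plain pc.month() alone is not enough:
#   month_partition = (year - 1970) * 12 + (month - 1)
months = pc.add(
    pc.multiply(pc.subtract(pc.year(table["ts"]), 1970), 12),
    pc.subtract(pc.month(table["ts"]), 1),
)

# The extra map step: unique partition values after applying the transform.
partitions = pc.unique(months)

# Split the table into one chunk per partition value; each chunk would then
# be written as its own data file.
for value in partitions:
    chunk = table.filter(pc.equal(months, value))
    print(value, chunk.num_rows)
```

Truncate could likely be expressed with similar arithmetic/string kernels, but as far as I know the bucket transform (Iceberg's 32-bit Murmur3 hash) has no matching Arrow compute kernel, so that part would still need Iceberg-specific code.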
Hey @jqin61 Thanks for the elaborate post, and sorry for my slow reply. I did want to take the time to write a good answer. Probably the following statement needs another map step: ```python partitions: list[dict] = pyarrow.compute.unique(arrow_table) ``` The above is true for an identity partition, but often we take truncate the month, day or hour from a field, and use that as a partition. Another example is the bucketing partition where we hash the field, and determine in which bucket it will fall. With regard of utilizing the Arrow primitives that are already there. I think that's a great idea, we just have to make sure that they are flexible enough for Iceberg. There are a couple of questions that pop into my mind: - Can we support all Icebergs partition strategies, such as bucketing, truncating etc. - Are we able to extract the metrics similar that we do for non-partitioned writes. @asheeshgarg Thanks for giving it a try. Looking at the schema, there is a discrapency. The test-data that you generate has `value_1` as an int64, and the table expects a string. I think the error is correct here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org