asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912738795
@Fokko @syun64 Another option I can think of is using Polars to do it. A simple example is below, with hashing, partitioning, and sorting within each partition: all of the partitioning is handled by the Rust layer in Polars, and we write Parquet based on the Arrow table returned. Not sure if we want to add it as a dependency? We can also easily do custom transforms, like the hour transform we have in Iceberg.

```python
import pyarrow as pa
import polars as pl

t = pa.table({"strings": ["A", "A", "B", "A"], "ints": [2, 1, 3, 4]})
df = pl.from_arrow(t)

N = 2  # number of hash buckets

# Derive a partition id from the hash of the partition column,
# then split the frame into one table per partition.
tables = (
    df.with_columns([(pl.col("strings").hash() % N).alias("partition_id")])
    .partition_by("partition_id")
)

# Sort within each partition and convert back to Arrow for the Parquet write.
for tbl in tables:
    print(tbl.to_arrow().sort_by("ints"))
```
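
For the hour transform mentioned above, here is a minimal sketch of what that could look like in Polars. The column names (`ts`, `ints`) and the sample data are just illustrative assumptions; Iceberg's hour transform partitions on hours since the Unix epoch, which `dt.epoch("s") // 3600` reproduces.

```python
from datetime import datetime

import polars as pl

df = pl.DataFrame(
    {
        "ts": [
            datetime(2024, 1, 1, 10, 15),
            datetime(2024, 1, 1, 10, 45),
            datetime(2024, 1, 1, 11, 5),
        ],
        "ints": [2, 1, 3],
    }
)

# Compute hours since the Unix epoch as the partition value,
# then split the frame on that derived column.
tables = (
    df.with_columns((pl.col("ts").dt.epoch("s") // 3600).alias("partition_hour"))
    .partition_by("partition_hour")
)

for tbl in tables:
    print(tbl.to_arrow().sort_by("ints"))
```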