zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803817044
@Fokko thanks a lot for your help. I ran some more experiments following your advice, and I'd like to check whether I understand `global sort` correctly. My understanding is that it determines how the list of manifests is ordered when an engine like Trino writes the data, and that pyiceberg then follows that order when reading.

I first created a sorted table with

```
CREATE TABLE iceberg.sort_order_exp.csdate100 (
    date date
)
WITH (
    format = 'PARQUET',
    format_version = 2,
    location = 's3a://test/csdate100',
    sorted_by = ARRAY['date']
);
```

Then I inserted rows into it one by one with

```
INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-01') ORDER BY 1;
INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-04') ORDER BY 1;
INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-03') ORDER BY 1;
INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-07') ORDER BY 1;
INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-02') ORDER BY 1;
INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-05') ORDER BY 1;
INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-09') ORDER BY 1;
INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-06') ORDER BY 1;
INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-08') ORDER BY 1;
```

I expected this to trigger a global sort that places the manifests in the list in the right order. However, the results show that the manifests are not in sorted order: the dataframe read by pyiceberg returns the rows in the order in which they were inserted, i.e.

<img width="468" alt="Screenshot 2023-11-09 21 14 35" src="https://github.com/apache/iceberg-python/assets/30164206/98c64493-2197-4169-879a-60a2b5f2a494">

Or is this an issue specific to Trino, one that other engines such as Spark do not have? Many thanks.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org