zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803136799
### 3. How about partitioned tables?

Partitioning is a common way to manage tables, and in Iceberg it can bring a large performance gain by letting queries skip many data files. We tested the ordering behavior with the following two tables.

```
CREATE TABLE test_table3 (
    date date
)
WITH (
    format = 'PARQUET',
    format_version = 2,
    location = 's3a://test/test_table3',
    partitioning = ARRAY['month(date)'],
    sorted_by = ARRAY['date']
)
---
CREATE TABLE test_table4 (
    date date,
    sym varchar
)
WITH (
    format = 'PARQUET',
    format_version = 2,
    location = 's3a://test/test_table4',
    partitioning = ARRAY['sym'],
    sorted_by = ARRAY['sym','date']
)
```

We also inserted some values and queried the results to observe their order. The bad news is that the order between partitions cannot be controlled by any choice of write method. For example, even when we perform a global sort, the month order in the final result is still random, which is disappointing for time series analysis.
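To make the effect concrete, here is a minimal sketch in plain Python. It does not touch Iceberg or Trino: the `partitions` dict, the `scan` helper, and the fixed `read_order` are hypothetical stand-ins for a table partitioned by `month(date)` whose files are each sorted by `date`. It only illustrates why rows can be sorted within every partition and still come back out of order overall.

```python
from datetime import date

# Hypothetical stand-in for test_table3: one data file per month(date)
# partition, each file already sorted by date (as sorted_by would request).
partitions = {
    month: [date(2023, month, day) for day in (1, 10, 20)]
    for month in range(1, 7)
}

def scan(parts, read_order):
    """Concatenate partition files in the order the reader happens to visit them."""
    return [row for month in read_order for row in parts[month]]

# Assumption: the reader visits partitions in some arbitrary order.
read_order = [3, 1, 6, 2, 5, 4]
rows = scan(partitions, read_order)

# Every partition is sorted on its own ...
within_sorted = all(p == sorted(p) for p in partitions.values())
# ... but the concatenated result is not sorted by date.
globally_sorted = rows == sorted(rows)

# In this sketch, only an explicit sort on the reader side restores date order.
ordered = sorted(rows)

print(within_sorted, globally_sorted)
```

In this toy model, the order inside each file has no influence on the order in which the files are stitched together, which matches what we observed for the month order above.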