Re: [I] to_pandas() API which converts iceberg table scan to a pd.DataFrame will lost datetime data type and row order [iceberg-python]

via GitHub Wed, 08 Nov 2023 20:16:01 -0800


zeddit commented on issue #132:
URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803136799


   ### 3. How about partitioned tables?
   It is common to use partitioning to manage tables, and in Iceberg 
partitioning can yield a large performance gain because a scan can skip many 
data files.
   We tested the ordering behavior with the following two tables.
   ```
   CREATE TABLE test_table3(
       date date
   )
   WITH (
       format = 'PARQUET',
       format_version = 2,
       location = 's3a://test/test_table3',
       partitioning = ARRAY['month(date)'],
       sorted_by = ARRAY['date']
   )
   ---
   CREATE TABLE test_table4 (
       date date,
       sym varchar
   )
   WITH (
       format = 'PARQUET',
       format_version = 2,
       location = 's3a://test/test_table4',
       partitioning = ARRAY['sym'],
       sorted_by = ARRAY['sym','date']
   )
   ```
   We also inserted some values and read the results back to observe their 
order.
   The bad news is that the order between partitions cannot be controlled by 
any choice of write method. For example, even when we perform a global sort 
on write, the month order in the final result is still arbitrary, which is 
disappointing for time series analysis.
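
   To illustrate the observation above, here is a minimal, self-contained 
sketch in plain Python. It does not call Iceberg at all: the `partition_files` 
dict and the `scan` helper are hypothetical stand-ins for the per-month data 
files and for a scan that gives no ordering guarantee across files. It only 
shows why the reader has to sort explicitly if order matters.

```python
import random
from datetime import date

# Hypothetical stand-in for the data files of a table partitioned by month(date).
# Each file is sorted internally, as a sorted write would produce.
partition_files = {
    "2023-01": [date(2023, 1, 5), date(2023, 1, 20)],
    "2023-02": [date(2023, 2, 1), date(2023, 2, 14)],
    "2023-03": [date(2023, 3, 3), date(2023, 3, 30)],
}


def scan(files, seed):
    """Concatenate files in an arbitrary order (assumed scan behavior)."""
    order = list(files)
    random.Random(seed).shuffle(order)
    rows = []
    for key in order:
        rows.extend(files[key])
    return rows


rows = scan(partition_files, seed=7)

# Rows from one file stay in order, but the month order is arbitrary,
# so sort on the reader side instead of trusting the scan order.
ordered = sorted(rows)
print(ordered[0], ordered[-1])
```

   With a real table, the equivalent reader-side step would presumably be 
something like `df.sort_values("date", ignore_index=True)` on the DataFrame 
returned by `to_pandas()`, at the cost of an extra sort in memory.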


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

