zeddit commented on issue #132:
URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803817044

   @Fokko Thanks a lot for your help.
   I ran some more experiments based on your advice.
   
   I'm not sure I understand `global sort` correctly. My understanding is that it arranges the list of manifests in an order that PyIceberg follows when reading data written by an engine like Trino.
   
   I first created a sorted table with:
   ```sql
   CREATE TABLE iceberg.sort_order_exp.csdate100 (
       date date
   )
   WITH (
       format = 'PARQUET',
       format_version = 2,
       location = 's3a://test/csdate100',
       sorted_by = ARRAY['date']
   );
   ```
   Then I inserted rows into it one by one with:
   ```sql
   INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-01') ORDER BY 1;
   INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-04') ORDER BY 1;
   INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-03') ORDER BY 1;
   INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-07') ORDER BY 1;
   INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-02') ORDER BY 1;
   INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-05') ORDER BY 1;
   INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-09') ORDER BY 1;
   INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-06') ORDER BY 1;
   INSERT INTO iceberg.sort_order_exp.csdate100 VALUES (date '2021-01-08') ORDER BY 1;
   ```
   I expected this to trigger a global sort that places the manifests in the list in the right order.
   
   However, the manifests do not end up in the right order: the DataFrame read by PyIceberg returns the rows in the order they were inserted, one by one, i.e.:
   <img width="468" alt="Screenshot 2023-11-09 21 14 35" src="https://github.com/apache/iceberg-python/assets/30164206/98c64493-2197-4169-879a-60a2b5f2a494">
   
   Or is this an issue specific to Trino, and other engines like Spark don't have this problem?
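A minimal sketch of what may be happening, under stated assumptions: each single-row `INSERT` commits its own data file, and the table's `sorted_by` order is applied only to the rows within each file as it is written, not across files committed separately. This is plain Python that models data files as lists. It is not PyIceberg's API, and `naive_scan` is a hypothetical stand-in for a reader that happens to return files in commit order.

```python
from datetime import date

# Assumption: each single-row INSERT above commits its own data file.
inserted = [
    date(2021, 1, 1), date(2021, 1, 4), date(2021, 1, 3),
    date(2021, 1, 7), date(2021, 1, 2), date(2021, 1, 5),
    date(2021, 1, 9), date(2021, 1, 6), date(2021, 1, 8),
]

# Assumption: sorted_by = ARRAY['date'] sorts rows only within each file
# being written, so every one-row file is "sorted" on its own.
data_files = [sorted([d]) for d in inserted]


def naive_scan(files):
    """Hypothetical reader: concatenate files in commit order, no ORDER BY."""
    return [row for f in files for row in f]


scanned = naive_scan(data_files)

# Per-file sorting leaves the result in insertion order.
print(scanned == inserted)  # True

# An explicit sort after reading is what yields a globally ordered result.
globally_sorted = sorted(scanned)
print(globally_sorted[0], globally_sorted[-1])  # 2021-01-01 2021-01-09
```

If this model holds, the sort order alone would not reorder rows across separate commits. An explicit `ORDER BY` in the query, or sorting the DataFrame after reading, would give date order.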
   
   Thanks a lot!
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org