a8356555 opened a new issue, #9022: URL: https://github.com/apache/iceberg/issues/9022
### Apache Iceberg version 1.4.0 ### Query engine Flink ### Please describe the bug 🐞 FlinkSQL Read From Iceberg using ORDER BY clause caused some data loss. Flink version: 1.6.2 1. I have written a table to iceberg using flink ```sql CREATE CATALOG `mycatalog` WITH ( 'type' = 'iceberg', 'catalog-impl' = 'org.apache.iceberg.aws.glue.GlueCatalog', 'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO', 'warehouse' = 's3://mywarehouse' ); CREATE TABLE IF NOT EXISTS glue01.mydb.mytable ( `id` INT, `price` INT, `op` STRING, `date` TIMESTAMP, `t` TIMESTAMP ) PARTITIONED BY (`date`,`op`); INSERT INTO glue01.mydb.mytable VALUES (1, 100, 'c', TIMESTAMP '2023-01-01', TIMESTAMP '2023-01-01 00:00:00'), (2, 50, 'c', TIMESTAMP '2023-01-01', TIMESTAMP '2023-01-01 00:00:01'), (3, 25, 'c', TIMESTAMP '2023-01-01', TIMESTAMP '2023-01-01 00:00:02'); INSERT INTO glue01.mydb.mytable VALUES (1, 123, 'u', TIMESTAMP '2023-01-02', TIMESTAMP '2023-01-02 00:00:00'), (2, 321, 'u', TIMESTAMP '2023-01-02', TIMESTAMP '2023-01-02 00:00:10'); INSERT INTO glue01.mydb.mytable VALUES (1, 100, 'r', TIMESTAMP '2023-01-03', TIMESTAMP '2023-01-03 00:00:00'), (2, 50, 'r', TIMESTAMP '2023-01-03', TIMESTAMP '2023-01-03 00:00:00'), (3, 25, 'r', TIMESTAMP '2023-01-03', TIMESTAMP '2023-01-03 00:00:00'); INSERT INTO glue01.mydb.mytable VALUES (3, 25, 'r', TIMESTAMP '2023-01-04', TIMESTAMP '2023-01-04 00:00:00'), (3, 25, 'd', TIMESTAMP '2023-01-04', TIMESTAMP '2023-01-04 00:10:00'); INSERT INTO glue01.mydb.mytable VALUES (4, 100, 'c', TIMESTAMP '2023-01-05', TIMESTAMP '2023-01-05 00:00:00'), (4, 123, 'u', TIMESTAMP '2023-01-05', TIMESTAMP '2023-01-05 00:10:00'); ``` 2. Read Data. ```sql CREATE TEMPORARY TABLE `mytable` ( `id` INT, `price` INT, `op` STRING, `t` TIMESTAMP(3), WATERMARK FOR `t` AS `t` ) WITH ( 'connector' = 'iceberg', 'warehouse' = 's3://mywarehouse', 'catalog-impl' = 'org.apache.iceberg.aws.glue.GlueCatalog', 'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO', 'catalog-name' = 'mycatalog', 'catalog-database' = 'mydb', 'catalog-table' = 'mytable' ) SELECT `id`, `price`, `op`, `t` from `mytable` /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/ ``` the results are not ordered ``` # +----+-------------+-------------+--------------------------------+-------------------------+ # | op | id | price | op | t | # +----+-------------+-------------+--------------------------------+-------------------------+ # | +I | 3 | 25 | r | 2023-01-04 00:00:00.000 | # | +I | 3 | 25 | d | 2023-01-04 00:10:00.000 | # | +I | 4 | 123 | u | 2023-01-05 00:10:00.000 | # | +I | 4 | 100 | c | 2023-01-05 00:00:00.000 | # | +I | 1 | 123 | u | 2023-01-02 00:00:00.000 | # | +I | 2 | 321 | u | 2023-01-02 00:00:10.000 | # | +I | 1 | 100 | r | 2023-01-03 00:00:00.000 | # | +I | 2 | 50 | r | 2023-01-03 00:00:00.000 | # | +I | 3 | 25 | r | 2023-01-03 00:00:00.000 | # | +I | 1 | 100 | c | 2023-01-01 00:00:00.000 | # | +I | 2 | 50 | c | 2023-01-01 00:00:01.000 | # | +I | 3 | 25 | c | 2023-01-01 00:00:02.000 | ``` I want the results ordered by `t` ```sql select `id`, `price`, `op`, `t` from `_table_b2f233fadf3d47478fac1f13620a08ca` /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/ order by `t` ``` but some data lost ``` # +----+-------------+-------------+--------------------------------+-------------------------+ # | op | id | price | op | t | # +----+-------------+-------------+--------------------------------+-------------------------+ # | +I | 3 | 25 | r | 2023-01-04 00:00:00.000 | # | +I | 3 | 25 | d | 2023-01-04 00:10:00.000 | # | +I | 4 | 123 | u | 2023-01-05 00:10:00.000 | ``` How could I retrieve data in order without a data loss? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org