zhangpengbigdata opened a new issue, #6153: URL: https://github.com/apache/iceberg/issues/6153
### Query engine

Iceberg 1.0.0, Flink 1.13

### Question

Hi all,

I found duplicate records after repeatedly exporting records from a CDC stream into an Iceberg partitioned table. Could you please help me?

Versions: Iceberg 1.0.0, Flink 1.13.

The SQL is:

```
CREATE CATALOG iceberg WITH (
  'type'='iceberg',
  'catalog-type'='hive',
  'uri'='thrift://xxxx:9083',
  'clients'='5',
  'property-version'='1',
  'warehouse'='s3a://xxxx/xxxx/'
);

CREATE TABLE test_cdc (
  `username` varchar(100),
  `id` int,
  start_time String,
  PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'xxxxxx',
  'port' = '3306',
  'username' = 'xxx',
  'password' = 'xxx',
  'database-name' = 'xxx',
  'table-name' = 'xxx'
);

CREATE TABLE IF NOT EXISTS iceberg.test_db.test_cdc(
  `username` STRING,
  `id` int,
  start_time String,
  `dt` string,
  PRIMARY KEY(`dt`, `id`) NOT ENFORCED
) partitioned by (dt) WITH (
  'write.format.default'='parquet',
  'format-version' = '2',
  'write.upsert.enabled'='true',
  'location' = 's3a://xxxxxx/xxx/xxx'
);

INSERT INTO iceberg.test_db.test_cdc
SELECT
  `username`,
  `id`,
  start_time,
  FROM_UNIXTIME(cast(start_time as bigint) / 1000, 'yyyyMMdd') as dt
FROM test_cdc;
```

After executing the Flink SQL above for the first time, I ran this query:

`select count(distinct id), count(id), dt from iceberg.test_db.test_cdc group by dt order by dt`

The result is:

<img width="579" alt="clipboard" src="https://user-images.githubusercontent.com/116717900/200744200-61199ef0-81c7-4e56-bca9-3e0b1e5aeebe.png">

After rerunning the Flink SQL, the same query returns:

<img width="590" alt="clipboard2" src="https://user-images.githubusercontent.com/116717900/200744267-7973fcd1-73e9-4796-85fa-2d80092e3214.png">

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org