zhangpengbigdata opened a new issue, #6153:
URL: https://github.com/apache/iceberg/issues/6153

   ### Query engine
   
   Iceberg 1.0.0, Flink 1.13
   
   ### Question
   
   Hi all,
   I found duplicate records when I repeatedly exported records from a CDC stream into an Iceberg partitioned table. Could you please help me?
   
   Versions: Iceberg 1.0.0, Flink 1.13.
   The SQL looks like this:
   ```
   CREATE CATALOG iceberg WITH (
     'type'='iceberg',
     'catalog-type'='hive',
     'uri'='thrift://xxxx:9083',
     'clients'='5',
     'property-version'='1',
     'warehouse'='s3a://xxxx/xxxx/'
   );
   
   CREATE TABLE test_cdc (
     `username` VARCHAR(100),
     `id` INT,
     start_time STRING,
     PRIMARY KEY (`id`) NOT ENFORCED
   ) WITH (
     'connector' = 'mysql-cdc',
     'hostname' = 'xxxxxx',
     'port' = '3306',
     'username' = 'xxx',
     'password' = 'xxx',
     'database-name' = 'xxx',
     'table-name' = 'xxx'
   );

   CREATE TABLE IF NOT EXISTS iceberg.test_db.test_cdc (
     `username` STRING,
     `id` INT,
     start_time STRING,
     `dt` STRING,
     PRIMARY KEY (`dt`, `id`) NOT ENFORCED
   ) PARTITIONED BY (dt)
   WITH (
     'write.format.default' = 'parquet',
     'format-version' = '2',
     'write.upsert.enabled' = 'true',
     'location' = 's3a://xxxxxx/xxx/xxx'
   );

   INSERT INTO iceberg.test_db.test_cdc
   SELECT
     `username`,
     `id`,
     start_time,
     FROM_UNIXTIME(CAST(start_time AS BIGINT) / 1000, 'yyyyMMdd') AS dt
   FROM test_cdc;
   ```
   After executing the Flink SQL above, the result of this query is:
   select count(distinct id), count(id), dt from iceberg.test_db.test_cdc group by dt order by dt
   <img width="579" alt="clipboard" src="https://user-images.githubusercontent.com/116717900/200744200-61199ef0-81c7-4e56-bca9-3e0b1e5aeebe.png">
   After rerunning the Flink SQL, the result of the same query is:
   select count(distinct id), count(id), dt from iceberg.test_db.test_cdc group by dt order by dt
   <img width="590" alt="clipboard2" src="https://user-images.githubusercontent.com/116717900/200744267-7973fcd1-73e9-4796-85fa-2d80092e3214.png">
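
   The check compares `count(distinct id)` with `count(id)` per `dt`, so it shows how many duplicates there are but not which keys are affected. A query along the following lines might help narrow that down. It is only a sketch that was not part of the runs above. It assumes nothing beyond the table and columns defined earlier, and the alias `row_count` is introduced here for illustration:

   ```
   -- List every (dt, id) pair that appears more than once in the Iceberg table.
   -- With upsert on PRIMARY KEY (dt, id), each pair is expected to appear once.
   SELECT
     `dt`,
     `id`,
     COUNT(*) AS row_count
   FROM iceberg.test_db.test_cdc
   GROUP BY `dt`, `id`
   HAVING COUNT(*) > 1
   ORDER BY `dt`, `id`;
   ```

   If this returns rows only after the second run, that would suggest the rerun wrote the same keys again without the earlier rows being deleted.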
   
   

