luoyuxia commented on issue #6153:
URL: https://github.com/apache/iceberg/issues/6153#issuecomment-1308443668

   Maybe the reason is that the primary key in the source is `id`, but the 
primary key in the sink is `dt,id`. The primary keys of the source and sink 
aren't equal.
   Is the following case possible?
   ```
   +I['bob', 1, 20221109]
   -U['bob', 1, 20221109]
   +U['bob', 1, 20221110]
   ```
   Then, in the upsert Iceberg sink, `+I['bob', 1, 20221109]` and 
`+U['bob', 1, 20221110]` will be treated as different records since they have 
different keys: with `dt,id` as the primary key, their keys are 
`(20221109, 1)` and `(20221110, 1)`.
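
   To illustrate the suspected effect, here is a minimal Python sketch. It is 
a toy model, not Iceberg's actual writer: it assumes an upsert sink that 
ignores `-U` rows and lets each `+I`/`+U` replace whatever row has the same 
key. The function name `apply_upsert` and the dict-based table are hypothetical.

```python
def apply_upsert(changelog, key_fields):
    """Toy model of an upsert sink (an assumption, not Iceberg's real code).

    changelog: list of (kind, row) pairs, where row is a dict.
    key_fields: names of the columns that form the sink's primary key.
    """
    table = {}
    for kind, row in changelog:
        key = tuple(row[f] for f in key_fields)
        if kind in ("+I", "+U"):
            # Upsert: replace any existing row that has the same key.
            table[key] = row
        elif kind == "-D":
            table.pop(key, None)
        # "-U" is skipped: this model assumes the following "+U"
        # replaces the old row by key.
    return list(table.values())


# The changelog from the case above (columns: username, id, dt).
changelog = [
    ("+I", {"username": "bob", "id": 1, "dt": "20221109"}),
    ("-U", {"username": "bob", "id": 1, "dt": "20221109"}),
    ("+U", {"username": "bob", "id": 1, "dt": "20221110"}),
]

rows_dt_id = apply_upsert(changelog, ("dt", "id"))  # sink key is dt,id
rows_id = apply_upsert(changelog, ("id",))          # sink key is id only

print(len(rows_dt_id))  # 2: the old and the new row both survive
print(len(rows_id))     # 1: the update replaces the old row
```

   Under these assumptions, keying on `dt,id` leaves two rows for `id = 1`, 
while keying on `id` alone leaves only the updated row.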
   
   
   You can try updating your sink to 
   ```
   CREATE TABLE IF NOT EXISTS iceberg.test_db.test_cdc (
     `username` STRING,
     `id` INT,
     `start_time` STRING,
     `dt` STRING,
     PRIMARY KEY (`id`) NOT ENFORCED
   ) PARTITIONED BY (`dt`)
   WITH (
     'write.format.default' = 'parquet',
     'format-version' = '2',
     'write.upsert.enabled' = 'true',
     'location' = 's3a://xxxxxx/xxx/xxx'
   );
   ```
   and see whether the duplicate records are still there.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

