luoyuxia commented on issue #6153: URL: https://github.com/apache/iceberg/issues/6153#issuecomment-1308443668
May the reason is that the key in source is `id`, but the primary key in sink `dt,id`. The primary keys for source and sink aren't equal. Is possible for the following case? ``` +I['bob', 1, 20221109] -U['bob', 1, 20221109] +U['bob', 1, 20221110] ``` And then in the upsert iceberg, `+I['bob', 1, 20221109]`, +U['bob', 1, 20221110] will be considered as different records since they has different keys. You can try to update you sink to ``` CREATE TABLE IF NOT EXISTS iceberg.test_db.test_cdc( `username` STRING, `id` int, start_time String, `dt` string, PRIMARY KEY(`id`) NOT ENFORCED )partitioned by (dt) WITH ( 'write.format.default'='parquet', 'format-version' = '2' , 'write.upsert.enabled'='true', 'location' = 's3a://xxxxxx/xxx/xxx' ); ``` and see whether the duplicated records are still in here or not. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org