Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-08-26 Thread via GitHub
stevenzwu closed issue #10431: Flink sink writes duplicate data in upsert mode URL: https://github.com/apache/iceberg/issues/10431 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-17 Thread via GitHub
pvary commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2173132205 @zhongqishang: Seems like an issue with checkpoint retries. Is there any chance you could retest with a newer version of Flink? The currently supported versions are 1.17, 1.18, 1.19

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-11 Thread via GitHub
pvary commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2161437645 Seems like an issue with checkpoint retry. Will be out of office for a bit, but this needs to be investigated.

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-11 Thread via GitHub
zhongqishang commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2160846434 @pvary I encountered the same problem on another table; this time it was caused by a checkpoint RPC timeout. JM log ``` 2024-06-07 15:50:10.472 [Checkpoint Timer]

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-10 Thread via GitHub
zhongqishang commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2157653011 @pvary Yes, the `01930` file is a pos delete file, but its file paths only reference the data file `01928`.

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-10 Thread via GitHub
pvary commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2157551451 Here is how the different deletes work: - EQ-DELETE - removes all occurrences of the record with the given id BEFORE the snapshot - POS-DELETE - removes a given row from the giv
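
The delete semantics described above can be sketched with a small in-memory model. This is a simplified illustration, not Iceberg's reader code: the class names, file paths, and the `scan` function are hypothetical. It assumes the applicability rules from the Iceberg table spec, where an equality delete applies only to data files with a strictly smaller data sequence number, and a position delete applies to the named data file when that file's sequence number is less than or equal to the delete's. The second scenario, where both data files share one sequence number, is a hypothetical situation chosen to show the rule; the thread does not confirm it as the cause of this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    seq: int      # data sequence number assigned when the file is committed
    rows: tuple   # records stored as (id, value) pairs


@dataclass(frozen=True)
class EqDelete:
    seq: int
    ids: frozenset        # ids whose earlier occurrences are removed


@dataclass(frozen=True)
class PosDelete:
    seq: int
    path: str             # data file the positions refer to
    positions: frozenset  # row positions removed from that file


def scan(data_files, eq_deletes, pos_deletes):
    """Return the live (id, value) rows after applying both delete kinds."""
    live = []
    for f in data_files:
        for pos, (rid, value) in enumerate(f.rows):
            # Position delete: targets one row of one file (same or older sequence).
            by_pos = any(
                d.path == f.path and pos in d.positions and f.seq <= d.seq
                for d in pos_deletes
            )
            # Equality delete: only reaches strictly older data files.
            by_eq = any(rid in d.ids and f.seq < d.seq for d in eq_deletes)
            if not (by_pos or by_eq):
                live.append((rid, value))
    return live


new = DataFile("data-B.parquet", 2, ((1, "b"),))
eq = EqDelete(2, frozenset({1}))

# Scenario 1: the old row was committed in an earlier snapshot (seq 1).
old = DataFile("data-A.parquet", 1, ((1, "a"),))
normal = scan([old, new], [eq], [])

# Scenario 2 (hypothetical): both data files end up with the same sequence number,
# so the equality delete no longer reaches the old row.
old_same = DataFile("data-A.parquet", 2, ((1, "a"),))
same_seq = scan([old_same, new], [eq], [])

# Scenario 3: a position delete on the old file removes the row regardless.
pos = PosDelete(2, "data-A.parquet", frozenset({0}))
with_pos = scan([old_same, new], [eq], [pos])

print(normal)    # [(1, 'b')]
print(same_seq)  # [(1, 'a'), (1, 'b')]
print(with_pos)  # [(1, 'b')]
```

In this model, a position delete file that only lists one data file, as reported for `01930` and `01928`, cannot remove copies of the same id stored in any other data file; only an equality delete with a larger sequence number can.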

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-04 Thread via GitHub
zhongqishang commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2148804889 > @zhongqishang: Do you see anything more in the logs? Exceptions/retries, or something like this? I have not found any Exceptions/retries around the wrong snapshot time

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-04 Thread via GitHub
pvary commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2147373677 @zhongqishang: Do you see anything more in the logs? Exceptions/retries, or something like this? Also, I don't fully understand your statement here: > I think it is because

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-03 Thread via GitHub
zhongqishang commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2146411518 > @zhongqishang: How is your sink/table created? What are the exact records you are sending to the sink? Your issue seems very similar to: #10076 @pvary Thanks for your

Re: [I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-03 Thread via GitHub
pvary commented on issue #10431: URL: https://github.com/apache/iceberg/issues/10431#issuecomment-2146195992 @zhongqishang: How is your sink/table created? What are the exact records you are sending to the sink? Your issue seems very similar to: https://github.com/apache/iceberg/issues/1007

[I] Flink sink writes duplicate data in upsert mode [iceberg]

2024-06-03 Thread via GitHub
zhongqishang opened a new issue, #10431: URL: https://github.com/apache/iceberg/issues/10431 ### Apache Iceberg version 1.2.1 ### Query engine Flink ### Please describe the bug 🐞 I have a Flink upsert job with a checkpoint interval of 5 minutes and an exter