tongwai-wong-appier commented on issue #13763: URL: https://github.com/apache/iceberg/issues/13763#issuecomment-4426850286
We opened #16282 for what appears to be the same symptom, but with a more specific trigger path on the Kafka Connect sink side.

In our case, the issue happens during coordinator switchover after a Kafka timeout: the old coordinator can successfully commit file `X`, but before the control-topic offset is advanced, a new coordinator may re-consume the stale `DATA_WRITTEN(file X)` and commit the same file again.

The key reasons this can still happen are:

- the Iceberg commit and the control-topic offset advancement are not atomic
- the append path does not check whether `file_path` is already registered in the table

So from our investigation, #13756 may help reduce zombie coordinator behavior, but it does not fully prevent duplicate file re-registration in the append-only case.

Linking here for visibility: #16282
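To make the race described in this comment concrete, here is a minimal toy model in Python. It is only a sketch and does not use the Iceberg or Kafka Connect APIs: `ToyTable`, `ToyControlTopic`, `run_coordinator`, and the `check_existing` flag are all hypothetical names invented for illustration, and the example path is made up. It assumes a coordinator that first appends every pending file to the table and only afterwards advances the control-topic offset, which is the non-atomic sequence the comment points at.

```python
from dataclasses import dataclass, field

# Made-up path standing in for "file X" from the comment.
FILE_X = "s3://bucket/data/X.parquet"


@dataclass
class ToyTable:
    """Stand-in for a table: just the list of registered data file paths."""
    files: list = field(default_factory=list)

    def append(self, file_path, check_existing=False):
        # Hypothetical guard: skip a path that is already registered.
        # The comment reports that the real append path has no such check.
        if check_existing and file_path in self.files:
            return False
        self.files.append(file_path)
        return True


@dataclass
class ToyControlTopic:
    """Stand-in for the control topic: events plus one committed offset."""
    events: list
    committed_offset: int = 0

    def pending(self):
        return self.events[self.committed_offset:]


def run_coordinator(table, topic, crash_before_offset_commit=False,
                    check_existing=False):
    """Step 1: commit pending files to the table. Step 2: advance the offset."""
    pending = topic.pending()
    for file_path in pending:
        table.append(file_path, check_existing=check_existing)
    if crash_before_offset_commit:
        return  # table commit succeeded, but the offset was never advanced
    topic.committed_offset += len(pending)


def simulate(check_existing):
    table = ToyTable()
    topic = ToyControlTopic(events=[FILE_X])
    # Old coordinator commits file X, then is lost before advancing the offset.
    run_coordinator(table, topic, crash_before_offset_commit=True,
                    check_existing=check_existing)
    # New coordinator starts from the stale offset and sees the same event.
    run_coordinator(table, topic, check_existing=check_existing)
    return table.files


if __name__ == "__main__":
    print("without path check:", simulate(check_existing=False))
    print("with path check:   ", simulate(check_existing=True))
```

In this toy model, the run without the guard registers file `X` twice, while the run with the hypothetical path check registers it once. Whether such a check is the right fix for the real sink is a separate question; the sketch only shows why the two conditions listed in the comment are sufficient to produce a duplicate.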
