tongwai-wong-appier commented on issue #13763:
URL: https://github.com/apache/iceberg/issues/13763#issuecomment-4441008721

   We observed the following in production.
   
   - There were repeated broker/client disconnects in the same window, and one of the tasks eventually failed with an unrecoverable `Connection timed out` error, so the connector was clearly running under unstable Kafka connectivity at the time.
   - Apparently triggered by these network problems, two overlapping commit cycles started within about 10 seconds of each other. Each cycle sent a `START_COMMIT`, and in both cycles the workers emitted a `DATA_WRITTEN` event for what appears to be the same physical file path.
   
   ```
   13:10:00 Sending event of type: START_COMMIT
   13:10:00 Commit 667b23e7-431f-4159-ba0d-4baec79d9496 initiated
   13:10:01 Sending event of type: DATA_WRITTEN
   13:10:02 Commit 667b23e7-431f-4159-ba0d-4baec79d9496 not ready, received responses for 1 of 2 partitions, waiting for more
   
   13:10:08 Sending event of type: START_COMMIT
   13:10:08 Commit 5f8a12a1-deee-4710-b397-e0344965a61a initiated
   13:10:09 Sending event of type: DATA_WRITTEN
   13:10:09 Commit 5f8a12a1-deee-4710-b397-e0344965a61a not ready, received responses for 1 of 2 partitions, waiting for more
   
   13:10:39 Commit 667b23e7-431f-4159-ba0d-4baec79d9496 complete, committed to 1 table(s), valid-through null
   13:10:47 Commit 5f8a12a1-deee-4710-b397-e0344965a61a complete, committed to 1 table(s), valid-through null
   ```
   
   Separately, when we inspected the resulting manifests, we found the same `file_path` listed under two different snapshots.
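   A minimal sketch of this kind of duplicate check, written as plain Java over a flattened list of (snapshot id, file path) pairs rather than against the Iceberg API. `ManifestEntry` and `findDuplicatePaths` are hypothetical names for illustration, and populating the list from the table's manifest entries is assumed rather than shown.
   
   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeMap;
   import java.util.TreeSet;
   
   public class DuplicateFilePathCheck {
       // Hypothetical flattened manifest entry: one row per (snapshot, data file) pair.
       public record ManifestEntry(long snapshotId, String filePath) {}
   
       // Returns every file path that appears under more than one distinct snapshot,
       // mapped to the sorted set of snapshot ids it appears under.
       public static Map<String, Set<Long>> findDuplicatePaths(List<ManifestEntry> entries) {
           Map<String, Set<Long>> snapshotsByPath = new TreeMap<>();
           for (ManifestEntry e : entries) {
               snapshotsByPath.computeIfAbsent(e.filePath(), k -> new TreeSet<>()).add(e.snapshotId());
           }
           // Keep only paths referenced by at least two snapshots.
           snapshotsByPath.values().removeIf(snapshots -> snapshots.size() < 2);
           return snapshotsByPath;
       }
   
       public static void main(String[] args) {
           List<ManifestEntry> entries = List.of(
               new ManifestEntry(1L, "s3://bucket/data/a.parquet"),
               new ManifestEntry(2L, "s3://bucket/data/a.parquet"),
               new ManifestEntry(2L, "s3://bucket/data/b.parquet"));
           System.out.println(findDuplicatePaths(entries)); // {s3://bucket/data/a.parquet=[1, 2]}
       }
   }
   ```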
   
   So this looks closer to the Case 3 scenario we discussed: a second 
`START_COMMIT` leading to a new `DATA_WRITTEN` for the same file path, rather 
than a replay of the same control-topic record.
   
   My question is whether current `main` is expected to deduplicate this case as well.
   - If yes, which layer is responsible for cross-commit file-path deduplication?
   - If not, this seems to be a gap beyond offset validation and stale-event filtering.
   
   Or, more concretely, how should overlapping coordinators avoid producing 
duplicate `DATA_WRITTEN` events for the same physical file path?
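   To make the question concrete, here is one hypothetical shape cross-commit deduplication could take: a coordinator-side filter that drops `DATA_WRITTEN` file paths already included in an earlier successful commit. `CommittedPathFilter`, `filterNew`, and `markCommitted` are illustrative names, not existing Iceberg classes or methods, and the in-memory set is an assumption that would not survive a restart or coordinator failover.
   
   ```java
   import java.util.ArrayList;
   import java.util.Collection;
   import java.util.HashSet;
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Set;
   
   // Hypothetical coordinator-side filter; not part of Iceberg or its Kafka Connect sink.
   public class CommittedPathFilter {
       // Paths included in earlier successful commits (in memory only, lost on restart).
       private final Set<String> committedPaths = new HashSet<>();
   
       // Returns this cycle's DATA_WRITTEN paths that no earlier commit has included,
       // also collapsing repeats within the batch while preserving arrival order.
       public List<String> filterNew(Collection<String> dataWrittenPaths) {
           Set<String> fresh = new LinkedHashSet<>();
           for (String path : dataWrittenPaths) {
               if (!committedPaths.contains(path)) {
                   fresh.add(path);
               }
           }
           return new ArrayList<>(fresh);
       }
   
       // Call only after the table commit succeeds, so a failed commit
       // does not cause its files to be skipped by the next cycle.
       public void markCommitted(Collection<String> paths) {
           committedPaths.addAll(paths);
       }
   
       public static void main(String[] args) {
           CommittedPathFilter filter = new CommittedPathFilter();
   
           // First commit cycle: one file, committed successfully.
           List<String> first = filter.filterNew(List.of("s3://bucket/data/a.parquet"));
           filter.markCommitted(first);
   
           // Second, overlapping cycle reports the same file again plus a new one.
           List<String> second = filter.filterNew(
               List.of("s3://bucket/data/a.parquet", "s3://bucket/data/b.parquet"));
           System.out.println(second); // [s3://bucket/data/b.parquet]
       }
   }
   ```
   
   In the timeline above, the first commit completed (13:10:39) before the second (13:10:47), so a filter like this, applied just before each table commit, would have dropped the repeated path from the second commit. The open questions are where such state should live and how it would be recovered, which is why we are asking which layer is meant to own this.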
   
   
[connect-stg-logs-1310-1345.log](https://github.com/user-attachments/files/27702939/connect-stg-logs-1310-1345.log)

