wombatu-kun opened a new pull request, #16576: URL: https://github.com/apache/iceberg/pull/16576
## Summary

Users have reported confusion about the Kafka Connect sink's control topic growing without bound (#15844, https://github.com/apache/iceberg/issues/15844). The docs explain how to *create* the control topic but never describe what it is for, how its events are used, or that it should have a finite retention. As a result, on brokers with a large or unlimited default `retention.ms`, the topic accumulates coordination events indefinitely. This PR documents the behavior and the recommended configuration.

## What changed

Expanded the "Control topic" section of `docs/docs/kafka-connect.md`:

- Added a paragraph on the control topic's purpose and the per-commit event flow (`StartCommit` / `DataWritten` / `DataComplete` / `CommitToTable` / `CommitComplete`), noting that `DataWritten` carries data/delete file metadata rather than rows, and that the durable commit position lives in the table snapshot, so control-topic events are transient.
- Added `--config retention.ms=3600000` to the topic-creation example.
- Added a "Control topic retention" subsection: why an auto-created topic grows under broker defaults, how to size `retention.ms` relative to the commit interval/timeout, how to set it on an existing topic, multi-connector sizing, and using `cleanup.policy=delete` rather than compaction.

Closes #15844

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
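As a rough illustration of the sizing point in the summary (retention chosen relative to the commit interval and timeout), here is a minimal Python sketch. The function names, the `safety_factor` multiplier, and the example interval/timeout values are assumptions made for illustration; they are not taken from the PR or from the Iceberg documentation, which may recommend a different rule.

```python
def suggest_retention_ms(commit_interval_ms: int,
                         commit_timeout_ms: int,
                         safety_factor: int = 10) -> int:
    """Suggest a control-topic retention.ms value.

    Hypothetical heuristic: keep events for several full commit cycles,
    where one cycle is taken to be commit interval + commit timeout.
    The multiplier is an assumption, not a documented recommendation.
    """
    if commit_interval_ms <= 0 or commit_timeout_ms < 0 or safety_factor < 1:
        raise ValueError("interval must be > 0, timeout >= 0, safety_factor >= 1")
    return (commit_interval_ms + commit_timeout_ms) * safety_factor


def retention_is_sufficient(retention_ms: int,
                            commit_interval_ms: int,
                            commit_timeout_ms: int) -> bool:
    """Return True if retention outlasts at least one full commit cycle."""
    return retention_ms > commit_interval_ms + commit_timeout_ms


if __name__ == "__main__":
    # Illustrative values only: a 5-minute commit interval and a 30-second timeout.
    interval_ms, timeout_ms = 300_000, 30_000
    print(suggest_retention_ms(interval_ms, timeout_ms))
    # The one-hour value from the PR's topic-creation example, checked against one cycle.
    print(retention_is_sufficient(3_600_000, interval_ms, timeout_ms))
```

The check only compares retention against a single commit cycle. With several connectors sharing one control topic, the longest cycle among them would presumably be the one to size against.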
