Guosmilesmile commented on PR #12639: URL: https://github.com/apache/iceberg/pull/12639#issuecomment-2750963112
> It might be only me, but `SnapshotExpirationResetStrategy.LATEST`, `SnapshotExpirationResetStrategy.EARLIEST` is a very serious data corruption issue which should require manual intervention. The only valid use-case would be a CDC table, but for those we don't have a streaming read yet.
>
> @stevenzwu, @mxm: What are your thoughts?

@pvary Thank you very much for your response. Here are my tentative thoughts.

In our daily operations, when we hit this scenario and intervene manually, the only way to recover is to change the source configuration and set the `starting-strategy`. However, recovering with either `INCREMENTAL_FROM_EARLIEST_SNAPSHOT` or `INCREMENTAL_FROM_LATEST_SNAPSHOT` inevitably loses data, while recovering with `TABLE_SCAN_THEN_INCREMENTAL` duplicates data and can put a significant traffic load on downstream systems. None of these options is ideal, since each introduces data issues of its own. Moreover, manual intervention may not always be timely, which can lead to even more data loss.

This is why we proposed an automated recovery configuration, while keeping the default option so that the default behavior is unchanged.