Guosmilesmile commented on PR #12639:
URL: https://github.com/apache/iceberg/pull/12639#issuecomment-2750963112

   > It might be only me, but `SnapshotExpirationResetStrategy.LATEST`, 
`SnapshotExpirationResetStrategy.EARLIEST` is a very serious data corruption 
issue which should require manual intervention. The only valid use-case would 
be a CDC table, but for those we don't have a streaming read yet.
   > 
   > @stevenzwu, @mxm: What are your thoughts?
   
   @pvary Thank you very much for your response. Here are my preliminary 
thoughts. 
   
   In our daily operations, when we hit this scenario and intervene manually, 
the only way to recover is to change the source configuration and set the 
`starting-strategy`. Recovering with either 
`INCREMENTAL_FROM_EARLIEST_SNAPSHOT` or `INCREMENTAL_FROM_LATEST_SNAPSHOT` 
inevitably loses data, because the expired snapshots that were never consumed 
can no longer be read. Recovering with `TABLE_SCAN_THEN_INCREMENTAL` 
duplicates data, which can also cause a significant traffic spike downstream. 
None of these options is ideal; each introduces its own data issues.
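
To make the trade-off concrete, here is a minimal, self-contained sketch. It is not Iceberg code: `RecoverySketch`, `lost`, and `duplicated` are hypothetical names, and it assumes an append-only table, consecutive integer snapshot IDs starting at 1 (real snapshot IDs are not consecutive), and that both incremental strategies read their starting snapshot inclusively.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the three recovery choices; not part of Iceberg.
class RecoverySketch {
    enum Strategy {
        INCREMENTAL_FROM_EARLIEST_SNAPSHOT,
        INCREMENTAL_FROM_LATEST_SNAPSHOT,
        TABLE_SCAN_THEN_INCREMENTAL
    }

    // First snapshot whose data the restarted reader emits.
    // A full table scan is modeled as re-reading from snapshot 1 (append-only assumption).
    static int firstRead(Strategy s, int earliestRetained, int latest) {
        switch (s) {
            case INCREMENTAL_FROM_EARLIEST_SNAPSHOT: return earliestRetained;
            case INCREMENTAL_FROM_LATEST_SNAPSHOT:   return latest;
            default:                                 return 1;
        }
    }

    // Snapshots after the last consumed one that the restarted reader skips.
    static List<Integer> lost(Strategy s, int lastConsumed, int earliestRetained, int latest) {
        List<Integer> ids = new ArrayList<>();
        for (int id = lastConsumed + 1; id < firstRead(s, earliestRetained, latest); id++) {
            ids.add(id);
        }
        return ids;
    }

    // Already-consumed snapshots whose data is emitted a second time.
    static List<Integer> duplicated(Strategy s, int lastConsumed, int earliestRetained, int latest) {
        List<Integer> ids = new ArrayList<>();
        for (int id = firstRead(s, earliestRetained, latest); id <= lastConsumed; id++) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Reader stopped after snapshot 3; snapshots 1-5 expired; latest is 10.
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " lost=" + lost(s, 3, 6, 10)
                + " duplicated=" + duplicated(s, 3, 6, 10));
        }
    }
}
```

In this toy scenario the earliest-snapshot strategy skips snapshots 4 and 5, the latest-snapshot strategy skips 4 through 9, and the full table scan skips nothing but re-emits snapshots 1 through 3.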
   
   Moreover, manual intervention may not always be timely, which can lead to 
even more data loss. This is why we proposed an automated recovery 
configuration, while keeping the current behavior as the default option.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

