wubiaoi opened a new pull request, #48820: URL: https://github.com/apache/doris/pull/48820
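### What problem does this PR solve?

When a database-level sync job is configured with CCR and the target database already contains a table with the same name, the table can be dropped by mistake, and the table metadata on the FE master and followers can become inconsistent.

#### How a CCR job handles a table that already exists in the target database

On the FE side, if a table being restored already exists, FE checks whether the schema and other properties of the new table match those of the existing table. If they do not match, FE throws an exception (`Table {} already exists but with different schema, local table: {}, remote table: {}`) and the Restore job fails. The ccr-syncer service catches this exception, gives the table an alias (`__ccr_tablename_timestamp`), and resubmits the Restore request to FE. If this Restore succeeds, the syncer runs replace table (swap=false) to replace the original table and complete the sync.

#### Current FE logic

A for loop checks each table that needs to be restored. If an existing table's schema differs from that of the table being synced, the loop returns a failure immediately and the Restore job is cancelled. When several tables collide, each Restore reports only one of them, so the syncer keeps resubmitting Restore requests until every colliding table has an alias.

#### The problem in the FE logic

Because the retried Restore targets the alias name, it takes the table-does-not-exist path: FE builds the table object from the table schema in the backup, and only at the end updates the table name to the alias. The crux is that adding a table to `restoredTables` and checking schema consistency happen in the same loop. The first table is handled as an aliased table and added to `restoredTables`, but if the second table in the loop has a mismatched schema, the loop returns with an exception right away. The first table's name is then never set to the alias, so in effect the source database's table name has been added to `restoredTables`. After the restore job fails, the cancel cleanup drops the tables recorded in `restoredTables`, on the assumption that they are the newly created alias tables. At that point the entry holds the real table name rather than the alias, so the real table is dropped.

After repeated Restore attempts the syncer has aliased every table and the restore job finally succeeds. But when the syncer then runs replace table for each table, the original table no longer exists on the master, so the operation fails and the job can never recover.

#### Why does table metadata differ between the FE master and followers?

While processing a restore job, the master persists the restore job object to BDB only in the download, commit, finished, and cancel states. When the first table throws the exception, the job is still in the pending state, so nothing is synced to the followers. Once a later restore succeeds, the tables carry alias names, so the followers never replay a drop table operation and keep the metadata of the original, manually created table indefinitely.

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here.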
eg: https://github.com/apache/doris-website/pull/1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
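
The FE-side failure described in this PR (schema check and registration sharing one loop, with the alias rename applied only after the loop) can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the Doris `RestoreJob` code: the class `RestoreLoopSketch`, the `Ref` type, the schema strings, and the `validateFirst` flag are all hypothetical, and validating every table before registering any is only one possible way to avoid the problem, not necessarily the change this PR makes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestoreLoopSketch {
    /** Hypothetical restore entry; alias is null when the syncer has not renamed the table yet. */
    static final class Ref {
        final String name;
        final String alias;
        Ref(String name, String alias) { this.name = name; this.alias = alias; }
        String target() { return alias != null ? alias : name; }
    }

    /** True when the target name already exists locally with a different schema. */
    static boolean conflicts(Ref ref, Map<String, String> local, Map<String, String> backup) {
        String localSchema = local.get(ref.target());
        return localSchema != null && !localSchema.equals(backup.get(ref.name));
    }

    /**
     * Returns the table names the cancel cleanup would drop. The list is empty
     * on success, or when the job fails before anything was registered.
     */
    static List<String> droppedOnCancel(List<Ref> refs, Map<String, String> local,
                                        Map<String, String> backup, boolean validateFirst) {
        if (validateFirst) {
            // Assumed mitigation: check every table before registering any of them.
            for (Ref ref : refs) {
                if (conflicts(ref, local, backup)) {
                    return Collections.emptyList();
                }
            }
        }
        List<Ref> restored = new ArrayList<>(); // stands in for restoredTables
        for (Ref ref : refs) {
            if (conflicts(ref, local, backup)) {
                // Early return: entries registered so far still carry their
                // source names, so cleanup drops tables by those names.
                List<String> dropped = new ArrayList<>();
                for (Ref r : restored) {
                    dropped.add(r.name);
                }
                return dropped;
            }
            if (!local.containsKey(ref.target())) {
                restored.add(ref); // built from backup metadata under the source name
            }
        }
        // In the flow described above, the rename to the alias would happen here,
        // which the early return never reaches. On success nothing is cleaned up.
        return Collections.emptyList();
    }

    /** Two colliding tables: t1 is already aliased by the syncer, t2 is not. */
    static List<String> demo(boolean validateFirst) {
        List<Ref> refs = new ArrayList<>();
        refs.add(new Ref("t1", "__ccr_t1_1700000000"));
        refs.add(new Ref("t2", null));
        Map<String, String> local = new HashMap<>();
        local.put("t1", "k1 INT");
        local.put("t2", "k1 INT");
        Map<String, String> backup = new HashMap<>();
        backup.put("t1", "k1 BIGINT");
        backup.put("t2", "k1 BIGINT");
        return droppedOnCancel(refs, local, backup, validateFirst);
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // prints [t1]: the real table would be dropped
        System.out.println(demo(true));  // prints []
    }
}
```

In the single-loop variant the cleanup list contains `t1`, the name of the manually created table, which matches the accidental drop described above. With the validation pass first, the job fails before anything is registered, so the cleanup has nothing to drop.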