This is an automated email from the ASF dual-hosted git repository.

gavinchou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new d9eb14a96bb [fix](cloud) Fix cloud decomission lead to fe cant start (#46783)
d9eb14a96bb is described below

commit d9eb14a96bbe0cda1a6335c67bf3e9c4ef30513b
Author: deardeng <deng...@selectdb.com>
AuthorDate: Mon Jan 13 11:10:04 2025 +0800

    [fix](cloud) Fix cloud decomission lead to fe cant start (#46783)
    
    Fix issue with SQL node decommissioning process
    
    The SQL node decommissioning process does not wait for transactions at
    the watermark level to complete before setting the backend's
    isDecommissioned status to true.
    
    As a result, the value displayed in show backends immediately reflects
    isDecommissioned regardless of ongoing transactions initiated via SQL.
    
    When a user runs drop backend on the only backend in a cluster, the
    edit log records the drop backend operation, and the cluster
    information is removed from memory.
    
    After dropping the backend, the previous transaction watermark process
    completes its tasks and attempts to modify the backend status, which
    requires accessing the cluster information. However, since the cluster
    information has already been deleted, this results in a null pointer
    exception (NPE) during the lookup in the FE memory map, causing the FE
    to crash.
    
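    Additionally, the edit log has already persisted the operations in the
    following order, so replaying it on restart hits the same NPE:
    
    1. The edit log records drop backend.
    2. The edit log records modify backend.
    3. The FE fails to start up.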
    
    
    ```
    2025-01-10 05:46:15,070 ERROR (replayer|15) [EditLog.loadJournal():1251] replay Operation Type 91, log id: 10578
    java.lang.NullPointerException: Cannot invoke "org.apache.doris.system.Backend.getCloudClusterName()" because "memBe" is null
            at org.apache.doris.cloud.system.CloudSystemInfoService.replayModifyBackend(CloudSystemInfoService.java:461) ~[doris-fe.jar:1.2-SNAPSHOT]
            at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:432) ~[doris-fe.jar:1.2-SNAPSHOT]
            at org.apache.doris.catalog.Env.replayJournal(Env.java:2999) ~[doris-fe.jar:1.2-SNAPSHOT]
            at org.apache.doris.catalog.Env$4.runOneCycle(Env.java:2761) ~[doris-fe.jar:1.2-SNAPSHOT]
            at org.apache.doris.common.util.Daemon.run(Daemon.java:119) ~[doris-fe.jar:1.2-SNAPSHOT]
    ```
---
 .../java/org/apache/doris/cloud/system/CloudSystemInfoService.java     | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java b/fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java
index 36ca260dc17..71260c51f23 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java
@@ -457,6 +457,9 @@ public class CloudSystemInfoService extends SystemInfoService {
     @Override
     public void replayModifyBackend(Backend backend) {
         Backend memBe = getBackend(backend.getId());
+        if (memBe == null) {
+            return;
+        }
         // for rename cluster
         String originalClusterName = memBe.getCloudClusterName();
         String originalClusterId = memBe.getCloudClusterId();
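
The replay order described in the commit message (drop backend, then modify backend) can be reproduced in isolation. The following is a minimal sketch, not the Doris implementation: `ReplayGuardSketch`, `Be`, and the `replayAdd`/`replayDrop`/`replayModify` helpers are hypothetical stand-ins for the in-memory backend map and the replay handlers. It only illustrates why a null check lets a modify entry for an already dropped backend be skipped instead of throwing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and structure are assumptions, not Doris code.
class ReplayGuardSketch {
    /** Hypothetical stand-in for a backend record. */
    static final class Be {
        final long id;
        final String clusterName;
        boolean decommissioned;

        Be(long id, String clusterName) {
            this.id = id;
            this.clusterName = clusterName;
        }
    }

    /** Hypothetical in-memory backend map, keyed by backend id. */
    private final Map<Long, Be> backends = new HashMap<>();

    void replayAdd(Be be) {
        backends.put(be.id, be);
    }

    void replayDrop(long id) {
        backends.remove(id);
    }

    /**
     * Applies a modify entry. Returns true if it was applied and false if it
     * was skipped because the backend no longer exists in memory.
     */
    boolean replayModify(long id, boolean decommissioned) {
        Be memBe = backends.get(id);
        if (memBe == null) {
            // Same idea as the guard in the diff: the backend was dropped by
            // an earlier journal entry, so there is nothing left to update.
            return false;
        }
        memBe.decommissioned = decommissioned;
        return true;
    }

    public static void main(String[] args) {
        ReplayGuardSketch svc = new ReplayGuardSketch();
        svc.replayAdd(new Be(10001L, "cluster0"));
        svc.replayDrop(10001L);                            // journal entry 1: drop backend
        boolean applied = svc.replayModify(10001L, true);  // journal entry 2: modify backend
        System.out.println("modify applied: " + applied);  // prints "modify applied: false"
    }
}
```

Without the null check, the lookup result would be dereferenced directly and the second entry would raise a `NullPointerException`, which matches the stack trace in the commit message.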


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
