huyangg opened a new issue #8446:
URL: https://github.com/apache/incubator-doris/issues/8446


   ### Problem description:
   
   ```
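   Copying the entire directory of an existing Doris environment and starting it in a new environment causes the BE service of the original environment to go down.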
   ```
   
   ### Steps to reproduce:
   ```
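   Copy the entire directory of an existing Doris environment, start it in a new environment, and observe the BE status in the original environment. Precondition: the new and original environments can reach each other over the network.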
   
   ```
   ### Doris version:
   ```
   Palo version 0.14.13.1-Unknown
   ```
   ### Doris cluster information:
   ```
   Occurs in both single-node and multi-node environments. In the new environment, the following setting is added to fe.conf: metadata_failure_recovery=true.
   
   ```
   ### Error information:
   ```
   dmesg -T shows no OOM messages.
   
   In the original environment, the BE's Alive status is false, and its ErrMsg is: epoch is not greater than local. ignore heartbeat.
   MySQL [(none)]> SHOW PROC '/backends';
   
   +-----------+-----------------+---------------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+----------------------------------------------------+-------------------+-------------------------------------------------------------------------------------------+
   | BackendId | Cluster         | IP            | HostName      | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | ErrMsg                                             | Version           | Status                                                                                    |
   +-----------+-----------------+---------------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+----------------------------------------------------+-------------------+-------------------------------------------------------------------------------------------+
   | 10003     | default_cluster | 172.16.1.201  | 172.16.1.201  | 9050          | 9060   | 8040     | 8060     | 2021-10-20 21:34:31 | 2022-03-10 15:27:53 | false | false                | false                 | 837       | 1.442 GB         | 138.925 GB    | 191.024 GB    | 27.27 % | 27.27 %        | epoch is not greater than local. ignore heartbeat. | 0.14.13.1-Unknown | {"lastSuccessReportTabletsTime":"2022-03-10 15:27:22","lastStreamLoadTime":1645584570768} |
   +-----------+-----------------+---------------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+----------------------------------------------------+-------------------+-------------------------------------------------------------------------------------------+
   
   be.info.log output:
   I0310 15:27:53.379778  3011 plan_fragment_executor.cpp:583] Close() fragment_instance_id=65ac48000aed4ecc-9b947eb57686de25
   I0310 15:27:54.298195 10114 heartbeat_server.cpp:58] get heartbeat from FE.host:172.16.2.113, port:9020, cluster id:138756675, counter:2424613
   I0310 15:27:54.298223 10114 heartbeat_server.cpp:120] master change. new master host: 172.16.2.113. port: 9020. epoch: 8
   I0310 15:27:54.298228 10114 heartbeat_server.cpp:166] Master FE is changed or restarted. report tablet and disk info immediately
   I0310 15:27:54.298241 10114 task_worker_pool.cpp:258] notify task worker pool: TaskWorkerPool.REPORT_DISK_STATE
   I0310 15:27:54.298250 10114 task_worker_pool.cpp:258] notify task worker pool: TaskWorkerPool.REPORT_OLAP_TABLE
   I0310 15:27:54.298363  3128 data_dir.cpp:837] path: /root/DORIS-0.14.7-release/be/storage total capacity: 1064086802432, available capacity: 953151504384
   I0310 15:27:54.299175  3129 tablet_manager.cpp:880] begin to build all report tablets info
   I0310 15:27:54.299291  3129 tablet_manager.cpp:885] find expired transactions for 0 tablets
   I0310 15:27:54.299764  3128 storage_engine.cpp:373] get root path info cost: 1 ms. tablet counter: 2087
   I0310 15:27:54.300318 10115 backend_service.cpp:325] get_batch stream_load_record rocksdb successfully. records size: 0, last_stream_load_timestamp: 1645584507086
   I0310 15:27:54.305917  3129 tablet_manager.cpp:922] success to build all report tablets info. tablet_count=2087
   I0310 15:27:54.353857  3128 task_worker_pool.cpp:1587] finish report DISK. master host: 172.16.2.113, port: 9020
   I0310 15:27:54.361510  3129 task_worker_pool.cpp:1587] finish report TABLET. master host: 172.16.2.113, port: 9020
   I0310 15:27:57.650785  3127 task_worker_pool.cpp:1587] finish report TASK. master host: 172.16.2.113, port: 9020
   I0310 15:27:57.753644  3063 storage_engine.cpp:625] start trash and snapshot sweep.
   I0310 15:27:57.755581  3063 storage_engine.cpp:373] get root path info cost: 1 ms. tablet counter: 2087
   I0310 15:27:57.755627  3063 storage_engine.cpp:647] Start to sweep path /root/DORIS-0.14.7-release/be/storage
   W0310 15:27:58.920964  3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
   W0310 15:28:03.929098  3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
   I0310 15:28:07.652289  3127 task_worker_pool.cpp:1587] finish report TASK. master host: 172.16.2.113, port: 9020
   W0310 15:28:08.936048  3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
   W0310 15:28:13.945868  3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
   I0310 15:28:17.653152  3127 task_worker_pool.cpp:1587] finish report TASK. master host: 172.16.2.113, port: 9020
   I0310 15:28:18.727761  3061 load_channel_mgr.cpp:241] cleaning timed out load channels
   I0310 15:28:18.727794  3061 load_channel_mgr.cpp:274] load mem consumption(bytes). limit: 86418309775, current: 0, peak: 1241120388
   W0310 15:28:18.952822  3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
   W0310 15:28:23.958277  3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
   
   
   ```
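   
   The warnings above (local epoch: 8, received epoch: 7) are consistent with a BE that follows only the FE master with the highest epoch it has seen. One possible reading, not confirmed in this report, is that the FE started in the new environment with metadata_failure_recovery=true reached the original BE with epoch 8, after which heartbeats carrying epoch 7 were ignored. The C++ sketch below models that rule under those assumptions; `MasterInfo`, `HeartbeatResult` and `on_heartbeat` are hypothetical names, and this is a simplification, not the code in Doris's heartbeat_server.cpp.
   
   ```cpp
   #include <cstdint>
   #include <string>
   
   // Hypothetical, simplified model of the master FE a BE remembers.
   // These names are illustrative assumptions, not Doris's actual types.
   struct MasterInfo {
       std::string host;
       int port;
       std::int64_t epoch;
   };
   
   enum class HeartbeatResult {
       kAcceptedNewEpoch,    // higher epoch: adopt the sender as master
       kAcceptedSameMaster,  // same master at the same epoch
       kIgnoredStaleEpoch    // "epoch is not greater than local. ignore heartbeat."
   };
   
   // Simplified epoch rule (an assumption, not the real Doris logic):
   // adopt any heartbeat with a strictly greater epoch, accept the stored
   // master at the stored epoch, and ignore everything else.
   HeartbeatResult on_heartbeat(MasterInfo& local, const MasterInfo& received) {
       if (received.epoch > local.epoch) {
           local = received;  // remember the new master and its epoch
           return HeartbeatResult::kAcceptedNewEpoch;
       }
       if (received.epoch == local.epoch && received.host == local.host &&
           received.port == local.port) {
           return HeartbeatResult::kAcceptedSameMaster;
       }
       // e.g. "local epoch: 8 received epoch: 7" in the log above
       return HeartbeatResult::kIgnoredStaleEpoch;
   }
   ```
   
   Under this model, once a single heartbeat with epoch 8 has been accepted, every later heartbeat with epoch 7 is rejected, so the FE sending epoch 7 would keep seeing the BE as not alive.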
   
   ### Solution (provided in replies by community developers or other users)
   ```
   
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
