huyangg opened a new issue #8446:
URL: https://github.com/apache/incubator-doris/issues/8446
### Problem description:
```
Copying the entire directory of an existing Doris environment and starting it in a new environment causes the BE service of the original environment to go down.
```
### Steps to reproduce:
```
Copy the entire directory of an existing Doris environment, start it in the new environment, and observe the BE status in the original environment. Precondition: the new environment and the original environment can reach each other over the network.
```
### Doris version:
```
Palo version 0.14.13.1-Unknown
```
### Doris cluster information:
```
Single-node and multi-node environments. In the new environment, metadata_failure_recovery=true is added to fe.conf.
```
### Error information:
```
dmesg -T shows no OOM messages.
In the original environment, the BE's Alive status is false and its ErrMsg is "epoch is not greater than local. ignore heartbeat."

MySQL [(none)]> SHOW PROC '/backends';
+-----------+-----------------+---------------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+----------------------------------------------------+-------------------+-------------------------------------------------------------------------------------------+
| BackendId | Cluster         | IP            | HostName      | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | ErrMsg                                             | Version           | Status                                                                                    |
+-----------+-----------------+---------------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+----------------------------------------------------+-------------------+-------------------------------------------------------------------------------------------+
| 10003     | default_cluster | 172.16.1.201  | 172.16.1.201  | 9050          | 9060   | 8040     | 8060     | 2021-10-20 21:34:31 | 2022-03-10 15:27:53 | false | false                | false                 | 837       | 1.442 GB         | 138.925 GB    | 191.024 GB    | 27.27 % | 27.27 %        | epoch is not greater than local. ignore heartbeat. | 0.14.13.1-Unknown | {"lastSuccessReportTabletsTime":"2022-03-10 15:27:22","lastStreamLoadTime":1645584570768} |
+-----------+-----------------+---------------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+----------------------------------------------------+-------------------+-------------------------------------------------------------------------------------------+

be.info.log:
I0310 15:27:53.379778 3011 plan_fragment_executor.cpp:583] Close() fragment_instance_id=65ac48000aed4ecc-9b947eb57686de25
I0310 15:27:54.298195 10114 heartbeat_server.cpp:58] get heartbeat from FE.host:172.16.2.113, port:9020, cluster id:138756675, counter:2424613
I0310 15:27:54.298223 10114 heartbeat_server.cpp:120] master change. new master host: 172.16.2.113. port: 9020. epoch: 8
I0310 15:27:54.298228 10114 heartbeat_server.cpp:166] Master FE is changed or restarted. report tablet and disk info immediately
I0310 15:27:54.298241 10114 task_worker_pool.cpp:258] notify task worker pool: TaskWorkerPool.REPORT_DISK_STATE
I0310 15:27:54.298250 10114 task_worker_pool.cpp:258] notify task worker pool: TaskWorkerPool.REPORT_OLAP_TABLE
I0310 15:27:54.298363 3128 data_dir.cpp:837] path: /root/DORIS-0.14.7-release/be/storage total capacity: 1064086802432, available capacity: 953151504384
I0310 15:27:54.299175 3129 tablet_manager.cpp:880] begin to build all report tablets info
I0310 15:27:54.299291 3129 tablet_manager.cpp:885] find expired transactions for 0 tablets
I0310 15:27:54.299764 3128 storage_engine.cpp:373] get root path info cost: 1 ms. tablet counter: 2087
I0310 15:27:54.300318 10115 backend_service.cpp:325] get_batch stream_load_record rocksdb successfully. records size: 0, last_stream_load_timestamp: 1645584507086
I0310 15:27:54.305917 3129 tablet_manager.cpp:922] success to build all report tablets info. tablet_count=2087
I0310 15:27:54.353857 3128 task_worker_pool.cpp:1587] finish report DISK. master host: 172.16.2.113, port: 9020
I0310 15:27:54.361510 3129 task_worker_pool.cpp:1587] finish report TABLET. master host: 172.16.2.113, port: 9020
I0310 15:27:57.650785 3127 task_worker_pool.cpp:1587] finish report TASK. master host: 172.16.2.113, port: 9020
I0310 15:27:57.753644 3063 storage_engine.cpp:625] start trash and snapshot sweep.
I0310 15:27:57.755581 3063 storage_engine.cpp:373] get root path info cost: 1 ms. tablet counter: 2087
I0310 15:27:57.755627 3063 storage_engine.cpp:647] Start to sweep path /root/DORIS-0.14.7-release/be/storage
W0310 15:27:58.920964 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
W0310 15:28:03.929098 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
I0310 15:28:07.652289 3127 task_worker_pool.cpp:1587] finish report TASK. master host: 172.16.2.113, port: 9020
W0310 15:28:08.936048 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
W0310 15:28:13.945868 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
I0310 15:28:17.653152 3127 task_worker_pool.cpp:1587] finish report TASK. master host: 172.16.2.113, port: 9020
I0310 15:28:18.727761 3061 load_channel_mgr.cpp:241] cleaning timed out load channels
I0310 15:28:18.727794 3061 load_channel_mgr.cpp:274] load mem consumption(bytes). limit: 86418309775, current: 0, peak: 1241120388
W0310 15:28:18.952822 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
W0310 15:28:23.958277 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
```
### Solution (given in replies by community engineers or other users):
```
```
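In the BE log above, a heartbeat carrying epoch 8 is accepted as a master change. Every later heartbeat carrying epoch 7 is then ignored. That sequence fits a BE that follows only the newest master epoch it has seen. The epoch-8 sender is presumably the FE started in the copied environment with `metadata_failure_recovery=true`.

The sketch below is a minimal Python model of that rule. It is not Doris's `heartbeat_server.cpp` and it is not a fix. The class `HeartbeatReceiver` and its method `on_heartbeat` are hypothetical names, and the same-epoch/same-master acceptance path is an assumption made so that steady-state heartbeats keep working.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HeartbeatReceiver:
    """Hypothetical, simplified model of a BE that tracks the master FE epoch."""

    epoch: int = 0
    master: Optional[Tuple[str, int]] = None

    def on_heartbeat(self, host: str, port: int, epoch: int) -> bool:
        """Return True if the heartbeat is accepted, False if it is ignored."""
        if epoch > self.epoch:
            # Strictly newer epoch: treat the sender as a new or restarted master.
            self.epoch = epoch
            self.master = (host, port)
            return True
        if epoch == self.epoch and (host, port) == self.master:
            # Assumed steady-state path: regular heartbeat from the current master.
            return True
        # Older epoch (or same epoch from a different sender): ignore it, mirroring
        # the log message "epoch is not greater than local. ignore heartbeat."
        return False


if __name__ == "__main__":
    be = HeartbeatReceiver()
    # Original FE (epoch 7), then the recovered copy (epoch 8), then the original again.
    for received in (7, 8, 7):
        accepted = be.on_heartbeat("172.16.2.113", 9020, received)
        print(f"received epoch {received}: accepted={accepted}, local epoch={be.epoch}")
```

Under this model, the original FE's epoch-7 heartbeats are refused once the BE has moved to epoch 8. This matches the original FE reporting the BE with `Alive = false` and that ErrMsg.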