SWJTU-ZhangLei opened a new issue, #10410: URL: https://github.com/apache/doris/issues/10410
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Version

    root@regtest-15-bj:/home/zhanglei/test/output_2/be# ./lib/palo_be --version
    trunk RELEASE (build git://regtest-15-bj/home/zhanglei/incubator-doris/be/../@3370c105286ac9f2d590d0bf43f811a5cb52171e)
    Built on Fri, 24 Jun 2022 14:04:29 CST by root@regtest-15-bj

### What's Wrong?

When adding a new FE, the following exception is reported:

    2022-06-24 20:00:09,620 INFO (main|1) [Catalog.loadBackupHandler():1781] finished replay backupHandler from image
    2022-06-24 20:00:09,622 INFO (main|1) [Catalog.loadPaloAuth():1794] finished replay paloAuth from image
    2022-06-24 20:00:09,622 INFO (main|1) [Catalog.loadTransactionState():1802] finished replay transactionState from image
    2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadColocateTableIndex():1830] finished replay colocateTableIndex from image
    2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadRoutineLoadJobs():1836] finished replay routineLoadJobs from image
    2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadLoadJobsV2():1842] finished replay loadJobsV2 from image
    2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadSmallFiles():1854] finished replay smallFiles from image
    2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadPlugins():4746] finished replay plugins from image
    2022-06-24 20:00:09,666 INFO (main|1) [Catalog.loadDeleteHandler():1787] finished replay deleteHandler from image
    2022-06-24 20:00:09,667 INFO (main|1) [Catalog.loadSqlBlockRule():1862] finished replay sqlBlockRule from image
    2022-06-24 20:00:09,671 INFO (main|1) [Catalog.loadPolicy():1873] finished replay policy from image
    2022-06-24 20:00:09,671 INFO (main|1) [MetaReader.read():104] finished to load image in 257 ms
    2022-06-24 20:00:09,993 INFO (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():160] add helper[172.21.16.15:29010] as ReplicationGroupAdmin
    2022-06-24 20:00:09,993 INFO (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():166] add self[172.21.16.12:29010] as ReplicationGroupAdmin
    2022-06-24 20:00:09,995 WARN (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [Catalog.notifyNewFETypeTransfer():2267] notify new FE type transfer: UNKNOWN
    2022-06-24 20:00:10,014 WARN (RepNode 172.21.16.12_29010_1656071922199(-1)|67) [BDBStateChangeListener.stateChange():57] this node is DETACHED
    2022-06-24 20:00:20,001 ERROR (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():199] error to open replicated environment. will exit.
    com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb Feeder: 172.21.16.15_29010_1656058773192(4). com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New or moved node:172.21.16.12_29010_1656071922199, is configured with the socket address: /172.21.16.12:29010. It conflicts with the socket already used by the member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1)
        at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:151) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.Environment.<init>(Environment.java:258) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538) ~[je-18.3.12.jar:18.3.12]
        at org.apache.doris.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:152) ~[palo-fe.jar:1.0-SNAPSHOT]
        at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:302) ~[palo-fe.jar:1.0-SNAPSHOT]
        at org.apache.doris.persist.EditLog.open(EditLog.java:889) ~[palo-fe.jar:1.0-SNAPSHOT]
        at org.apache.doris.catalog.Catalog.initialize(Catalog.java:812) ~[palo-fe.jar:1.0-SNAPSHOT]
        at org.apache.doris.PaloFe.start(PaloFe.java:128) ~[palo-fe.jar:1.0-SNAPSHOT]
        at org.apache.doris.PaloFe.main(PaloFe.java:63) ~[palo-fe.jar:1.0-SNAPSHOT]
    Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb Feeder: 172.21.16.15_29010_1656058773192(4). com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New or moved node:172.21.16.12_29010_1656071922199, is configured with the socket address: /172.21.16.12:29010. It conflicts with the socket already used by the member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1)
        at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.12.jar:18.3.12]
        at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.12.jar:18.3.12]

### What You Expected?

The FE is added successfully.

### How to Reproduce?

1. Build a cluster with 3 FEs (fe1, fe2, fe3); fe1 is the master.
2. Stop all FEs.
3. Set metadata_failure_recovery=true for fe1 (master) and start fe1.
4. Remove the metadata_failure_recovery config and restart fe1.
5. Connect to fe1 with a MySQL client and drop fe2 and fe3.
6. Add fe2, clear fe2's meta, then start fe2 with --helper fe1.
7. Start fe3 with --helper fe1. fe3's log prints the following:

        2022-06-24 19:56:33,653 INFO (UNKNOWN 172.21.16.12_29010_1656058910620(-1)|1) [BDBEnvironment.setup():160] add helper[172.21.16.15:29010] as ReplicationGroupAdmin
        2022-06-24 19:56:33,654 INFO (UNKNOWN 172.21.16.12_29010_1656058910620(-1)|1) [BDBEnvironment.setup():166] add self[172.21.16.12:29010] as ReplicationGroupAdmin
        2022-06-24 19:56:33,657 WARN (UNKNOWN 172.21.16.12_29010_1656058910620(-1)|1) [Catalog.notifyNewFETypeTransfer():2267] notify new FE type transfer: UNKNOWN
        2022-06-24 19:56:33,675 WARN (RepNode 172.21.16.12_29010_1656058910620(-1)|64) [BDBStateChangeListener.stateChange():57] this node is DETACHED
        2022-06-24 19:56:43,671 ERROR (UNKNOWN 172.21.16.12_29010_1656058910620(-1)|1) [BDBEnvironment.setup():199] error to open replicated environment. will exit.
        com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656058910620(3):/home/zhanglei/test/output_2/fe/doris-meta/bdb Feeder: 172.21.16.15_29010_1656058773192(4). The environments have the same name: PALO_JOURNAL_GROUP but represent different environment instances. The environment at the master has UUID 4e0bedad-1111-4c65-92a8-e6be60308d7b, while the replica 172.21.16.12_29010_1656058910620 has UUID: 25c525de-eadf-4ace-892d-523be019caa4 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656058910620(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656058910620(-1)
            at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:151) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.Environment.<init>(Environment.java:258) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538) ~[je-18.3.12.jar:18.3.12]
            at org.apache.doris.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:152) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:302) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.persist.EditLog.open(EditLog.java:889) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.catalog.Catalog.initialize(Catalog.java:812) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.PaloFe.start(PaloFe.java:128) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.PaloFe.main(PaloFe.java:63) ~[palo-fe.jar:1.0-SNAPSHOT]
        Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656058910620(3):/home/zhanglei/test/output_2/fe/doris-meta/bdb Feeder: 172.21.16.15_29010_1656058773192(4). The environments have the same name: PALO_JOURNAL_GROUP but represent different environment instances. The environment at the master has UUID 4e0bedad-1111-4c65-92a8-e6be60308d7b, while the replica 172.21.16.12_29010_1656058910620 has UUID: 25c525de-eadf-4ace-892d-523be019caa4 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656058910620(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656058910620(-1)
            at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.12.jar:18.3.12]

8. Add fe3, clear fe3's meta, then start fe3 with --helper fe1. fe3 cannot start and fails with the following error:

        2022-06-24 20:00:10,014 WARN (RepNode 172.21.16.12_29010_1656071922199(-1)|67) [BDBStateChangeListener.stateChange():57] this node is DETACHED
        2022-06-24 20:00:20,001 ERROR (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():199] error to open replicated environment. will exit.
        com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb Feeder: 172.21.16.15_29010_1656058773192(4). com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New or moved node:172.21.16.12_29010_1656071922199, is configured with the socket address: /172.21.16.12:29010. It conflicts with the socket already used by the member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1)
            at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:151) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.Environment.<init>(Environment.java:258) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538) ~[je-18.3.12.jar:18.3.12]
            at org.apache.doris.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:152) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:302) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.persist.EditLog.open(EditLog.java:889) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.catalog.Catalog.initialize(Catalog.java:812) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.PaloFe.start(PaloFe.java:128) ~[palo-fe.jar:1.0-SNAPSHOT]
            at org.apache.doris.PaloFe.main(PaloFe.java:63) ~[palo-fe.jar:1.0-SNAPSHOT]
        Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb Feeder: 172.21.16.15_29010_1656058773192(4). com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New or moved node:172.21.16.12_29010_1656071922199, is configured with the socket address: /172.21.16.12:29010. It conflicts with the socket already used by the member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1)
            at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.12.jar:18.3.12]
            at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.12.jar:18.3.12]

### Anything Else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
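As a reading aid for the step 8 error, here is a minimal Python sketch (not part of Doris or BDB JE) that pulls the two node names out of the NodeConflictException text quoted in the log. The helper names `parse_node_conflict` and `split_node_name` are hypothetical, and treating the trailing number of a node name as an epoch-millisecond timestamp is an assumption based only on its magnitude.

```python
import re
from datetime import datetime, timezone

# Relevant part of the NodeConflictException message, copied from the fe3 log.
NODE_CONFLICT = (
    "New or moved node:172.21.16.12_29010_1656071922199, is configured with "
    "the socket address: /172.21.16.12:29010. It conflicts with the socket "
    "already used by the member: 172.21.16.12_29010_1656058910620"
)

CONFLICT_RE = re.compile(
    r"New or moved node:(?P<new>[^,\s]+), is configured with the socket "
    r"address: /(?P<addr>[\d.]+:\d+)\. It conflicts with the socket already "
    r"used by the member: (?P<old>\S+)"
)


def parse_node_conflict(message):
    """Return the new node name, socket address and existing member name, or None."""
    match = CONFLICT_RE.search(message)
    return match.groupdict() if match else None


def split_node_name(name):
    """Split 'host_port_suffix'; the suffix is ASSUMED to be epoch milliseconds."""
    host, port, suffix = name.rsplit("_", 2)
    created = datetime.fromtimestamp(int(suffix) / 1000, tz=timezone.utc)
    return host, int(port), created


info = parse_node_conflict(NODE_CONFLICT)
for role in ("new", "old"):
    host, port, created = split_node_name(info[role])
    print(role, host, port, created.isoformat())
```

Both names resolve to the same host and port and differ only in the suffix, which matches what the exception reports: the freshly started fe3 presents a new node name on a socket address that the replication group still associates with the earlier name.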