SWJTU-ZhangLei opened a new issue, #10410:
URL: https://github.com/apache/doris/issues/10410

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Version
   
   root@regtest-15-bj:/home/zhanglei/test/output_2/be# ./lib/palo_be --version
   trunk RELEASE (build 
git://regtest-15-bj/home/zhanglei/incubator-doris/be/../@3370c105286ac9f2d590d0bf43f811a5cb52171e)
   Built on Fri, 24 Jun 2022 14:04:29 CST by root@regtest-15-bj
   
   ### What's Wrong?
   
   When adding a new FE, it fails to start and reports the following exception:
     422 2022-06-24 20:00:09,620 INFO (main|1) 
[Catalog.loadBackupHandler():1781] finished replay backupHandler from image
       423 2022-06-24 20:00:09,622 INFO (main|1) [Catalog.loadPaloAuth():1794] 
finished replay paloAuth from image
       424 2022-06-24 20:00:09,622 INFO (main|1) 
[Catalog.loadTransactionState():1802] finished replay transactionState from 
image
       425 2022-06-24 20:00:09,623 INFO (main|1) 
[Catalog.loadColocateTableIndex():1830] finished replay colocateTableIndex from 
image
       426 2022-06-24 20:00:09,623 INFO (main|1) 
[Catalog.loadRoutineLoadJobs():1836] finished replay routineLoadJobs from image
       427 2022-06-24 20:00:09,623 INFO (main|1) 
[Catalog.loadLoadJobsV2():1842] finished replay loadJobsV2 from image
       428 2022-06-24 20:00:09,623 INFO (main|1) 
[Catalog.loadSmallFiles():1854] finished replay smallFiles from image
       429 2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadPlugins():4746] 
finished replay plugins from image
       430 2022-06-24 20:00:09,666 INFO (main|1) 
[Catalog.loadDeleteHandler():1787] finished replay deleteHandler from image
       431 2022-06-24 20:00:09,667 INFO (main|1) 
[Catalog.loadSqlBlockRule():1862] finished replay sqlBlockRule from image
       432 2022-06-24 20:00:09,671 INFO (main|1) [Catalog.loadPolicy():1873] 
finished replay policy from image
       433 2022-06-24 20:00:09,671 INFO (main|1) [MetaReader.read():104] 
finished to load image in 257 ms
       434 2022-06-24 20:00:09,993 INFO (UNKNOWN 
172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():160] add 
helper[172.21.16.15:29010] as ReplicationGroupAdmin
       435 2022-06-24 20:00:09,993 INFO (UNKNOWN 
172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():166] add 
self[172.21.16.12:29010] as ReplicationGroupAdmin
       436 2022-06-24 20:00:09,995 WARN (UNKNOWN 
172.21.16.12_29010_1656071922199(-1)|1) 
[Catalog.notifyNewFETypeTransfer():2267] notify new FE type transfer: UNKNOWN
       437 2022-06-24 20:00:10,014 WARN (RepNode 
172.21.16.12_29010_1656071922199(-1)|67) 
[BDBStateChangeListener.stateChange():57] this node is DETACHED
       438 2022-06-24 20:00:20,001 ERROR (UNKNOWN 
172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():199] error to 
open replicated environment. will exit.
       439 com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) 
Environment must be closed, caused by: 
com.sleepycat.je.EnvironmentFailureException: Environment invalid because of 
previous exception: (JE 18.3.12) 
172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb
Feeder: 172.21.16.15_29010_1656058773192(4). 
com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New 
or moved node:172.21.16.12_29010_1656071922199, is configured with the socket 
address: /172.21.16.12:29010.  It conflicts with the socket already used by the 
member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during 
the handshake between two nodes. Some validity or compatibility check failed, 
preventing further communication between the nodes. Environment is invalid and 
must be closed. Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656071922199(-1)
       440         at 
com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230)
 ~[je-18.3.12.jar:18.3.12]
       441         at 
com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) 
~[je-18.3.12.jar:18.3.12]
       442         at 
com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:151) 
~[je-18.3.12.jar:18.3.12]
       443         at 
com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) 
~[je-18.3.12.jar:18.3.12]
       444         at com.sleepycat.je.Environment.<init>(Environment.java:258) 
~[je-18.3.12.jar:18.3.12]
       445         at 
com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605)
 ~[je-18.3.12.jar:18.3.12]
       446         at 
com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464)
 ~[je-18.3.12.jar:18.3.12]
       447         at 
com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538)
 ~[je-18.3.12.jar:18.3.12]
       448         at 
org.apache.doris.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:152) 
~[palo-fe.jar:1.0-SNAPSHOT]
       449         at 
org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:302) 
~[palo-fe.jar:1.0-SNAPSHOT]
       450         at org.apache.doris.persist.EditLog.open(EditLog.java:889) 
~[palo-fe.jar:1.0-SNAPSHOT]
       451         at 
org.apache.doris.catalog.Catalog.initialize(Catalog.java:812) 
~[palo-fe.jar:1.0-SNAPSHOT]
       452         at org.apache.doris.PaloFe.start(PaloFe.java:128) 
~[palo-fe.jar:1.0-SNAPSHOT]
       453         at org.apache.doris.PaloFe.main(PaloFe.java:63) 
~[palo-fe.jar:1.0-SNAPSHOT]
       454 Caused by: com.sleepycat.je.EnvironmentFailureException: Environment 
invalid because of previous exception: (JE 18.3.12) 
172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb
Feeder: 172.21.16.15_29010_1656058773192(4). 
com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New 
or moved node:172.21.16.12_29010_1656071922199, is configured with the socket 
address: /172.21.16.12:29010.  It conflicts with the socket already used by the 
member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during 
the handshake between two nodes. Some validity or compatibility check failed, 
preventing further communication between the nodes. Environment is invalid and 
must be closed. Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656071922199(-1)
       455         at 
com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342)
 ~[je-18.3.12.jar:18.3.12]
       456         at 
com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267)
 ~[je-18.3.12.jar:18.3.12]
       457         at 
com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) 
~[je-18.3.12.jar:18.3.12]
       458         at 
com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) 
~[je-18.3.12.jar:18.3.12]
       459         at 
com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) 
~[je-18.3.12.jar:18.3.12]
       460         at 
com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) 
~[je-18.3.12.jar:18.3.12]
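
The `NodeConflictException` above names two different nodes on the same socket address. As a reading aid, here is a minimal Python sketch that pulls those fields out of the message. The helper names (`parse_conflict`, `name_timestamp`) are hypothetical and are not part of Doris or BDB JE. Treating the last field of a node name as an epoch timestamp in milliseconds is an assumption based on how the names in this log look.

```python
import re
from datetime import datetime, timezone

# Message text copied from the NodeConflictException in the log above.
MESSAGE = (
    "New or moved node:172.21.16.12_29010_1656071922199, is configured with "
    "the socket address: /172.21.16.12:29010.  It conflicts with the socket "
    "already used by the member: 172.21.16.12_29010_1656058910620"
)

CONFLICT_RE = re.compile(
    r"New or moved node:(?P<new>\S+), is configured with the socket address: "
    r"/(?P<addr>\S+?)\.\s+It conflicts with the socket already used by the "
    r"member: (?P<old>\S+)"
)


def parse_conflict(message):
    """Return the new node, socket address and existing member, or None."""
    match = CONFLICT_RE.search(message)
    return match.groupdict() if match else None


def name_timestamp(node_name):
    """Decode the trailing field of a node name.

    Assumption: the name has the form <ip>_<port>_<epoch millis>.
    Sub-second precision is dropped.
    """
    millis = int(node_name.rsplit("_", 1)[1])
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc)


if __name__ == "__main__":
    info = parse_conflict(MESSAGE)
    print(info)
    print("existing member created:", name_timestamp(info["old"]))
    print("new node created:       ", name_timestamp(info["new"]))
```

Under that assumption, the existing member's name decodes to a time several hours before the failing start, while the new node's name decodes to about a minute before it. That would be consistent with an older entry for 172.21.16.12:29010 still being present in the replication group.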
   
   ### What You Expected?
   
   The FE is added successfully.
   
   ### How to Reproduce?
   
   1. Build a cluster with 3 FEs (fe1, fe2, fe3); fe1 is the master.
   2. Stop all FEs.
   3. Set metadata_failure_recovery=true for fe1 (master) and start fe1.
   4. Remove the metadata_failure_recovery config and restart fe1.
   5. Connect to fe1 with a MySQL client and drop fe2 and fe3.
   6. Add fe2, clear fe2's meta, then start fe2 with --helper fe1.
   7. Start fe3 with --helper fe1; fe3's log prints the following:
     288 2022-06-24 19:56:33,653 INFO (UNKNOWN 
172.21.16.12_29010_1656058910620(-1)|1) [BDBEnvironment.setup():160] add 
helper[172.21.16.15:29010] as ReplicationGroupAdmin
       289 2022-06-24 19:56:33,654 INFO (UNKNOWN 
172.21.16.12_29010_1656058910620(-1)|1) [BDBEnvironment.setup():166] add 
self[172.21.16.12:29010] as ReplicationGroupAdmin
       290 2022-06-24 19:56:33,657 WARN (UNKNOWN 
172.21.16.12_29010_1656058910620(-1)|1) 
[Catalog.notifyNewFETypeTransfer():2267] notify new FE type transfer: UNKNOWN
       291 2022-06-24 19:56:33,675 WARN (RepNode 
172.21.16.12_29010_1656058910620(-1)|64) 
[BDBStateChangeListener.stateChange():57] this node is DETACHED
       292 2022-06-24 19:56:43,671 ERROR (UNKNOWN 
172.21.16.12_29010_1656058910620(-1)|1) [BDBEnvironment.setup():199] error to 
open replicated environment. will exit.
       293 com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) 
Environment must be closed, caused by: 
com.sleepycat.je.EnvironmentFailureException: Environment invalid because of 
previous exception: (JE 18.3.12) 
172.21.16.12_29010_1656058910620(3):/home/zhanglei/test/output_2/fe/doris-meta/bdb
Feeder: 172.21.16.15_29010_1656058773192(4). The environments have 
the same name: PALO_JOURNAL_GROUP but represent different environment 
instances. The environment at the master has UUID 
4e0bedad-1111-4c65-92a8-e6be60308d7b, while the replica 
172.21.16.12_29010_1656058910620 has UUID: 
25c525de-eadf-4ace-892d-523be019caa4 HANDSHAKE_ERROR: Error during the 
handshake between two nodes. Some validity or compatibility check failed, 
preventing further communication between the nodes. Environment is invalid and 
must be closed. Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656058910620(-1) Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656058910620(-1)
       294         at 
com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230)
 ~[je-18.3.12.jar:18.3.12]
       295         at 
com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) 
~[je-18.3.12.jar:18.3.12]
       296         at 
com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:151) 
~[je-18.3.12.jar:18.3.12]
       297         at 
com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) 
~[je-18.3.12.jar:18.3.12]
       298         at com.sleepycat.je.Environment.<init>(Environment.java:258) 
~[je-18.3.12.jar:18.3.12]
       299         at 
com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605)
 ~[je-18.3.12.jar:18.3.12]
       300         at 
com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464)
 ~[je-18.3.12.jar:18.3.12]
       301         at 
com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538)
 ~[je-18.3.12.jar:18.3.12]
       302         at 
org.apache.doris.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:152) 
~[palo-fe.jar:1.0-SNAPSHOT]
       303         at 
org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:302) 
~[palo-fe.jar:1.0-SNAPSHOT]
       304         at org.apache.doris.persist.EditLog.open(EditLog.java:889) 
~[palo-fe.jar:1.0-SNAPSHOT]
       305         at 
org.apache.doris.catalog.Catalog.initialize(Catalog.java:812) 
~[palo-fe.jar:1.0-SNAPSHOT]
       306         at org.apache.doris.PaloFe.start(PaloFe.java:128) 
~[palo-fe.jar:1.0-SNAPSHOT]
       307         at org.apache.doris.PaloFe.main(PaloFe.java:63) 
~[palo-fe.jar:1.0-SNAPSHOT]
       308 Caused by: com.sleepycat.je.EnvironmentFailureException: Environment 
invalid because of previous exception: (JE 18.3.12) 
172.21.16.12_29010_1656058910620(3):/home/zhanglei/test/output_2/fe/doris-meta/bdb
Feeder: 172.21.16.15_29010_1656058773192(4). The environments have the same 
name: PALO_JOURNAL_GROUP but represent different environment instances. 
The environment at the master has UUID 4e0bedad-1111-4c65-92a8-e6be60308d7b, 
while the replica 172.21.16.12_29010_1656058910620 has UUID: 
25c525de-eadf-4ace-892d-523be019caa4 HANDSHAKE_ERROR: Error during the 
handshake between two nodes. Some validity or compatibility check 
failed, preventing further communication between the nodes. Environment is 
invalid and must be closed. Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656058910620(-1) Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656058910620(-1)
       309         at 
com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342)
 ~[je-18.3.12.jar:18.3.12]
       310         at 
com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267)
 ~[je-18.3.12.jar:18.3.12]
       311         at 
com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) 
~[je-18.3.12.jar:18.3.12]
       312         at 
com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) 
~[je-18.3.12.jar:18.3.12]
       313         at 
com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) 
~[je-18.3.12.jar:18.3.12]
       314         at 
com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) 
~[je-18.3.12.jar:18.3.12]
   
   8. Add fe3, clear fe3's meta, then start fe3 with --helper fe1. fe3 
cannot start and reports an error like this:
       437 2022-06-24 20:00:10,014 WARN (RepNode 
172.21.16.12_29010_1656071922199(-1)|67) 
[BDBStateChangeListener.stateChange():57] this node is DETACHED
       438 2022-06-24 20:00:20,001 ERROR (UNKNOWN 
172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():199] error to 
open replicated environment. will exit.
       439 com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) 
Environment must be closed, caused by: 
com.sleepycat.je.EnvironmentFailureException: Environment invalid because of 
previous exception: (JE 18.3.12) 
172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb
Feeder: 172.21.16.15_29010_1656058773192(4). 
com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New 
or moved node:172.21.16.12_29010_1656071922199, is configured with the socket 
address: /172.21.16.12:29010.  It conflicts with the socket already used by the 
member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during 
the handshake between two nodes. Some validity or compatibility check failed, 
preventing further communication between the nodes. Environment is invalid and 
must be closed. Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656071922199(-1)
       440         at 
com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230)
 ~[je-18.3.12.jar:18.3.12]
       441         at 
com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) 
~[je-18.3.12.jar:18.3.12]
       442         at 
com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:151) 
~[je-18.3.12.jar:18.3.12]
       443         at 
com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) 
~[je-18.3.12.jar:18.3.12]
       444         at com.sleepycat.je.Environment.<init>(Environment.java:258) 
~[je-18.3.12.jar:18.3.12]
       445         at 
com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605)
 ~[je-18.3.12.jar:18.3.12]
       446         at 
com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464)
 ~[je-18.3.12.jar:18.3.12]
       447         at 
com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538)
 ~[je-18.3.12.jar:18.3.12]
       448         at 
org.apache.doris.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:152) 
~[palo-fe.jar:1.0-SNAPSHOT]
       449         at 
org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:302) 
~[palo-fe.jar:1.0-SNAPSHOT]
       450         at org.apache.doris.persist.EditLog.open(EditLog.java:889) 
~[palo-fe.jar:1.0-SNAPSHOT]
       451         at 
org.apache.doris.catalog.Catalog.initialize(Catalog.java:812) 
~[palo-fe.jar:1.0-SNAPSHOT]
       452         at org.apache.doris.PaloFe.start(PaloFe.java:128) 
~[palo-fe.jar:1.0-SNAPSHOT]
       453         at org.apache.doris.PaloFe.main(PaloFe.java:63) 
~[palo-fe.jar:1.0-SNAPSHOT]
       454 Caused by: com.sleepycat.je.EnvironmentFailureException: Environment 
invalid because of previous exception: (JE 18.3.12) 
172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb
Feeder: 172.21.16.15_29010_1656058773192(4). 
com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New 
or moved node:172.21.16.12_29010_1656071922199, is configured with the socket 
address: /172.21.16.12:29010.  It conflicts with the socket already used by the 
member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during 
the handshake between two nodes. Some validity or compatibility check failed, 
preventing further communication between the nodes. Environment is invalid and 
must be closed. Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 
172.21.16.12_29010_1656071922199(-1)
       455         at 
com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342)
 ~[je-18.3.12.jar:18.3.12]
       456         at 
com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267)
 ~[je-18.3.12.jar:18.3.12]
       457         at 
com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) 
~[je-18.3.12.jar:18.3.12]
       458         at 
com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) 
~[je-18.3.12.jar:18.3.12]
       459         at 
com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) 
~[je-18.3.12.jar:18.3.12]
       460         at 
com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) 
~[je-18.3.12.jar:18.3.12]
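
The step 7 failure is a different handshake error from the step 8 one: it reports two UUIDs for the same group name. The following minimal Python sketch extracts the group name and both UUIDs from that message so they can be compared. The helper name `parse_uuid_mismatch` is hypothetical, and the regular expression only assumes the message wording shown in the step 7 log.

```python
import re

# Message text copied from the step 7 log above.
MESSAGE = (
    "The environments have the same name: PALO_JOURNAL_GROUP but represent "
    "different environment instances. The environment at the master has UUID "
    "4e0bedad-1111-4c65-92a8-e6be60308d7b, while the replica "
    "172.21.16.12_29010_1656058910620 has UUID: "
    "25c525de-eadf-4ace-892d-523be019caa4"
)

# Canonical 8-4-4-4-12 lowercase hexadecimal UUID form.
UUID_PATTERN = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"

MISMATCH_RE = re.compile(
    r"same name: (?P<group>\S+) but represent different environment "
    r"instances\.\s+The environment at the master has UUID "
    r"(?P<master_uuid>" + UUID_PATTERN + r"), while the replica "
    r"(?P<replica>\S+) has UUID: (?P<replica_uuid>" + UUID_PATTERN + r")"
)


def parse_uuid_mismatch(message):
    """Return group name, replica name and both UUIDs, or None if no match."""
    match = MISMATCH_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    info = parse_uuid_mismatch(MESSAGE)
    print(info)
    print("same environment instance:",
          info["master_uuid"] == info["replica_uuid"])
```

For the step 7 message the two UUIDs differ, so fe3's existing metadata refers to a different instance of PALO_JOURNAL_GROUP than the one fe1 is now serving.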
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

