Hi Wenqi,

Which version of hadoop are you using? (My initial guess is 3.X, as )

Can you see anything in the datanode logs on the lines of "org.apache.hadoop.hdfs.server.common.InconsistentFSStateException"

Although, you mention that it is new cluster, make sure the ClusterID matches. --> "attempt to load an used block storage: /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027"


On 22/4/20 9:50 pm, Wenqi Ma wrote:
Dear Hadoop community,

A strange problem happened in our fresh installed hdfs cluster. The datanode cannot start because two threads tried to load storage, the latter would fail and lead a FATAL error. As the following log shown, the blue text means DataStorage::addStorageLocations is called, the red text are I added to find the thread details. I compared the log with normal datanode's in another cluster, and there is only one line (blue text).

Any suggestion would be appreciated, thanks!

2020-04-22 19:32:25,666 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: nameservice1 2020-04-22 19:32:25,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: nameservice1 2020-04-22 19:32:25,726 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to node238/172.16.2.238:8020 <http://172.16.2.238:8020> starting to offer service 2020-04-22 19:32:25,727 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to node239/172.16.2.239:8020 <http://172.16.2.239:8020> starting to offer service 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2020-04-22 19:32:25,885 INFO org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 30 2020-04-22 19:32:25,885 INFO org.apache.hadoop.hdfs.server.common.Storage: kkk current thread: Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]  heartbeating to node239/172.16.2.239:8020 <http://172.16.2.239:8020>,5,main] 2020-04-22 19:32:25,885 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1) 2020-04-22 19:32:25,890 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename 35@node238 2020-04-22 19:32:25,918 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1558632922-172.16.2.238-1587535931027 2020-04-22 19:32:25,918 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027 2020-04-22 19:32:25,920 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;cid=hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe 2020-04-22 19:32:25,931 INFO org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 29 2020-04-22 19:32:25,931 INFO org.apache.hadoop.hdfs.server.common.Storage: kkk current thread: Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]  heartbeating to node238/172.16.2.238:8020 <http://172.16.2.238:8020>,5,main] 2020-04-22 19:32:25,931 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1) 2020-04-22 19:32:25,932 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory [DISK]file:/vdir/mnt/disk2/hadoop/data/ has already been used. 2020-04-22 19:32:25,950 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1558632922-172.16.2.238-1587535931027 2020-04-22 19:32:25,950 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze storage directories for block pool BP-1558632922-172.16.2.238-1587535931027 java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)     at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)     at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408)     at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)     at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397)     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362)     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
    at java.lang.Thread.run(Thread.java:748)

--
Best Regards!
Wenqi

Reply via email to