Hi Wenqi,
Which version of hadoop are you using? (My initial guess is 3.X, as )
Can you see anything in the datanode logs on the lines of
"org.apache.hadoop.hdfs.server.common.InconsistentFSStateException"
Although, you mention that it is new cluster, make sure the ClusterID
matches. --> "attempt to load an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027"
On 22/4/20 9:50 pm, Wenqi Ma wrote:
Dear Hadoop community,
A strange problem happened in our fresh installed hdfs cluster. The
datanode cannot start because two threads tried to load storage, the
latter would fail and lead a FATAL error.
As the following log shown, the blue text
means DataStorage::addStorageLocations is called, the red text are I
added to find the thread details.
I compared the log with normal datanode's in another cluster, and
there is only one line (blue text).
Any suggestion would be appreciated, thanks!
2020-04-22 19:32:25,666 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request
received for nameservices: nameservice1
2020-04-22 19:32:25,718 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting
BPOfferServices for nameservices: nameservice1
2020-04-22 19:32:25,726 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
<registering> (Datanode Uuid unassigned) service to
node238/172.16.2.238:8020 <http://172.16.2.238:8020> starting to offer
service
2020-04-22 19:32:25,727 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
<registering> (Datanode Uuid unassigned) service to
node239/172.16.2.239:8020 <http://172.16.2.239:8020> starting to offer
service
2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 50020: starting
2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2020-04-22 19:32:25,885 INFO
org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 30
2020-04-22 19:32:25,885 INFO
org.apache.hadoop.hdfs.server.common.Storage: kkk current thread:
Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]
heartbeating to node239/172.16.2.239:8020
<http://172.16.2.239:8020>,5,main]
2020-04-22 19:32:25,885 INFO
org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to
upgrade data directories
(dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2020-04-22 19:32:25,890 INFO
org.apache.hadoop.hdfs.server.common.Storage: Lock on
/vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename 35@node238
2020-04-22 19:32:25,918 INFO
org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage
directories for bpid BP-1558632922-172.16.2.238-1587535931027
2020-04-22 19:32:25,918 INFO
org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
2020-04-22 19:32:25,920 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage:
nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;cid=hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe
2020-04-22 19:32:25,931 INFO
org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 29
2020-04-22 19:32:25,931 INFO
org.apache.hadoop.hdfs.server.common.Storage: kkk current thread:
Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]
heartbeating to node238/172.16.2.238:8020
<http://172.16.2.238:8020>,5,main]
2020-04-22 19:32:25,931 INFO
org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to
upgrade data directories
(dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2020-04-22 19:32:25,932 INFO
org.apache.hadoop.hdfs.server.common.Storage: Storage directory
[DISK]file:/vdir/mnt/disk2/hadoop/data/ has already been used.
2020-04-22 19:32:25,950 INFO
org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage
directories for bpid BP-1558632922-172.16.2.238-1587535931027
2020-04-22 19:32:25,950 WARN
org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze
storage directories for block pool
BP-1558632922-172.16.2.238-1587535931027
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead:
attempt to load an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)
at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408)
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362)
at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
at java.lang.Thread.run(Thread.java:748)
--
Best Regards!
Wenqi