I have an HDFS cluster of three nodes, all running on Amazon EC2 instances, 
serving as the backing store for HBase. Periodically, when I start the 
cluster, the NameNode stays in safe mode, reporting the following about the 
number of live datanodes:

The number of live datanodes 2 has reached the minimum number 0. Safe mode will 
be turned off automatically once the thresholds have been reached.
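
For reference, my understanding is that safe mode exit is governed by the 
following hdfs-site.xml properties. I have not overridden any of them, so 
the values below should be the 2.7.3 defaults (shown here for illustration, 
not something I set explicitly):

<property>
  <!-- fraction of blocks that must be reported before safe mode can end -->
  <name>dfs.namenode.safemode.threshold-pct</name>
  <value>0.999f</value>
</property>
<property>
  <!-- the "minimum number 0" referenced in the message above -->
  <name>dfs.namenode.safemode.min.datanodes</name>
  <value>0</value>
</property>
<property>
  <!-- milliseconds to remain in safe mode after thresholds are met -->
  <name>dfs.namenode.safemode.extension</name>
  <value>30000</value>
</property>
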
The datanode logs appear normal, with no errors indicated. The hdfs dfsadmin 
-report output below shows both datanodes with a normal status and recent 
contact with the NameNode:

Safe mode is ON
Configured Capacity: 16637566976 (15.49 GB)
Present Capacity: 7941234688 (7.40 GB)
DFS Remaining: 7940620288 (7.40 GB)
DFS Used: 614400 (600 KB)
DFS Used%: 0.01%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 172.31.52.176:50010 (dev2)
Hostname: dev2
Decommission Status : Normal
Configured Capacity: 8318783488 (7.75 GB)
DFS Used: 307200 (300 KB)
Non DFS Used: 3257020416 (3.03 GB)
DFS Remaining: 5061455872 (4.71 GB)
DFS Used%: 0.00%
DFS Remaining%: 60.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Oct 04 15:47:00 EDT 2016


Name: 172.31.63.188:50010 (dev1)
Hostname: dev1
Decommission Status : Normal
Configured Capacity: 8318783488 (7.75 GB)
DFS Used: 307200 (300 KB)
Non DFS Used: 5439311872 (5.07 GB)
DFS Remaining: 2879164416 (2.68 GB)
DFS Used%: 0.00%
DFS Remaining%: 34.61%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Oct 04 15:47:00 EDT 2016

If I force the NameNode out of safe mode, the fsck command reports that the 
file system is corrupt. When this happens, the only way I have found to 
recover is to reformat the HDFS file system. I have not changed the 
cluster's configuration; this seems to occur at random. The system is in 
development, but this would be unacceptable in production.
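
To be concrete, these are the standard commands I am referring to (the 
format step is my only recovery so far, and it of course destroys all data):

$ hdfs dfsadmin -safemode leave   # force the NameNode out of safe mode
$ hdfs fsck /                     # then reports the file system as corrupt
$ hdfs namenode -format           # last resort: reformat and start over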

I'm using Hadoop 2.7.3. Thank you in advance for any help.
