True, I don't recall 0.20.2 (an older release from a few years ago) carrying these fixes. You ought to upgrade that cluster to the current stable release to benefit from the many fixes that have landed since :)
On Mon, May 14, 2012 at 11:58 PM, Prashant Kommireddi <[email protected]> wrote:
> Thanks Harsh. I am using 0.20.2. I see on the Jira this issue was
> fixed for 0.23?
>
> I will try out your suggestions and get back.
>
> On May 14, 2012, at 1:22 PM, Harsh J <[email protected]> wrote:
>
>> Your fsimage seems to have gone bad (is it 0-sized? I recall that as a
>> known issue, long since fixed).
>>
>> The easiest way is to fall back to the last available good checkpoint
>> (from the SNN). Or, if you have multiple dfs.name.dirs, see if some of the
>> other locations have better/complete files on them, and re-spread them
>> across after testing them out (and backing up the originals).
>>
>> Also, what version are you running? AFAIK most of the recent
>> stable versions/distros include NN resource-monitoring threads which
>> should have placed your NN into safemode the moment its disks ran
>> close to out of space.
>>
>> On Mon, May 14, 2012 at 10:50 PM, Prashant Kommireddi
>> <[email protected]> wrote:
>>> Hi,
>>>
>>> I am seeing an issue where the Namenode does not start due to an EOFException. The
>>> disk was full and I cleared space up, but I am unable to get past this
>>> exception. Any ideas on how this can be resolved?
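Harsh's suggestion to compare the copies across multiple dfs.name.dirs can be sketched roughly as below. The directory paths and the helper function name are illustrative, not taken from the thread; a 0-byte fsimage is exactly the corruption symptom described above.

```shell
# Rough sketch (hypothetical paths): report the largest non-empty fsimage
# among a set of candidate dfs.name.dir locations.
find_best_fsimage() {
  best_size=0
  best_path=""
  for dir in "$@"; do
    img="$dir/current/fsimage"
    [ -f "$img" ] || continue
    # wc -c gives the byte count; strip padding some wc variants add.
    size=$(wc -c < "$img" | tr -d ' ')
    if [ "$size" -gt "$best_size" ]; then
      best_size=$size
      best_path=$img
    fi
  done
  echo "$best_path"
}

# Example: find_best_fsimage /data/1/dfs/nn /data/2/dfs/nn
```

Note that a larger image is not automatically a valid one, so back up every directory first and test-start the NameNode against a copy before re-spreading it.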
>>>
>>> 2012-05-14 10:10:44,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=hadoop
>>> 2012-05-14 10:10:44,018 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
>>> 2012-05-14 10:10:44,023 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.file.FileContext
>>> 2012-05-14 10:10:44,024 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>>> 2012-05-14 10:10:44,047 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 205470
>>> 2012-05-14 10:10:44,844 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>         at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1578)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:880)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>>> 2012-05-14 10:10:44,845 INFO org.apache.hadoop.ipc.Server: Stopping server on 54310
>>> 2012-05-14 10:10:44,845 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
>>>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>         at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1578)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:880)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>>>
>>> 2012-05-14 10:10:44,846 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>>> /************************************************************
>>> SHUTDOWN_MSG: Shutting down NameNode at gridforce-1.internal.salesforce.com/10.0.201.159
>>> ************************************************************/
>>
>> --
>> Harsh J

--
Harsh J
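For reference, the two recovery levers Harsh mentions (redundant name directories and the SNN checkpoint) correspond to configuration like the following. This is an illustrative hdfs-site.xml fragment with made-up paths, not the poster's actual configuration:

```xml
<!-- Illustrative fragment only; paths are hypothetical. -->
<property>
  <name>dfs.name.dir</name>
  <!-- Two or more comma-separated dirs give the NN redundant fsimage copies. -->
  <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <!-- Where the SecondaryNameNode keeps its last good checkpoint. -->
  <value>/data/1/dfs/snn</value>
</property>
```

With empty dfs.name.dir directories and a good checkpoint available under fs.checkpoint.dir, starting the NameNode with `hadoop namenode -importCheckpoint` should load the SNN's image into the primary; verify the exact invocation against your version's documentation before relying on it.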
