Blocks: ~26,000,000. Files: a bit higher, ~27,000,000.

Currently running:

[root@hnn217 ~]# java -version
java version "1.7.0_09"

Was running 1.6.0_23.

export JVM_OPTIONS="-XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"

I will grab the GC logs and the heap dump in a follow-up.

On Sat, Dec 22, 2012 at 1:32 PM, Suresh Srinivas <[email protected]> wrote:
> Please take a histo live dump when the memory is full. Note that this
> causes a full GC.
> http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html
>
> What is the number of blocks you have on the system?
>
> Send the JVM options you are using. From earlier Java versions, which used
> 1/8 of total heap for young gen, it has gone up to 1/3 of total heap. This
> could also be the reason.
>
> Do you collect GC logs? Send those as well.
>
> Sent from a mobile device
>
> On Dec 22, 2012, at 9:51 AM, Edward Capriolo <[email protected]> wrote:
>
>> Newer 1.6 releases are getting close to 1.7, so I am not going to fear a
>> number and fight the future.
>>
>> I have been at around 27 million files for a while, and have been as high
>> as 30 million; I do not think that is related.
>>
>> I do not think it is related to checkpoints, but I am considering
>> raising/lowering the checkpoint triggers.
>>
>> On Saturday, December 22, 2012, Joep Rottinghuis <[email protected]> wrote:
>>> Do your OOMs correlate with the secondary checkpointing?
>>>
>>> Joep
>>>
>>> Sent from my iPhone
>>>
>>> On Dec 22, 2012, at 7:42 AM, Michael Segel <[email protected]> wrote:
>>>
>>>> Hey, silly question...
>>>>
>>>> How long have you had 27 million files?
>>>>
>>>> I mean, can you correlate the number of files to the spate of OOMs?
>>>>
>>>> Even without problems, I'd say it would be a good idea to upgrade due
>>>> to the probability of a lot of code fixes.
>>>>
>>>> If you're running anything pre-1.x, going to 1.7 Java wouldn't be a
>>>> good idea. Having said that... outside of MapR, have any of the
>>>> distros certified themselves on 1.7 yet?
>>>>
>>>> On Dec 22, 2012, at 6:54 AM, Edward Capriolo <[email protected]> wrote:
>>>>
>>>>> I will give this a go. I have actually gone into JMX and manually
>>>>> triggered GC; no memory is returned. So I assumed something was
>>>>> leaking.
>>>>>
>>>>> On Fri, Dec 21, 2012 at 11:59 PM, Adam Faris <[email protected]> wrote:
>>>>>
>>>>>> I know this will sound odd, but try reducing your heap size. We had
>>>>>> an issue like this where GC kept falling behind and we either ran
>>>>>> out of heap or would be in full GC. By reducing the heap, we were
>>>>>> forcing concurrent mark sweep to occur, and avoided both full GC and
>>>>>> running out of heap space, as the JVM would collect objects more
>>>>>> frequently.
>>>>>>
>>>>>> On Dec 21, 2012, at 8:24 PM, Edward Capriolo <[email protected]> wrote:
>>>>>>
>>>>>>> I have an old Hadoop 0.20.2 cluster. We have not had any issues for
>>>>>>> a while (which is why I never bothered with an upgrade).
>>>>>>>
>>>>>>> Suddenly it OOMed last week. Now the OOMs happen periodically. We
>>>>>>> have a fairly large NameNode heap, Xmx 17GB. It is a fairly large
>>>>>>> FS, about 27,000,000 files.
>>>>>>>
>>>>>>> So the strangest thing is that every hour and a half the NN memory
>>>>>>> usage increases until the heap is full.
>>>>>>>
>>>>>>> http://imagebin.org/240287
>>>>>>>
>>>>>>> We tried failing over the NN to another machine. We changed the
>>>>>>> Java version from 1.6_23 -> 1.7.0.
>>>>>>>
>>>>>>> I have set the NameNode logs to DEBUG and ALL, and I have done the
>>>>>>> same with the datanodes.
>>>>>>> The secondary NN is running, shipping edits, and making new images.
>>>>>>>
>>>>>>> I am thinking something has corrupted the NN metadata and after
>>>>>>> enough time it becomes a time bomb, but this is just a total shot
>>>>>>> in the dark. Does anyone have any interesting troubleshooting
>>>>>>> ideas?
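
For readers finding this thread later, Suresh's histo/heap-dump suggestion can be sketched as the session below. This is only a sketch: the `NameNode` process name shown by `jps` and the output file paths are assumptions, and note (as Suresh warns) that `-histo:live` forces a full GC on the NameNode.

```
[root@hnn217 ~]# NN_PID=$(jps | awk '/NameNode/ {print $1}')
[root@hnn217 ~]# jmap -histo:live $NN_PID > nn-histo.txt
[root@hnn217 ~]# jmap -dump:live,format=b,file=nn-heap.hprof $NN_PID
```

The first command writes a live-object class histogram (class name, instance count, bytes); the second writes a binary heap dump that can be opened in jhat or Eclipse MAT. To answer Suresh's GC-log question, the standard HotSpot 6/7 logging flags can be appended to JVM_OPTIONS (log path is an assumption):

```
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop/nn-gc.log
```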

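As a sanity check on whether the 17 GB Xmx is simply too small for this filesystem, a back-of-envelope estimate can be run against the figures Edward posted. The bytes-per-object constant is a rough rule-of-thumb assumption (on the order of 150-200 bytes of NameNode heap per file or block object), not a number from this thread:

```shell
# Back-of-envelope NameNode heap estimate from the figures in this thread.
# BYTES_PER_OBJECT is a rule-of-thumb assumption, not a measured value.
FILES=27000000
BLOCKS=26000000
BYTES_PER_OBJECT=200
echo $(( (FILES + BLOCKS) * BYTES_PER_OBJECT / 1024 / 1024 / 1024 ))   # ~9 (GiB)
```

At roughly 10 GiB of steady-state metadata against a 17 GB Xmx, the heap is not obviously undersized, which is consistent with Suresh's point that young-gen sizing (or an actual leak) is a likelier culprit than raw capacity.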