Hi Mike,

Have you considered trying something like jhat or VisualVM to see what's taking up room on the heap?
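For example, you can take a dump of the running JVM and browse it with jhat (a minimal sketch, assuming a standard JDK 6 install on the PATH; the pid and file path below are placeholders):

    # Find the Solr JVM's process id
    jps -l

    # Capture a binary heap dump from the running process
    jmap -dump:format=b,file=/tmp/solr-heap.hprof <pid>

    # Serve the dump for browsing at http://localhost:7000/
    # (give jhat plenty of heap of its own -- a dump from an 8GB heap needs headroom)
    jhat -J-mx12g /tmp/solr-heap.hprof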
http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html
http://visualvm.java.net/

Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906

appinions inc.
"The Science of Influence Marketing"
18 East 41st Street
New York, NY 10017
t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions
w: appinions.com <http://www.appinions.com/>


On Wed, Jun 26, 2013 at 12:58 PM, Mike L. <javaone...@yahoo.com> wrote:
>
> Hello,
>
> I'm trying to execute a parallel DIH process and am running into
> heap-related issues; I'm hoping somebody has experienced this and can
> recommend some options.
>
> I'm using Solr 3.5 on CentOS, with a JVM heap of 4GB min, 8GB max.
>
> When executing the entities sequentially (the default), my heap never
> exceeds 3GB. When executing the parallel process, everything runs fine
> for roughly an hour, then I reach the 8GB max heap size and the process
> stalls/fails.
>
> More specifically, here's how I'm executing the parallel import
> process: I target a logical range (i.e. WHERE [some field] BETWEEN 'SOME
> VALUE' AND 'SOME VALUE') within my entity queries, and within
> solrconfig.xml I've created corresponding data import handlers, one for
> each of these entities.
>
> My total fetch/row count is 9M records.
>
> When I initiate the import, I call each handler, similar to the below
> (I've stripped out my server & naming conventions):
>
> http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting1]&clean=true
> http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting2]
>
> I assume that when doing this, only the first import request needs to
> contain the clean=true param.
>
> I've divided each import query to target roughly the same amount of
> data, and in solrconfig.xml I've tried various things in hopes of
> reducing heap usage.
>
> Here's my current config:
>
> <indexDefaults>
>   <useCompoundFile>false</useCompoundFile>
>   <mergeFactor>15</mergeFactor> <!-- I've experimented with 10, 15, 25
>        and haven't seen much difference -->
>   <ramBufferSizeMB>100</ramBufferSizeMB>
>   <maxMergeDocs>2147483647</maxMergeDocs>
>   <maxFieldLength>10000</maxFieldLength>
>   <writeLockTimeout>1000</writeLockTimeout>
>   <commitLockTimeout>10000</commitLockTimeout>
>   <lockType>single</lockType>
> </indexDefaults>
> <mainIndex>
>   <useCompoundFile>false</useCompoundFile>
>   <ramBufferSizeMB>100</ramBufferSizeMB> <!-- I've bumped this up from 32 -->
>   <mergeFactor>15</mergeFactor>
>   <maxMergeDocs>2147483647</maxMergeDocs>
>   <maxFieldLength>10000</maxFieldLength>
>   <unlockOnStartup>false</unlockOnStartup>
> </mainIndex>
>
> <updateHandler class="solr.DirectUpdateHandler2">
>   <autoCommit>
>     <maxTime>60000</maxTime> <!-- I've experimented with various times here as well -->
>     <maxDocs>25000</maxDocs> <!-- I've experimented with 25k, 500k, 100k -->
>   </autoCommit>
>   <maxPendingDeletes>100000</maxPendingDeletes>
> </updateHandler>
>
> What gets tricky is finding the sweet spot with these parameters, and
> I'm wondering if anybody has recommendations for an optimal config.
> Also, regarding autoCommit: I've even turned that feature off, but then
> my heap reaches its max sooner. I'm also wondering what the difference
> would be between using autoCommit and passing the commit=true param on
> each import query.
>
> Thanks in advance!
> Mike
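P.S. On the quoted import calls: since you created one handler per entity, a minimal shell sketch of firing them in parallel might look like the below (handler and entity names are placeholders from your message; note that each concurrent import needs its own handler, as a single DIH handler only runs one import at a time). One caveat worth checking: DIH's clean parameter defaults to true for full-import, so the second request probably wants an explicit clean=false rather than just omitting the param.

    # Kick off both imports concurrently; only the first cleans the index
    curl 'http://[server]/[solrappname]/[corename]/[ImportHandlerName1]?command=full-import&entity=[NameOfEntityTargetting1]&clean=true' &
    curl 'http://[server]/[solrappname]/[corename]/[ImportHandlerName2]?command=full-import&entity=[NameOfEntityTargetting2]&clean=false' &
    wait

    # Poll a handler for progress while the imports run
    curl 'http://[server]/[solrappname]/[corename]/[ImportHandlerName1]?command=status'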