Hello, I'm trying to execute a parallel DIH process and running into heap-related issues; hoping somebody has experienced this and can recommend some options. I'm using Solr 3.5 on CentOS, with a JVM heap of 4GB min / 8GB max.

When executing the entities sequentially (entities execute in sequence by default), my heap never exceeds 3GB. When executing the parallel process, everything runs fine for roughly an hour, then I hit the 8GB max heap size and the process stalls/fails.

More specifically, here's how I'm executing the parallel import process: each entity query targets a logical range (i.e. WHERE some field BETWEEN 'SOME VALUE' AND 'SOME VALUE'), and within solrconfig.xml I've created corresponding data import handlers, one for each of these entities. My total fetched row count is 9M records. When I initiate the import, I call each handler similar to the below (obviously I've stripped out my server and naming conventions):

http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting1]&clean=true
http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting2]

I assume that when doing this, only the first import request needs to contain the clean=true param. I've divided the import queries to target roughly equal amounts of data, and in solrconfig I've tried various things in hopes of reducing heap usage.
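To illustrate how I kick the imports off, here's a rough sketch (the host, core, handler, and entity names are placeholders standing in for my real ones, and the script just prints the curl commands rather than firing them):

```shell
#!/bin/sh
# Placeholder names -- the real server/core/handler names are stripped out.
BASE="http://solrhost:8983/solr/mycore/dataimport"

urls=""
for entity in range1 range2 range3; do
  url="$BASE?command=full-import&entity=$entity"
  # Only the very first import request wipes the index.
  [ -z "$urls" ] && url="$url&clean=true"
  urls="$urls$url
"
done

# One curl command per handler; each runs in the background ("&")
# so the imports proceed in parallel.
printf '%s' "$urls" | while IFS= read -r u; do
  echo "curl -s '$u' &"
done
```

Swapping the echo for an actual curl invocation is what produces the parallel load described above.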
Here's my current config:

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>15</mergeFactor> <!-- I've experimented with 10, 15, and 25 and haven't seen much difference -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>10000</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
  <lockType>single</lockType>
</indexDefaults>
<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>100</ramBufferSizeMB> <!-- I've bumped this up from 32 -->
  <mergeFactor>15</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>10000</maxFieldLength>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime> <!-- I've experimented with various times here as well -->
    <maxDocs>25000</maxDocs> <!-- I've experimented with 25k, 500k, 100k -->
  </autoCommit>
  <maxPendingDeletes>100000</maxPendingDeletes>
</updateHandler>

What gets tricky is finding the sweet spot with these parameters, so I'm wondering if anybody has recommendations for an optimal config. Regarding autoCommit, I've even turned that feature off, but then my heap reaches its max even sooner. I'm also wondering what the difference would be between autoCommit and passing the commit=true param on each import query.

Thanks in advance!
Mike
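For context on that last question, my understanding (which may be wrong, hence the question) is that autoCommit flushes pending documents periodically during the import, while commit on the DIH request only commits once when that handler finishes. One variant I've considered is passing commit=false on every parallel request and issuing a single explicit commit at the end; a sketch with placeholder names, which just prints the plan rather than executing it:

```shell
#!/bin/sh
# Placeholder host/core names; prints the commands instead of running them.
BASE="http://solrhost:8983/solr/mycore"

plan=""
for entity in range1 range2; do
  # Each parallel import defers its end-of-run commit...
  plan="${plan}curl -s '$BASE/dataimport?command=full-import&entity=$entity&commit=false' &
"
done
# ...then one explicit commit runs after all imports finish.
plan="${plan}wait
curl -s '$BASE/update?commit=true'
"
printf '%s' "$plan"
```

No idea whether a single deferred commit like this would help or hurt the heap situation, which is partly what I'm asking.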