Hi Mike,

Have you considered trying something like jhat or VisualVM to see what's taking up room on the heap?
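For example, you can take a dump of the running JVM and browse it with jhat (a minimal sketch, assuming a standard JDK 6 install on the PATH; the pid and file path below are placeholders):

    # Find the Solr JVM's process id
    jps -l

    # Capture a binary heap dump from the running process
    jmap -dump:format=b,file=/tmp/solr-heap.hprof <pid>

    # Serve the dump for browsing at http://localhost:7000/
    # (give jhat plenty of heap of its own -- a dump from an 8GB heap needs headroom)
    jhat -J-mx12g /tmp/solr-heap.hprof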
http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html
http://visualvm.java.net/

Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906

appinions inc.
"The Science of Influence Marketing"
18 East 41st Street
New York, NY 10017
t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions
w: appinions.com <http://www.appinions.com/>


On Wed, Jun 26, 2013 at 12:58 PM, Mike L. <javaone...@yahoo.com> wrote:
>
> Hello,
>
> I'm trying to execute a parallel DIH process and am running into
> heap-related issues; I'm hoping somebody has experienced this and can
> recommend some options.
>
> I'm using Solr 3.5 on CentOS, with a JVM heap of 4GB min, 8GB max.
>
> When executing the entities sequentially (the default), my heap never
> exceeds 3GB. When executing the parallel process, everything runs fine
> for roughly an hour, then I reach the 8GB max heap size and the process
> stalls/fails.
>
> More specifically, here's how I'm executing the parallel import
> process: I target a logical range (i.e. WHERE [some field] BETWEEN 'SOME
> VALUE' AND 'SOME VALUE') within my entity queries, and within
> solrconfig.xml I've created corresponding data import handlers, one for
> each of these entities.
>
> My total fetch/row count is 9M records.
>
> When I initiate the import, I call each handler, similar to the below
> (I've stripped out my server & naming conventions):
>
> http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting1]&clean=true
> http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting2]
>
> I assume that when doing this, only the first import request needs to
> contain the clean=true param.
>
> I've divided each import query to target roughly the same amount of
> data, and in solrconfig.xml I've tried various things in hopes of
> reducing heap usage.
>
> Here's my current config:
>
> <indexDefaults>
>   <useCompoundFile>false</useCompoundFile>
>   <mergeFactor>15</mergeFactor> <!-- I've experimented with 10, 15, 25
>        and haven't seen much difference -->
>   <ramBufferSizeMB>100</ramBufferSizeMB>
>   <maxMergeDocs>2147483647</maxMergeDocs>
>   <maxFieldLength>10000</maxFieldLength>
>   <writeLockTimeout>1000</writeLockTimeout>
>   <commitLockTimeout>10000</commitLockTimeout>
>   <lockType>single</lockType>
> </indexDefaults>
> <mainIndex>
>   <useCompoundFile>false</useCompoundFile>
>   <ramBufferSizeMB>100</ramBufferSizeMB> <!-- I've bumped this up from 32 -->
>   <mergeFactor>15</mergeFactor>
>   <maxMergeDocs>2147483647</maxMergeDocs>
>   <maxFieldLength>10000</maxFieldLength>
>   <unlockOnStartup>false</unlockOnStartup>
> </mainIndex>
>
> <updateHandler class="solr.DirectUpdateHandler2">
>   <autoCommit>
>     <maxTime>60000</maxTime> <!-- I've experimented with various times here as well -->
>     <maxDocs>25000</maxDocs> <!-- I've experimented with 25k, 500k, 100k -->
>   </autoCommit>
>   <maxPendingDeletes>100000</maxPendingDeletes>
> </updateHandler>
>
> What gets tricky is finding the sweet spot with these parameters, and
> I'm wondering if anybody has recommendations for an optimal config.
> Also, regarding autoCommit: I've even turned that feature off, but then
> my heap reaches its max sooner. I'm also wondering what the difference
> would be between using autoCommit and passing the commit=true param on
> each import query.
>
> Thanks in advance!
> Mike
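P.S. On the quoted import calls: since you created one handler per entity, a minimal shell sketch of firing them in parallel might look like the below (handler and entity names are placeholders from your message; note that each concurrent import needs its own handler, as a single DIH handler only runs one import at a time). One caveat worth checking: DIH's clean parameter defaults to true for full-import, so the second request probably wants an explicit clean=false rather than just omitting the param.

    # Kick off both imports concurrently; only the first cleans the index
    curl 'http://[server]/[solrappname]/[corename]/[ImportHandlerName1]?command=full-import&entity=[NameOfEntityTargetting1]&clean=true' &
    curl 'http://[server]/[solrappname]/[corename]/[ImportHandlerName2]?command=full-import&entity=[NameOfEntityTargetting2]&clean=false' &
    wait

    # Poll a handler for progress while the imports run
    curl 'http://[server]/[solrappname]/[corename]/[ImportHandlerName1]?command=status'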