On 6/26/2013 10:58 AM, Mike L. wrote:
>  
> Hello,
>  
>        I'm trying to execute a parallel DIH process and am running into 
> heap-related issues, hoping somebody has experienced this and can recommend 
> some options.
>  
>        Using Solr 3.5 on CentOS.
>        Currently have JVM heap 4GB min, 8GB max
>  
>      When executing the entities in a sequential process (entities executing 
> in sequence by default), my heap never exceeds 3GB. When executing the 
> parallel process, everything runs fine for roughly an hour, then I reach the 
> 8GB max heap size and the process stalls/fails.
>  
>      More specifically, here's how I'm executing the parallel import process: 
> I target a logical range (i.e. WHERE some field BETWEEN 'SOME VALUE' AND 'SOME 
> VALUE') within my entity queries. And within solrconfig.xml, I've created 
> corresponding data import handlers, one for each of these entities.
>  
> My total rows fetch/count is 9M records.
>  
> And when I initiate the import, I call each one, similar to the below 
> (obviously I've stripped out my server & naming conventions).
>  
> http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting1]&clean=true
>  
> http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting2]
>  
>  
> I assume that when doing this, only the first import request needs to contain 
> the clean=true param. 
>  
> I've divided each import query to target roughly the same amount of data, and 
> in solrconfig, I've tried various things in hopes of reducing heap usage.

Thanks for including some solrconfig snippets, but I think what we
really need is your DIH configuration(s).  Use a pastebin site and
choose the proper document type.  http://apaste.info is available and
the proper type there would be (X)HTML.  If you need to sanitize these
to remove host/user/pass, please replace the values with something else
rather than deleting them entirely.
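For reference, a sanitized data-config.xml for one of those range-partitioned
entities might look roughly like this (the connection details, table, and
field names below are placeholders, and I'm only guessing at MySQL for the
driver):

  <dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://dbhost:3306/SOMEDB"
                user="REDACTED_USER"
                password="REDACTED_PASS"/>
    <document>
      <entity name="rangeOne"
              query="SELECT id, some_field, other_field FROM some_table
                     WHERE some_field BETWEEN 'SOME VALUE' AND 'SOME VALUE'"/>
    </document>
  </dataConfig>

Something at that level of detail is what will let us see where the memory
is going.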

With full-import, clean defaults to true, so including it doesn't change
anything.  What I would actually do is have clean=true on the first
import you run, then after waiting a few seconds to be sure it is
running, start the others with clean=false so that they don't do ANOTHER
clean.
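In other words, using the placeholders from your message, something like:

http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting1]&clean=true

(wait a few seconds to be sure it is running)

http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting2]&clean=false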

I suspect that you might be running into JDBC driver behavior where the
entire result set is being buffered into RAM.
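I don't know which JDBC driver you're using, but if it's MySQL, the usual fix
is to set batchSize="-1" on the DIH dataSource.  That makes DIH request a
streaming result set, so rows are read as they are indexed instead of the
whole 9M-row result being buffered in memory.  Roughly (again, driver and
connection values are placeholders):

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://dbhost:3306/SOMEDB"
              user="REDACTED_USER"
              password="REDACTED_PASS"
              batchSize="-1"/>

If you're on a different driver, the right setting varies, so knowing which
database and driver you're using would help.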

Thanks,
Shawn
