On 6/26/2013 10:58 AM, Mike L. wrote:
> Hello,
>
> I'm trying to execute a parallel DIH process and running into heap
> related issues, hoping somebody has experienced this and can recommend
> some options.
>
> Using Solr 3.5 on CentOS.
> Currently have JVM heap 4GB min, 8GB max.
>
> When executing the entities in a sequential process (entities executing
> in sequence by default), my heap never exceeds 3GB. When executing the
> parallel process, everything runs fine for roughly an hour, then I reach
> the 8GB max heap size and the process stalls/fails.
>
> More specifically, here's how I'm executing the parallel import process:
> I target a logical range (i.e. WHERE some field BETWEEN 'SOME VALUE' AND
> 'SOME VALUE') within my entity queries, and within solrconfig.xml I've
> created corresponding data import handlers, one for each of these
> entities.
>
> My total rows fetched/counted is 9M records.
>
> When I initiate the import, I call each one similar to the below
> (obviously I've stripped out my server & naming conventions):
>
> http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting1]&clean=true
>
> http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting2]
>
> I assume that when doing this, only the first import request needs to
> contain the clean=true param.
>
> I've divided each import query to target roughly the same amount of
> data, and in solrconfig I've tried various things in hopes of reducing
> heap size.
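If I'm reading that right, the range partitioning lives in your entity
queries, so the DIH config would look something along these lines. The
table, field, ranges, and driver below are placeholders I made up, not
your actual config, and I'm guessing at whether both entities share one
data-config or each handler points at its own file:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="your.jdbc.Driver"
              url="jdbc:yourdb://dbhost/dbname"
              user="dbuser" password="dbpass"/>
  <document>
    <!-- each entity covers its own slice of the source table, so the
         imports can be started separately via the entity parameter -->
    <entity name="NameOfEntityTargetting1"
            query="SELECT * FROM source_table WHERE some_field BETWEEN 'VALUE_A' AND 'VALUE_M'"/>
    <entity name="NameOfEntityTargetting2"
            query="SELECT * FROM source_table WHERE some_field BETWEEN 'VALUE_N' AND 'VALUE_Z'"/>
  </document>
</dataConfig>

But that's only a guess.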
Thanks for including some solrconfig snippets, but I think what we really
need is your DIH configuration(s). Use a pastebin site and choose the
proper document type. http://apaste.info is available, and the proper type
there would be (X)HTML. If you need to sanitize these to remove
host/user/pass, please replace the values with something else rather than
deleting them entirely.

With full-import, clean defaults to true, so including it doesn't change
anything. What I would actually do is have clean=true on the first import
you run, then, after waiting a few seconds to be sure it is running, start
the others with clean=false so that they don't do ANOTHER clean.

I suspect that you might be running into JDBC driver behavior where the
entire result set is being buffered into RAM.

Thanks,
Shawn
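P.S. Using the same placeholders as your URLs, the kickoff I'm describing
would look something like this:

http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting1]&clean=true

(wait a few seconds to be sure the first one is actually running)

http://[server]/[solrappname]/[corename]/[ImportHandlerName]?command=full-import&entity=[NameOfEntityTargetting2]&clean=false

And if the database behind this is MySQL, the usual fix for the driver
buffering the entire result set is batchSize="-1" on the JdbcDataSource:

<!-- batchSize="-1" makes DIH set fetchSize to Integer.MIN_VALUE, which
     tells the MySQL driver to stream rows instead of holding the whole
     result set in RAM; host/db/credentials here are placeholders -->
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://dbhost/dbname"
            user="dbuser" password="dbpass"
            batchSize="-1"/>

Other drivers have their own settings for streaming results, so check your
driver's documentation if you're not on MySQL.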