Suresh,

There are a few common workarounds for such a problem, but I think that
submitting more than "maxIndexingThreads" threads is not really productive.
Also, I think the out-of-memory problem is caused not by indexing but by
opening a searcher. Do you really need to open it? I don't think it's a good
idea to search on an instance that is cooking a multi-terabyte index at the
same time. Are you sure you don't issue superfluous commits, and that you've
disabled auto-commit?
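One common variant of this advice is to keep only a hard auto-commit for
transaction-log housekeeping, but with openSearcher=false so no searcher is
opened during the load, and to leave soft auto-commit disabled. A minimal
solrconfig.xml sketch of that setup (the thresholds are placeholder values,
not from this thread):

  <!-- sketch only: maxDocs/maxTime are illustrative assumptions -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>1000000</maxDocs>          <!-- flush by doc count -->
      <maxTime>600000</maxTime>           <!-- or every 10 minutes -->
      <openSearcher>false</openSearcher>  <!-- no searcher opened during bulk load -->
    </autoCommit>
    <!-- autoSoftCommit intentionally left out while bulk loading -->
  </updateHandler>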
Let's nail down the OOM problem first, and then deal with indexing speedup. I
like huge indices!

On Wed, Feb 4, 2015 at 1:10 AM, Arumugam, Suresh <suresh.arumu...@emc.com> wrote:

> We are also facing the same problem in loading 14 billion documents into
> Solr 4.8.10.
>
> Dataimport runs single-threaded, which is taking more than 3 weeks. This
> works fine without any issues, but it takes months to complete the load.
>
> When we tried SolrJ with the below configuration for a multithreaded load,
> Solr takes more memory, and at one point we end up out of memory as well.
>
> Batch doc count      : 100000 docs
> No of threads        : 16/32
> Solr memory allocated: 200 GB
>
> The reason may be as below.
>
> Solr takes a snapshot whenever we open a SearchIndexer. Because of this,
> more memory is consumed and Solr becomes extremely slow when running 16 or
> more threads for loading.
>
> If anyone has already done a faster multithreaded data load into Solr, can
> you please share the code or the logic for using the SolrJ API?
>
> Thanks in advance.
>
> Regards,
> Suresh.A
>
> -----Original Message-----
> From: Dyer, James [mailto:james.d...@ingramcontent.com]
> Sent: Tuesday, February 03, 2015 1:58 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr 4.9 Calling DIH concurrently
>
> DIH is single-threaded. There was once a threaded option, but it was buggy
> and was subsequently removed.
>
> What I do is partition my data and run multiple DIH request handlers at the
> same time. It means redundant sections in solrconfig.xml, and it's not very
> elegant, but it works.
>
> For instance, for a SQL query, I add something like this: "where mod(id,
> ${dataimporter.request.numPartitions})=${dataimporter.request.currentPartition}".
>
> I think, though, that most users who want to make the most of
> multithreading write their own program and use the SolrJ API to send the
> updates.
>
> James Dyer
> Ingram Content Group
>
>
> -----Original Message-----
> From: meena.sri...@mathworks.com [mailto:meena.sri...@mathworks.com]
> Sent: Tuesday, February 03, 2015 3:43 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 4.9 Calling DIH concurrently
>
> Hi,
>
> I am using Solr 4.9 and need to index millions of documents from a
> database. I am using DIH and sending requests to fetch by IDs. Is there a
> way to run multiple indexing threads concurrently in DIH? I want to take
> advantage of the <maxIndexingThreads> parameter. How do I do that? I am
> just invoking the DIH handler using the SolrJ HttpSolrServer and issuing
> the requests sequentially:
>
> http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=100&minId=1
>
> http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=201&minId=101
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-9-Calling-DIH-concurrently-tp4183744.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
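For reference, a minimal SolrJ sketch of the "write your own program" approach
mentioned above, using ConcurrentUpdateSolrServer (available in Solr 4.x)
instead of a plain HttpSolrServer. This is not code from the thread; the core
URL, the field names, and the batch/queue/thread sizes are assumptions to
adjust.

  import java.util.ArrayList;
  import java.util.Collection;

  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkLoader {
      public static void main(String[] args) throws Exception {
          // queueSize and threadCount are tuning knobs; these values are only a starting point
          ConcurrentUpdateSolrServer server =
                  new ConcurrentUpdateSolrServer("http://localhost:8983/solr/db", 10000, 8);

          Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
          for (long id = 1; id <= 1000000; id++) {            // replace with a real data-source iterator
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", Long.toString(id));
              doc.addField("text", "document body " + id);    // hypothetical field name
              batch.add(doc);
              if (batch.size() == 1000) {                     // hand off in modest batches
                  server.add(batch);                          // no commit per batch
                  batch = new ArrayList<SolrInputDocument>();
              }
          }
          if (!batch.isEmpty()) {
              server.add(batch);
          }

          server.blockUntilFinished();  // drain the internal sending queue
          server.commit();              // single hard commit at the very end
          server.shutdown();
      }
  }

Note that ConcurrentUpdateSolrServer only logs indexing errors by default;
override handleError() if the loader should fail fast. Tuning queueSize and
threadCount against the server-side maxIndexingThreads is usually where the
throughput is won.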