We are also facing the same problem loading 14 billion documents into Solr 4.8.10.

Dataimport runs single-threaded, which takes more than 3 weeks. It works fine without any issues, but it takes months to complete the load. When we tried a multithreaded load through SolrJ with the configuration below, Solr consumed more and more memory and at some point we ran out of memory.

Batch doc count: 100000 docs
No. of threads: 16/32
Solr memory allocated: 200 GB

The reason may be the following. Solr takes a snapshot whenever we open a SearchIndexer. Because of this, more memory is consumed and Solr is extremely slow when running 16 or more loading threads.

If anyone has already done a multithreaded data load into Solr in a quicker way, can you please share the code or logic for using the SolrJ API?

Thanks in advance.

Regards,
Suresh.A

-----Original Message-----
From: Dyer, James [mailto:james.d...@ingramcontent.com]
Sent: Tuesday, February 03, 2015 1:58 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.9 Calling DIH concurrently

DIH is single-threaded. There was once a threaded option, but it was buggy and was subsequently removed.

What I do is partition my data and run multiple DIH request handlers at the same time. It means redundant sections in solrconfig.xml and it's not very elegant, but it works. For instance, for a SQL query, I add something like this: "where mod(id, ${dataimporter.request.numPartitions})=${dataimporter.request.currentPartition}".

I think, though, that most users who want to make the most of multithreading write their own program and use the SolrJ API to send the updates.

James Dyer
Ingram Content Group

-----Original Message-----
From: meena.sri...@mathworks.com [mailto:meena.sri...@mathworks.com]
Sent: Tuesday, February 03, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.9 Calling DIH concurrently

Hi

I am using Solr 4.9 and need to index millions of documents from a database. I am using DIH and sending requests to fetch by IDs.
Is there a way to run multiple indexing threads concurrently in DIH? I want to take advantage of the <maxIndexingThreads> parameter. How do I do it? I am just invoking the DIH handler using SolrJ HttpSolrServer and issuing requests sequentially.

http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=100&minId=1
http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=201&minId=101

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-9-Calling-DIH-concurrently-tp4183744.html
Sent from the Solr - User mailing list archive at Nabble.com.
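
The mod-based partitioning James describes for DIH can also be applied in a standalone loader, which is one possible shape for the "own program" he mentions. What follows is only a rough sketch under stated assumptions, not tested against Solr: the class name PartitionedLoader and the fetchDoc and sendBatch callbacks are hypothetical placeholders. In a real loader, fetchDoc would read a row from the source database and sendBatch would typically wrap a SolrJ add call on a shared client. Each worker owns the IDs where id % numPartitions equals its partition number and holds at most one batch at a time, so the loader itself buffers no more than numPartitions * batchSize documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.LongFunction;

/** Hypothetical sketch of a partitioned, multithreaded batch loader. Contains no SolrJ calls. */
final class PartitionedLoader {

    /**
     * Processes every id in [minId, maxId] with one worker thread per partition.
     * Worker p handles only ids with id % numPartitions == p (ids assumed non-negative).
     * fetchDoc and sendBatch are placeholders (assumptions) for the database read
     * and the update request. Returns the number of documents passed to sendBatch.
     */
    static <D> long load(long minId, long maxId, int numPartitions, int batchSize,
                         LongFunction<D> fetchDoc, Consumer<List<D>> sendBatch)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numPartitions);
        AtomicLong sent = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int p = 0; p < numPartitions; p++) {
                final int partition = p;
                futures.add(pool.submit(() -> {
                    List<D> batch = new ArrayList<>(batchSize);
                    // Smallest id >= minId that falls into this worker's partition.
                    long first = minId + Math.floorMod(partition - minId, (long) numPartitions);
                    for (long id = first; id <= maxId; id += numPartitions) {
                        batch.add(fetchDoc.apply(id));
                        if (batch.size() == batchSize) {
                            sendBatch.accept(batch);
                            sent.addAndGet(batch.size());
                            // Start a fresh list so the sent batch can be garbage collected.
                            batch = new ArrayList<>(batchSize);
                        }
                    }
                    if (!batch.isEmpty()) {
                        // Flush the final, partially filled batch.
                        sendBatch.accept(batch);
                        sent.addAndGet(batch.size());
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each worker and rethrow any failure
            }
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        AtomicLong batches = new AtomicLong();
        // Stand-in callbacks: build a string per id and just count the batches.
        long docs = load(1, 10_000, 4, 1_000,
                id -> "doc-" + id,
                batch -> batches.incrementAndGet());
        System.out.println(docs + " docs sent in " + batches.get() + " batches");
    }
}
```

The thread count and batch size are parameters, so much smaller batches than 100000 documents can be tried without changing the code. Whether that eases the memory pressure on the Solr side would have to be measured. SolrJ 4.x also ships ConcurrentUpdateSolrServer, which queues and sends updates from its own background threads and may be worth comparing against a hand-rolled loader like this one.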