We are facing the same problem while loading 14 billion documents into Solr 
4.8.10.

DataImport runs single-threaded, which is taking more than 3 weeks. 
It works without any issues, but the full load takes months to 
complete.

When we tried a multithreaded load through SolrJ with the configuration 
below, Solr consumed more and more memory and at one point ran out of 
memory.

        Batch doc count       : 100000 docs
        No. of threads        : 16/32
        Solr memory allocated : 200 GB

We suspect the reason is as follows.

        Solr takes a snapshot whenever we open a SearchIndexer. 
        Because of this, more memory is consumed and Solr becomes extremely 
        slow when 16 or more threads are loading.

If anyone has already done a multithreaded data load into Solr in a quicker 
way, can you please share the code or logic for using the SolrJ API?

Thanks in advance.

Regards,
Suresh.A

-----Original Message-----
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Tuesday, February 03, 2015 1:58 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.9 Calling DIH concurrently

DIH is single-threaded.  There was once a threaded option, but it was buggy and 
subsequently was removed.  

What I do is partition my data and run multiple DIH request handlers at the 
same time.  It means redundant sections in solrconfig.xml and it's not very 
elegant, but it works.

For instance, for a sql query, I add something like this: "where mod(id, 
${dataimporter.request.numPartitions})=${dataimporter.request.currentPartition}".
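
To illustrate the partitioning approach described above, here is a minimal 
Java sketch that builds one full-import URL per partition. It assumes one 
request handler per partition is registered in solrconfig.xml under the 
hypothetical names dataimport0, dataimport1, and so on. The core URL and 
clean=false are borrowed from the example requests elsewhere in this thread, 
and numPartitions/currentPartition are the request parameters that the SQL 
clause reads. The sketch only builds the URLs; sending them (for example, 
from one thread per handler) is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds one DIH full-import URL per partition. Handler names are hypothetical. */
final class PartitionedDihUrls {

    /**
     * coreUrl       e.g. "http://localhost:8983/solr/db"
     * handlerPrefix assumed naming scheme: prefix + partition number
     * numPartitions how many slices the SQL "mod(id, numPartitions)" clause uses
     */
    static List<String> build(String coreUrl, String handlerPrefix, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        List<String> urls = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            // clean=false so one partition's import does not wipe the others' documents
            urls.add(coreUrl + "/" + handlerPrefix + p
                    + "?command=full-import&clean=false"
                    + "&numPartitions=" + numPartitions
                    + "&currentPartition=" + p);
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : build("http://localhost:8983/solr/db", "dataimport", 4)) {
            System.out.println(url);
        }
    }
}
```

Each partition value runs from 0 to numPartitions - 1, matching the possible 
results of mod(id, numPartitions).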

I think, though, most users who want to make the most out of multithreading 
write their own program and use the solrj api to send the updates.
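
For what such a program might look like, here is a rough sketch of the 
threading side only, using just the Java standard library. One reader thread 
cuts the input into batches and hands them to a fixed pool of workers through 
a bounded queue, so the client never holds more than a fixed number of 
batches in memory. The class name, the parameters, and the sender callback 
are all hypothetical. With SolrJ, the callback is where a call such as 
HttpSolrServer.add(batch) on a list of SolrInputDocument would go. This is 
not a tuned or tested loader, and it bounds memory on the client side only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical helper: batches documents and sends them from several threads. */
final class BoundedBatchLoader {

    /** Returns the number of documents handed to sender without an exception. */
    static <T> long load(Iterable<T> docs, int batchSize, int threads, int queueCapacity,
                         Consumer<List<T>> sender) {
        final BlockingQueue<List<T>> queue = new ArrayBlockingQueue<>(queueCapacity);
        final List<T> poison = new ArrayList<>(); // end-of-input marker, compared by identity
        final AtomicLong sent = new AtomicLong();
        final AtomicReference<RuntimeException> failure = new AtomicReference<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    for (List<T> b = queue.take(); b != poison; b = queue.take()) {
                        try {
                            sender.accept(b); // with SolrJ: send this batch to Solr
                            sent.addAndGet(b.size());
                        } catch (RuntimeException e) {
                            // remember the first error but keep draining,
                            // so the producer is never blocked forever
                            failure.compareAndSet(null, e);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            List<T> batch = new ArrayList<>(batchSize);
            for (T doc : docs) {
                batch.add(doc);
                if (batch.size() >= batchSize) {
                    queue.put(batch); // blocks while the queue is full
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                queue.put(batch); // final partial batch
            }
            for (int i = 0; i < threads; i++) {
                queue.put(poison); // one marker per worker
            }
            pool.shutdown();
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        if (failure.get() != null) {
            throw failure.get();
        }
        return sent.get();
    }
}
```

At most queueCapacity + threads + 1 batches exist on the client at any time, 
so smaller batches and a short queue keep the amount of data in flight low. 
SolrJ 4.x also ships ConcurrentUpdateSolrServer, which takes a queue size and 
a thread count and does similar buffering internally.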

James Dyer
Ingram Content Group


-----Original Message-----
From: meena.sri...@mathworks.com [mailto:meena.sri...@mathworks.com]
Sent: Tuesday, February 03, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.9 Calling DIH concurrently

Hi 

I am using Solr 4.9 and need to index millions of documents from a database. 
I am using DIH and sending requests to fetch by ids. Is there a way to run 
multiple indexing threads concurrently in DIH? 
I want to take advantage of the
<maxIndexingThreads>
parameter. How do I do it? I am just invoking the DIH handler using the SolrJ 
HttpSolrServer and issuing requests sequentially:
http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=100&minId=1

http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=201&minId=101





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-9-Calling-DIH-concurrently-tp4183744.html
Sent from the Solr - User mailing list archive at Nabble.com.