Suresh,

There are a few common workarounds for such a problem, but I think that
submitting more than "maxIndexingThreads" threads is not really productive.
Also, I think the out-of-memory problem is caused not by indexing but by
opening a searcher. Do you really need to open it? I don't think it's a good
idea to search on an instance that is cooking a multi-terabyte index at the
same time. Are you sure you don't issue superfluous commits, and that you've
disabled auto-commit?
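One common variant of this advice is to keep only a hard auto-commit for
transaction-log housekeeping, but with openSearcher=false so no searcher is
opened during the load, and to leave soft auto-commit disabled. A minimal
solrconfig.xml sketch of that setup (the thresholds are placeholder values,
not from this thread):

  <!-- sketch only: maxDocs/maxTime are illustrative assumptions -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>1000000</maxDocs>          <!-- flush by doc count -->
      <maxTime>600000</maxTime>           <!-- or every 10 minutes -->
      <openSearcher>false</openSearcher>  <!-- no searcher opened during bulk load -->
    </autoCommit>
    <!-- autoSoftCommit intentionally left out while bulk loading -->
  </updateHandler>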
Let's nail down the OOM problem first, and then deal with indexing speedup. I
like huge indices!

On Wed, Feb 4, 2015 at 1:10 AM, Arumugam, Suresh <suresh.arumu...@emc.com> wrote:

> We are also facing the same problem in loading 14 billion documents into
> Solr 4.8.10.
>
> Dataimport runs single-threaded, which is taking more than 3 weeks. This
> works fine without any issues, but it takes months to complete the load.
>
> When we tried SolrJ with the below configuration for a multithreaded load,
> Solr takes more memory, and at one point we end up out of memory as well.
>
> Batch doc count      : 100000 docs
> No of threads        : 16/32
> Solr memory allocated: 200 GB
>
> The reason may be as below.
>
> Solr takes a snapshot whenever we open a SearchIndexer. Because of this,
> more memory is consumed and Solr becomes extremely slow when running 16 or
> more threads for loading.
>
> If anyone has already done a faster multithreaded data load into Solr, can
> you please share the code or the logic for using the SolrJ API?
>
> Thanks in advance.
>
> Regards,
> Suresh.A
>
> -----Original Message-----
> From: Dyer, James [mailto:james.d...@ingramcontent.com]
> Sent: Tuesday, February 03, 2015 1:58 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr 4.9 Calling DIH concurrently
>
> DIH is single-threaded. There was once a threaded option, but it was buggy
> and was subsequently removed.
>
> What I do is partition my data and run multiple DIH request handlers at the
> same time. It means redundant sections in solrconfig.xml, and it's not very
> elegant, but it works.
>
> For instance, for a SQL query, I add something like this: "where mod(id,
> ${dataimporter.request.numPartitions})=${dataimporter.request.currentPartition}".
>
> I think, though, that most users who want to make the most of
> multithreading write their own program and use the SolrJ API to send the
> updates.
>
> James Dyer
> Ingram Content Group
>
>
> -----Original Message-----
> From: meena.sri...@mathworks.com [mailto:meena.sri...@mathworks.com]
> Sent: Tuesday, February 03, 2015 3:43 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 4.9 Calling DIH concurrently
>
> Hi,
>
> I am using Solr 4.9 and need to index millions of documents from a
> database. I am using DIH and sending requests to fetch by IDs. Is there a
> way to run multiple indexing threads concurrently in DIH? I want to take
> advantage of the <maxIndexingThreads> parameter. How do I do that? I am
> just invoking the DIH handler using the SolrJ HttpSolrServer and issuing
> the requests sequentially:
>
> http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=100&minId=1
>
> http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=201&minId=101
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-9-Calling-DIH-concurrently-tp4183744.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
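For reference, a minimal SolrJ sketch of the "write your own program" approach
mentioned above, using ConcurrentUpdateSolrServer (available in Solr 4.x)
instead of a plain HttpSolrServer. This is not code from the thread; the core
URL, the field names, and the batch/queue/thread sizes are assumptions to
adjust.

  import java.util.ArrayList;
  import java.util.Collection;

  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkLoader {
      public static void main(String[] args) throws Exception {
          // queueSize and threadCount are tuning knobs; these values are only a starting point
          ConcurrentUpdateSolrServer server =
                  new ConcurrentUpdateSolrServer("http://localhost:8983/solr/db", 10000, 8);

          Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
          for (long id = 1; id <= 1000000; id++) {            // replace with a real data-source iterator
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", Long.toString(id));
              doc.addField("text", "document body " + id);    // hypothetical field name
              batch.add(doc);
              if (batch.size() == 1000) {                     // hand off in modest batches
                  server.add(batch);                          // no commit per batch
                  batch = new ArrayList<SolrInputDocument>();
              }
          }
          if (!batch.isEmpty()) {
              server.add(batch);
          }

          server.blockUntilFinished();  // drain the internal sending queue
          server.commit();              // single hard commit at the very end
          server.shutdown();
      }
  }

Note that ConcurrentUpdateSolrServer only logs indexing errors by default;
override handleError() if the loader should fail fast. Tuning queueSize and
threadCount against the server-side maxIndexingThreads is usually where the
throughput is won.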