That's pretty much my strategy.

I'll add parenthetically that the bottleneck for indexing is often
acquiring the data from the system of record in the first place rather
than Solr itself. Assuming you're using SolrJ, an easy test is to
comment out the line that sends the documents to Solr. There's usually
some kind of loop like:

while (more docs) {
    gather 1,000 docs into a list (docList)
    cloudSolrClient.add(docList);
    docList.clear();
}

So just comment out the cloudSolrClient.add line and time the run
again. I've seen situations where the program still takes 95% of the
time of the full indexing run even though nothing is being sent to
Solr, in which case you need to focus on acquiring the data faster
rather than tuning Solr.
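
For concreteness, here's roughly what that loop looks like as real
SolrJ code. The class name, batch size, collection name, and document
iterator below are just placeholders for whatever your program
actually uses; the point is that with the client.add(...) calls
commented out, whatever run time remains is pure data acquisition:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexTimingTest {
        // Pulls docs from the system of record and sends them in batches of 1,000.
        // To time acquisition alone, comment out the two client.add(...) calls.
        static void index(CloudSolrClient client, Iterator<SolrInputDocument> source,
                          String collection) throws Exception {
            List<SolrInputDocument> docList = new ArrayList<>();
            while (source.hasNext()) {
                docList.add(source.next());          // acquiring from the system of record
                if (docList.size() >= 1000) {
                    client.add(collection, docList); // <-- comment out for the timing test
                    docList.clear();
                }
            }
            if (!docList.isEmpty()) {
                client.add(collection, docList);     // send the final partial batch
            }
            client.commit(collection);
        }
    }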

And you need to batch updates rather than sending documents one at a
time; see:
https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
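
Roughly, the contrast is (same hypothetical client and collection as
in the sketch above):

    // One HTTP request per document: each add() pays full request overhead.
    for (SolrInputDocument doc : docs) {
        client.add(collection, doc);
    }

    // Batched: the same documents in far fewer round trips.
    client.add(collection, docs);    // docs is a List<SolrInputDocument>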

Good Luck!
Erick

On Fri, Jun 2, 2017 at 2:59 AM, gigo314 <gigo...@gmail.com> wrote:
> Thanks for the replies. Just to confirm that I got it right:
> 1. Since there is no setting to control index writers, is it fair to assume
> that Solr always indexes at maximum possible speed?
> 2. The way to control write speed is to control the number of clients that are
> simultaneously posting data, right?
