Note that 70,000 docs/second pretty much guarantees that there are
multiple shards. Lots of shards.

But since you're using SolrJ, the very first thing I'd try would be
to comment out the SolrClient.add(doclist) call so you're doing
everything _except_ sending the docs to Solr. That'll tell you whether
there's any bottleneck in getting the docs from the system of record.
The fact that you're pegging the CPUs suggests you're already feeding
Solr as fast as Solr can go, so this is just a sanity check. But it's
simple and fast to try.
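
Something along these lines (a rough sketch, not tested; fetchNextDoc()
is a stand-in for however you pull docs from your system of record, and
the ZK hosts and collection name are placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexerDryRun {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient.Builder()
        .withZkHost("zk1:2181,zk2:2181,zk3:2181") // placeholder ZK ensemble
        .build();
    client.setDefaultCollection("mycollection");  // placeholder collection

    List<SolrInputDocument> batch = new ArrayList<>(500);
    long count = 0;
    long start = System.nanoTime();
    SolrInputDocument doc;
    while ((doc = fetchNextDoc()) != null) {      // stand-in for your source reader
      batch.add(doc);
      if (batch.size() == 500) {
        // client.add(batch);                     // <-- comment out for the dry run
        count += batch.size();
        batch.clear();
      }
    }
    double secs = (System.nanoTime() - start) / 1e9;
    System.out.printf("Pulled %d docs in %.1f sec (%.0f docs/sec)%n",
        count, secs, count / secs);
    client.close();
  }

  // Hypothetical helper: returns null when the source is exhausted.
  private static SolrInputDocument fetchNextDoc() { return null; }
}

If the rate barely changes with add() commented out, your source is the
bottleneck; if it jumps way up, the time is being spent in Solr.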

As far as what on the Solr side could be the bottleneck, there's no
real way to know without profiling. But 300+ fields per doc probably
just means you're doing a lot of processing; I'm not particularly
hopeful you'll be able to speed things up without either more shards
or a simpler schema.

Best,
Erick

On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem
<prog.mahm...@gmail.com> wrote:
> Hi great community,
>
> I have a SolrCloud with the following configuration:
>
>    - 2 nodes (r3.2xlarge 61GB RAM)
>    - 4 shards.
>    - The producer can produce 13,000+ docs per second
>    - The schema contains about 300+ fields and the document size is about
>    3KB.
>    - Using SolrJ and SolrCloudClient, each batch to Solr contains 500 docs.
>
> When I start my bulk indexer program, the CPU utilization is 100% on each
> server, but the indexing rate is only about 1,500 docs per second.
>
> I know that some Solr benchmarks have reached 70,000+ docs per second.
>
> The question: What is the best way to determine the bottleneck in the Solr
> indexing rate?
>
> Thanks,
> Mahmoud
