I wouldn't use ConcurrentUpdateSolrClient for the following reasons:

1> If a doc that needs to go to shard2 is received by a replica on
shard1, it must be forwarded to the leader of shard2, introducing an
extra hop. CloudSolrClient subdivides the batch and sends the docs to
the leader of the right shard automatically. You are batching, right?
You should.

2> CloudSolrClient does the above in parallel _already_.

3> You put the load for routing docs entirely on the single Solr node
you specify in the url.

4> You introduce a single point of failure (i.e. the node you specify
in the url).

5> If your indexing throughput is not what you need, you can string
together N SolrJ clients. Or you can create N threads in your indexing
client and still get the advantages of CloudSolrClient routing docs
correctly.
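
As a rough sketch of the "N threads" option in point 5: the code below
splits the docs into batches and hands each batch to a worker thread.
It is plain Java, not SolrJ. The names BatchIndexer, partition, indexAll
and the "sender" callback are made up for illustration; the sender
stands in for whatever call actually ships a batch through one shared,
thread-safe client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of SolrJ.
final class BatchIndexer {

    // Split docs into consecutive batches of at most batchSize items.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    // Send every batch through `sender` using a fixed pool of `threads`
    // workers and return the number of docs handed off.
    // ASSUMPTION: `sender` wraps a client that is safe to share across threads.
    static <T> int indexAll(List<T> docs, int batchSize, int threads,
                            Consumer<List<T>> sender) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<T> batch : partition(docs, batchSize)) {
                results.add(pool.submit(() -> {
                    sender.accept(batch);
                    return batch.size();
                }));
            }
            int sent = 0;
            for (Future<Integer> f : results) {
                sent += f.get(); // rethrows any failure from a worker
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real indexing client the sender would presumably be a thin wrapper
around CloudSolrClient.add(collection, batch) that converts the checked
exceptions, so each batch still gets routed to the right shard leaders.
Keep the thread count modest if the cluster is also serving queries.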

You also want to be a little careful how hard you drive Solr if you're
serving queries at the same time: the more cycles you use for
indexing, the fewer are available to serve queries.

Best,
Erick


On Wed, Oct 24, 2018 at 1:01 PM Shamik Bandopadhyay <sham...@gmail.com> wrote:
>
> Hi,
>
>    I'm looking into the possibility of using ConcurrentUpdateSolrClient for
> indexing a large volume of data instead of CloudSolrClient. Having an
> async, batch API seems to be a better fit for us where we tend to index a
> lot of data periodically. As I'm looking into the API, I'm wondering if
> this can be used for SolrCloud.
>
> ConcurrentUpdateSolrClient client = new
> ConcurrentUpdateSolrClient.Builder(url).withThreadCount(100).withQueueSize(50).build();
>
> The Builder object only takes a single url, and I'm not sure what that
> would be in the case of SolrCloud. E.g., if I have two shards with a
> couple of replicas, then what will be the server url?
>
> I was not able to find any relevant document or example to clarify my
> doubt. Any pointers will be appreciated.
>
> Thanks
