Sharding should, in general, _not_ be used as long as the response
time for individual queries is acceptable, because it imposes a certain
amount of overhead. The typical distributed query is two-pass. Pass 1:
get the candidate top N docs from one replica of each shard. Pass 2:
have each shard return its portion of the final top N docs found in
pass 1.
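
To make the two passes concrete, here is a toy sketch in Python. It is not Solr code: the in-memory SHARDS dict, the scores, and the function names are all made up for illustration, and it only mimics the shape of the work described above (per-shard top N, merge, then fetch from the owning shards).

```python
import heapq

# Hypothetical in-memory "shards": doc id -> (score, stored fields).
SHARDS = {
    "shard1": {"a": (0.9, {"title": "A"}), "b": (0.4, {"title": "B"}), "c": (0.7, {"title": "C"})},
    "shard2": {"x": (0.8, {"title": "X"}), "y": (0.95, {"title": "Y"}), "z": (0.1, {"title": "Z"})},
}

def pass1(shard, n):
    """Pass 1: the shard returns only ids and scores for its own top n."""
    scored = ((score, doc_id) for doc_id, (score, _) in SHARDS[shard].items())
    return heapq.nlargest(n, scored)

def pass2(shard, doc_ids):
    """Pass 2: the shard returns stored fields for the ids it owns."""
    return {doc_id: SHARDS[shard][doc_id][1] for doc_id in doc_ids}

def distributed_query(n):
    # One sub-query per shard, then merge the candidates into a global top n.
    candidates = []
    for shard in SHARDS:
        for score, doc_id in pass1(shard, n):
            candidates.append((score, doc_id, shard))
    top = heapq.nlargest(n, candidates)

    # Ask each shard only for the winning docs it holds.
    ids_by_shard = {}
    for _, doc_id, shard in top:
        ids_by_shard.setdefault(shard, []).append(doc_id)
    fetched = {}
    for shard, doc_ids in ids_by_shard.items():
        fetched.update(pass2(shard, doc_ids))

    return [(doc_id, score, fetched[doc_id]) for score, doc_id, _ in top]
```

With a single shard the merge step and the second round trip have nothing to do, which is the overhead being avoided.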

There's an option for one-pass processing, but I don't think that's
really what you're looking for here.

There will be M sub-queries sent out, one to a replica of each of
your M shards, and the node that received the request has to merge
the responses.

So if everything fits in one shard with adequate response times, I'd
recommend you have only one. Add _replicas_ to get more QPS, possibly
on different machines.

You still get all the goodness of HA/DR with SolrCloud, so it's
perfectly reasonable to have a 1-shard collection with N replicas
handled by SolrCloud.
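
As a rough sketch of what that looks like, the snippet below builds a Collections API CREATE request for a collection with one shard and several replicas. The base URL is borrowed from the quoted message, and the collection name and replica count are placeholders; depending on your Solr version and setup you may also need to pass a configset (collection.configName), so treat this as a starting point rather than the exact call for your cluster.

```python
from urllib.parse import urlencode

def create_collection_url(base_url, name, replicas):
    """Build a Collections API CREATE URL for a 1-shard collection."""
    params = {
        "action": "CREATE",
        "name": name,                   # placeholder collection name
        "numShards": 1,                 # single shard: no distributed sub-queries
        "replicationFactor": replicas,  # more replicas for more QPS and HA
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"

# Example values are assumptions; adjust host, port and name to your setup.
url = create_collection_url("http://10.3.4.12:8080/solr", "mycollection", 3)
print(url)
```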

Best,
Erick

On Wed, Apr 26, 2017 at 4:00 PM, Jakov Sosic <jso...@gmail.com> wrote:
> Hi guys,
>
> I was wondering does the introduction of shards actually increase CPU usage?
>
> I have a 30GB index split into two shards (15GB each), and by analyzing the
> logs, I figured out that ~80% of the queries have the
> "&shard.url=http://10.3.4.12:8080/solr/mycore/|http://10.3.4.14:8080/solr/mycore/".
>
> I basically don't need sharding, and am now starting to wonder if shards are
> actually increasing the CPU usage of my nodes or not, cause of the huge
> percentage of queries with "shard.url=" flag?
>
> I'm fighting with high CPU usage, and if turning sharding off and just
> keeping the replicas in my collection would lower the CPU usage by more
> than 10% I would choose that path.
>
>
> Any insights?
>
> Thanks.
>
