Sharding should, in general, _not_ be used as long as the response time for individual queries is acceptable, because it imposes a certain amount of overhead. The typical distributed query is two-pass. Pass 1: get the candidate top N docs from one replica of each shard. Pass 2: have each shard return its portion of the overall top N docs found in pass 1.
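To make the two-pass flow concrete, here is a toy sketch in Python. It is not Solr's code: the in-memory `SHARDS` data and the helper names (`shard_top_ids`, `shard_fetch`, `distributed_top_n`) are made up for illustration, and it assumes pass 1 returns only ids and scores while pass 2 returns the stored fields.

```python
import heapq

# Hypothetical in-memory "shards": doc id -> (score, stored fields).
SHARDS = [
    {"a1": (0.9, {"title": "alpha"}),
     "a2": (0.4, {"title": "beta"}),
     "a3": (0.7, {"title": "gamma"})},
    {"b1": (0.8, {"title": "delta"}),
     "b2": (0.95, {"title": "epsilon"}),
     "b3": (0.1, {"title": "zeta"})},
]


def shard_top_ids(shard, n):
    """Pass 1 on one shard: return its local top-n (score, doc_id) pairs."""
    pairs = ((score, doc_id) for doc_id, (score, _fields) in shard.items())
    return heapq.nlargest(n, pairs)


def shard_fetch(shard, doc_ids):
    """Pass 2 on one shard: return stored fields for the requested ids."""
    return {doc_id: shard[doc_id][1] for doc_id in doc_ids}


def distributed_top_n(shards, n):
    # Pass 1: one sub-query per shard, then merge into a global top n.
    candidates = []
    for idx, shard in enumerate(shards):
        for score, doc_id in shard_top_ids(shard, n):
            candidates.append((score, doc_id, idx))
    winners = heapq.nlargest(n, candidates)

    # Pass 2: ask each shard only for the winning docs it owns.
    fields = {}
    for idx, shard in enumerate(shards):
        wanted = [doc_id for _score, doc_id, owner in winners if owner == idx]
        if wanted:
            fields.update(shard_fetch(shard, wanted))

    return [(doc_id, score, fields[doc_id]) for score, doc_id, _owner in winners]


if __name__ == "__main__":
    for doc_id, score, doc in distributed_top_n(SHARDS, 3):
        print(doc_id, score, doc)
```

Even in this toy version, every query costs one sub-request per shard plus a merge step and a second round of fetches, none of which a single-shard collection has to do.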
There's an option for one-pass processing, but I don't think that's really what you're looking for here.

There will be M sub-queries sent out, one to a replica for each of your M shards. Etc.

So if everything fits in one shard with adequate response times, I'd recommend you have only one. Add _replicas_ to get more QPS, possibly on different machines. You still get all the goodness of HA/DR with SolrCloud, so it's perfectly reasonable to have a 1-shard collection with N replicas handled by SolrCloud.

Best,
Erick

On Wed, Apr 26, 2017 at 4:00 PM, Jakov Sosic <jso...@gmail.com> wrote:
> Hi guys,
>
> I was wondering: does the introduction of shards actually increase CPU usage?
>
> I have a 30GB index split into two shards (15GB each), and by analyzing the
> logs, I figured out that ~80% of the queries have the
> "&shard.url=http://10.3.4.12:8080/solr/mycore/|http://10.3.4.14:8080/solr/mycore/".
>
> I basically don't need sharding, and am now starting to wonder if shards are
> actually increasing the CPU usage of my nodes or not, because of the huge
> percentage of queries with the "shard.url=" flag?
>
> I'm fighting with high CPU usage, and if turning sharding off and just
> keeping the replicas in my collection would lower the CPU usage by more
> than 10%, I would choose that path.
>
> Any insights?
>
> Thanks.