Sharding a query lets you parallelize the actual index-search part of the query. But remember that the more you spread the query out, the more result sets (64, in your case) you have to bring back together and consolidate into a single result set for the end user. At some point, the gain from being able to search the data more quickly is outweighed by the cost of this consolidation step.
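To make the consolidation step concrete, here's a toy sketch (not Solr internals, just an illustration) of the scatter-gather merge a coordinator has to do: each shard returns its local top-k hits, and the coordinator must look at all shards' hits to produce the global top-k, so this work grows with the shard count:

```python
import heapq

def merge_shard_results(shard_results, k=10):
    """Merge per-shard top-k hit lists of (score, doc_id) tuples into a
    single global top-k. The coordinator has to touch every shard's
    result set, so this step scales with the number of shards."""
    return heapq.nlargest(k, (hit for hits in shard_results for hit in hits))

# 64 shards, each returning its local top 10 hits by score
shards = [[(1.0 / (s + r + 1), f"doc-{s}-{r}") for r in range(10)]
          for s in range(64)]
top = merge_shard_results(shards, k=10)
```

With 64 shards the coordinator sifts through 640 candidate hits just to return 10; with 16 shards it's only 160.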
One other point to mention, which we noticed as a by-product of some large-scale sharding we were testing (256 shards, no caches, a whole different kettle of fish!): the resulting query is only as fast as the slowest shard. If you have 64 shards and 8 shards/cores per machine, how many JVMs are you running per machine? If you have a single JVM with 8 cores in it, remember that as soon as that JVM enters a GC cycle, processing on all 8 cores will stall. So if a query needs results from 64 cores, and 63 of them return in 100ms but the last core is in a GC pause and takes 500ms, your query will take just over 500ms.

With respect to sharding, I would never start with a large number of shards (and 64 is reasonably large in Solr terms). You might be able to get away without sharding at all; if that meets your latency requirements, why bother with the complexity of sharding? Use those extra CPUs for processing more QPS instead of making a single query faster.

Lastly, you mentioned you allocated 32GB to "solr" - do you mean to the JVM heap? That's quite a lot of a 64GB machine; you haven't left much for the page cache. The general rule for Solr is to make the JVM heap as small as you can get away with, leaving the OS page cache (which is needed to cache all the index files) as much memory as possible.

On 16 January 2015 at 05:58, Manohar Sripada <manohar...@gmail.com> wrote:

> Hi All,
>
> My Setup is as follows. There are 16 nodes in my SolrCloud and 4 CPU cores
> on each Solr Node VM. Each having 64 GB of RAM, out of which I have
> allocated 32 GB to Solr. I have a collection which contains around 100
> million Docs, which I created with 64 shards, replication factor 2, and 8
> shards per node. Each shard is getting around 1.6 Million Documents.
>
> The reason I have created 64 Shards is there are 4 CPU cores on each VM;
> while querying I can make use of all the CPU cores. On an average, Solr
> QTime is around 500ms here.
>
> Last time to my other discussion, Erick suggested that I might be over
> sharding. So, I tried reducing the number of shards to 32 and then 16. To
> my surprise, it started performing better. It came down to 300 ms (for 32
> shards) and 100 ms (for 16 shards). I haven't tested with filters and
> facets yet here. But, the simple search queries had shown lot of
> improvement.
>
> So, how come the less number of shards performing better?? Is it because
> there are less number of posting lists to search on OR less merges that are
> happening? And how to determine the correct number of shards?
>
> Thanks,
> Manohar
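P.S. The "only as fast as the slowest shard" arithmetic above can be sketched in a couple of lines (a toy model using the numbers from my example, not measured figures):

```python
def distributed_query_latency(shard_latencies_ms):
    """A scatter-gather query cannot return until every shard has
    responded, so overall latency is the max of the per-shard
    latencies, not the mean."""
    return max(shard_latencies_ms)

# 63 shards answer in 100ms; one shard is mid-GC and takes 500ms.
# The whole query is pinned to the slowest responder: 500ms.
latencies = [100] * 63 + [500]
slowest_wins = distributed_query_latency(latencies)
```

The more shards a query fans out to, the more likely it is that at least one of them is in a GC pause at that moment, which is one reason fewer shards can mean lower latency.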