Hi Matt,

You could create extra shards up front, but if your queries are fanned out
to all of them, you can run into situations where there are too many
concurrent queries per node, causing lots of context switching and
ultimately being less efficient than if you had fewer shards.  So while
this is an approach to take, I'd personally first try to run tests to see
how much a single node can handle in terms of volume, expected query rates,
and target latency, and then use monitoring/alerting/whatever-helps tools
to keep an eye on the cluster so that when you start approaching the target
limits you are ready with additional nodes and shard splitting if needed.
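
To make the fan-out concern concrete, here is a rough back-of-the-envelope
sketch in Python. It assumes every query hits every shard, that shards are
spread evenly across nodes, and it uses Little's law (average concurrency =
arrival rate x average time per request). The function names, the 80%
warning threshold, and all numbers are made up for illustration; they are
not measurements, so plug in what your own tests show.

```python
def per_node_concurrency(qps, total_shards, nodes, shard_latency_s):
    """Average number of in-flight shard requests on one node.

    Assumes each query fans out to all shards and that shards are
    evenly distributed over the nodes (both are simplifications).
    """
    if nodes <= 0 or total_shards <= 0:
        raise ValueError("nodes and total_shards must be positive")
    # Shard-level requests arriving at a single node each second.
    shard_requests_per_sec = qps * total_shards / nodes
    # Little's law: concurrency = arrival rate * time in system.
    return shard_requests_per_sec * shard_latency_s


def approaching_limit(current, limit, threshold=0.8):
    """True once a measured value reaches the given fraction of its limit."""
    return current >= threshold * limit


if __name__ == "__main__":
    # Hypothetical workload: 100 queries/sec, 2 nodes, 50 ms per shard request.
    for shards in (4, 32):
        load = per_node_concurrency(100, shards, 2, 0.05)
        print(f"{shards} shards: ~{load:.0f} concurrent shard requests per node")
```

The same query rate produces several times more concurrent work per node
when the index is over-sharded, which is why I'd measure what one node can
handle before picking a shard count.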

Of course, if your data and queries are such that newer documents are
queried more, you should look into time-based collections... and if your
queries only need to hit a subset of the data, you should look into query
routing.
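
A minimal sketch of both ideas in Python, building request parameters only
(no HTTP calls). The monthly naming scheme "logs_YYYY_MM", the helper names,
and the "customer42" key are assumptions for illustration. "collection" and
"_route_" are SolrCloud request parameters; using "_route_" this way assumes
the collection uses the compositeId router and documents were indexed with
ids such as "customer42!doc1".

```python
from datetime import date


def monthly_collections(start, end, prefix="logs"):
    """Names of the (hypothetical) monthly collections covering start..end."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def build_params(q, collections=None, route_key=None):
    """Query parameters that restrict a search to part of the data."""
    params = {"q": q}
    if collections:
        # Only search the collections that cover the requested time range.
        params["collection"] = ",".join(collections)
    if route_key:
        # Only search the shard(s) holding documents with this id prefix.
        params["_route_"] = f"{route_key}!"
    return params


if __name__ == "__main__":
    recent = monthly_collections(date(2014, 12, 15), date(2015, 2, 11))
    print(build_params("level:error", collections=recent))
    print(build_params("level:error", route_key="customer42"))
```

Either way the point is the same: fewer shards touched per query means less
fan-out per node.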

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper <matt.kui...@issinc.com> wrote:

> I am starting a new project and one of the requirements is that Solr must
> scale to handle increasing load (both search performance and index size).
>
> My understanding is that one way to address search performance is by
> adding more replicas.
>
> I am more concerned about handling a growing index size.  I have already
> been given some good input on this topic and am considering a shard
> splitting approach, but am more focused on a rebalancing approach that
> includes defining many shards up front and then moving these existing
> shards on to new Solr servers as needed.  Plan to experiment with this
> approach first.
>
> Before I got too deep, I wondered if anyone has any tips or warnings on
> these approaches, or has scaled Solr in a different manner.
>
> Thanks,
> Matt
>
