Re: How large is your solr index?

Bram Van Dam Wed, 07 Jan 2015 01:26:42 -0800

On 01/06/2015 07:54 PM, Erick Erickson wrote:

Have you considered pre-supposing SolrCloud and using the SPLITSHARD
API command?

I think that's the direction we'll probably be going. Index size (at least for us) can be unpredictable in some cases. Some clients start out small and then grow exponentially, while others start big and then don't grow much at all. Starting with SolrCloud would at least give us that flexibility.

That being said, SPLITSHARD doesn't seem ideal. If a shard reaches a certain size, it would be better for us to simply add an extra shard, without splitting.
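For reference, the two options differ only in the Collections API action that gets called. Below is a rough sketch of how the request URLs could be built; the host, the collection name "clients" and the shard names are made up for illustration, and the helper function is hypothetical. As far as I know, CREATESHARD only works for collections that use the implicit router, so the "just add a shard" route assumes the collection was created that way.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, collection, shard):
    """Build a Solr Collections API URL for a shard-level action.

    Hypothetical helper: it only assembles the URL, it does not send it.
    """
    params = {"action": action, "collection": collection, "shard": shard}
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Assumed values, not taken from any real setup.
SOLR = "http://localhost:8983/solr"

# Split an existing shard into two sub-ranges.
split_url = collections_api_url(SOLR, "SPLITSHARD", "clients", "shard1")

# Add a brand-new, empty shard next to the existing ones
# (implicit router only).
create_url = collections_api_url(SOLR, "CREATESHARD", "clients", "shard3")

print(split_url)
print(create_url)
```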

On Tue, Jan 6, 2015 at 10:33 AM, Peter Sturge <peter.stu...@gmail.com> wrote:

++1 for the automagic shard creator. We've been looking into doing this
sort of thing internally - i.e. when a shard reaches a certain size/num
docs, it creates 'sub-shards' to which new commits are sent and queries to
the 'parent' shard are included. The concept works, as long as you don't
try any non-dist stuff - it's one reason why all our fields are always
single valued.


Is there a problem with multi-valued fields and distributed queries?

A cool side-effect of sub-sharding (for lack of a snappy term) is that the
parent shard then stops suffering from auto-warming latency due to commits
(we do a fair amount of committing). In theory, you could carry on
sub-sharding until your hardware starts gasping for air.

Sounds like you're doing something similar to us. In some cases we have a hard commit every minute. Keeping the caches hot seems like a very good reason to send data to a specific shard. At least I'm assuming that when you add documents to a single shard and commit, the other shards won't be impacted...
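To make the idea concrete, here is a minimal sketch of the "send new documents to the newest shard until it is full" decision. It is only the selection logic under assumed inputs: the function name, the "shardN" naming scheme and the document-count threshold are all hypothetical, and actually routing documents to the chosen shard (or creating it) is left out.

```python
def pick_target_shard(doc_counts, max_docs):
    """Decide which shard should receive new documents.

    doc_counts: dict of shard name -> document count, in creation order
                (oldest first). Assumed input, e.g. gathered from stats.
    max_docs:   assumed size threshold per shard.

    Returns (shard_name, needs_creation).
    """
    names = list(doc_counts)
    if not names:
        # No shards yet: the first one has to be created.
        return "shard1", True

    newest = names[-1]
    if doc_counts[newest] < max_docs:
        # Newest shard still has room, so keep writing to it. Older
        # shards receive no commits and their caches stay warm.
        return newest, False

    # Newest shard is full: ask for a new shard instead of splitting.
    return f"shard{len(names) + 1}", True


print(pick_target_shard({"shard1": 1000, "shard2": 10}, 1000))
print(pick_target_shard({"shard1": 1000, "shard2": 1000}, 1000))
```

Queries would still have to go to every shard; only the writes are pinned to the newest one.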


 - Bram
