On 01/06/2015 07:54 PM, Erick Erickson wrote:
> Have you considered pre-supposing SolrCloud and using the SPLITSHARD
> API command?
I think that's the direction we'll probably be going. Index size (at
least for us) can be unpredictable in some cases. Some clients start out
small and then grow exponentially, while others start big and then don't
grow much at all. Starting with SolrCloud would at least give us that
flexibility.
That being said, SPLITSHARD doesn't seem ideal. If a shard reaches a
certain size, it would be better for us to simply add an extra shard,
without splitting.
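
To make the difference concrete, here is a rough sketch of the two Collections API calls being compared. It only builds the request URLs and sends nothing. The host, the collection name "clients" and the shard names are made up for illustration, and as far as I know CREATESHARD is only accepted for collections that use the implicit router, so treat this as an outline under that assumption rather than a recipe.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, collection, shard):
    """Build a Collections API URL; nothing is sent over the network."""
    params = {"action": action, "collection": collection, "shard": shard}
    return f"{base_url}/admin/collections?{urlencode(params)}"


BASE = "http://localhost:8983/solr"  # assumed host and port

# Split an existing shard into two sub-shards.
split_url = collections_api_url(BASE, "SPLITSHARD", "clients", "shard1")

# Add a new, empty shard next to the existing ones
# (assumes the collection was created with router.name=implicit).
create_url = collections_api_url(BASE, "CREATESHARD", "clients", "shard2")

print(split_url)
print(create_url)
```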
On Tue, Jan 6, 2015 at 10:33 AM, Peter Sturge <peter.stu...@gmail.com> wrote:
> ++1 for the automagic shard creator. We've been looking into doing this
> sort of thing internally - i.e. when a shard reaches a certain size/num
> docs, it creates 'sub-shards' to which new commits are sent and queries to
> the 'parent' shard are included. The concept works, as long as you don't
> try any non-dist stuff - it's one reason why all our fields are always
> single valued.
Is there a problem with multi-valued fields and distributed queries?
> A cool side-effect of sub-sharding (for lack of a snappy term) is that the
> parent shard then stops suffering from auto-warming latency due to commits
> (we do a fair amount of committing). In theory, you could carry on
> sub-sharding until your hardware starts gasping for air.
Sounds like you're doing something similar to us. In some cases we have
a hard commit every minute. Keeping the caches hot seems like a very
good reason to send data to a specific shard. At least I'm assuming that
when you add documents to a single shard and commit, the other shards
won't be impacted...
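
For what it's worth, the rollover idea discussed above (fill one shard, then send new documents to a fresh one while queries still cover the older shards) could be sketched roughly like this. It is a toy model in plain Python, not Solr code: the class name, the shard naming scheme and the document threshold are all hypothetical, and actually creating the shard in SolrCloud is left out.

```python
class RolloverRouter:
    """Toy model: only the newest shard receives writes."""

    def __init__(self, max_docs):
        self.max_docs = max_docs        # hypothetical per-shard threshold
        self.shards = [["shard1", 0]]   # [name, doc count]; last entry takes writes

    def shard_for_write(self, num_docs=1):
        """Return the shard that should receive the next batch of documents."""
        name, count = self.shards[-1]
        if count > 0 and count + num_docs > self.max_docs:
            # The active shard is full: roll over to a new, empty shard.
            name = f"shard{len(self.shards) + 1}"
            self.shards.append([name, 0])
        self.shards[-1][1] += num_docs
        return name

    def shards_for_query(self):
        """Queries fan out to every shard, old and new."""
        return [name for name, _ in self.shards]


router = RolloverRouter(max_docs=3)
targets = [router.shard_for_write() for _ in range(7)]
print(targets)
print(router.shards_for_query())
```

Only the last shard ever sees writes in this model, which is the property that would keep the caches on the older shards warm.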
- Bram