Hmmm, are you sure SolrCloud fits your needs? You say that you think
everything will fit on one shard and are worried about bulk updates. In
that case I should think regular Solr master/slave (rather than cloud)
might be a better fit. Using Cloud and all that goes with it for a single shard
is certainly possible, but I question whether it's your best option here....

Of course if NRT is a requirement, then SolrCloud is a much better option....

With typical master/slave setups, since your bulk updates are happening on
a separate machine, having multiple slaves that query at a given interval
seems like it would work, but you'd have to be able to stand, say, 5-10 minute
latency...
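
For reference, the polling interval on each slave is set in the replication
handler section of solrconfig.xml. A minimal sketch, assuming a single-core
setup; the host name "master-host" and the 5 minute interval are placeholders
I made up, not values from your installation:

```xml
<!-- Slave side of solrconfig.xml. "master-host" is a hypothetical host name. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- URL of the master's replication handler (assumed single-core path) -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- How often the slave polls the master for a new index, as HH:mm:ss -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```

The pollInterval is roughly the worst-case staleness you'd see on a slave,
plus however long the index copy itself takes.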

Best
Erick

On Wed, Jun 13, 2012 at 7:47 AM,  <lenz...@gfi.ihk.de> wrote:
> Mark Miller <markrmil...@gmail.com> wrote on 12.06.2012 19:19:01:
>>
>>
>> On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:
>>
>> > Hello,
>> >
>> > we tested SolrCloud in a setup with one collection, two shards and one
>> > replica per shard and it works quite fine with some example data.
>> > Now, we plan to set up our own collection and determine how many
>> > shards we should divide it into.
>> > We can estimate quite exactly the size of the collection, but we don't
>> > know what the best approach for sharding is,
>> > even if we know the size and the amount of queries and updates.
>> > Is there any documentation or a kind of design guidelines for sharding
>> > a collection in SolrCloud?
>> >
>> >
>> > Thanks & regards,
>> > Norman Lenzner
>>
>>
>> It's hard to tell - I think you want to start with an idea of how
>> many docs you can fit on a single node. This can vary wildly
>> depending on many factors. Generally you have to do some testing
>> with your particular config and data. You can search the mailing
>> lists and perhaps dig up a little info, but there is really no
>> replacement for running some tests with real data.
>>
>> Then you have to plan in your growth rate - resharding is naturally
>> a relatively expensive operation. Once you have an idea of how many
>> docs per machine you think seems comfortable, figure out how many
>> machines you need given your estimated doc growth rate and perhaps
>> some padding. You might not get it right, but if you expect the
>> possibility of a lot of growth, erring on the more shards side is
>> obviously better.
>>
>> - Mark Miller
>> lucidimagination.com
>>
>
> Hello and thanks for your reply,
>
> We will run some tests to determine the size of our collection, but I
> think there won't be a need for a second shard at all. The problem is not
> the size or the growth of the docs, but there will be a quite high update
> frequency. So, if we have many bulk updates, is it reasonable to
> distribute the update load on multiple shards?
>
> Thanks & regards,
> Norman Lenzner
