On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:

> Hello,
> 
> we tested SolrCloud in a setup with one collection, two shards and one 
> replica per shard and it works quite fine with some example data. 
> Now, we plan to set up our own collection and determine in how many shards 
> we should divide it. 
> We can estimate the size of the collection quite exactly, but we don't 
> know what the best approach for sharding is, 
> even if we know the size and the amount of queries and updates.
> Is there any documentation or a kind of design guidelines for sharding a 
> collection in SolrCloud?
> 
> 
> Thanks & regards,
> Norman Lenzner


It's hard to tell - I think you want to start with an idea of how many docs you 
can fit on a single node. This can vary wildly depending on many factors. 
Generally you have to do some testing with your particular config and data. You 
can search the mailing lists and perhaps dig up a little info, but there is 
really no replacement for running some tests with real data.

Then you have to factor in your growth rate - resharding is naturally a 
relatively expensive operation. Once you have an idea of how many docs per 
machine seems comfortable, figure out how many machines you need given 
your estimated doc growth rate and perhaps some padding. You might not get it 
right, but if you expect the possibility of a lot of growth, erring on the 
side of more shards is obviously better.
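
The arithmetic described above can be sketched roughly as follows. This is only 
an illustration, not anything Solr provides: the function name, the parameters, 
the compound monthly growth model and the 20% default padding are all 
assumptions, and docs_per_node has to come from your own tests with real data.

```python
import math


def estimate_shards(current_docs, monthly_growth_rate, months,
                    docs_per_node, padding=0.2):
    """Rough shard-count estimate (hypothetical helper, not a Solr API).

    current_docs        -- documents in the collection today
    monthly_growth_rate -- assumed compound growth per month, e.g. 0.05 for 5%
    months              -- planning horizon in months
    docs_per_node       -- docs one node handles comfortably, found by testing
    padding             -- extra headroom as a fraction (assumed default 20%)
    """
    if docs_per_node <= 0:
        raise ValueError("docs_per_node must be positive")
    # Project the collection size at the end of the planning horizon.
    projected_docs = current_docs * (1 + monthly_growth_rate) ** months
    # Add headroom, since the estimate may well be wrong.
    padded_docs = projected_docs * (1 + padding)
    # Round up: erring on the side of more shards is cheaper than resharding.
    return max(1, math.ceil(padded_docs / docs_per_node))


if __name__ == "__main__":
    # Example figures are made up purely for illustration.
    print(estimate_shards(10_000_000, 0.5, 2, 5_000_000, padding=0.0))
```

The result is only a starting point for the shard count; replicas for query 
load and failover would be planned on top of it.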

- Mark Miller
lucidimagination.com