On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:

> Hello,
>
> we tested SolrCloud in a setup with one collection, two shards and one
> replica per shard and it works quite fine with some example data.
> Now, we plan to set up our own collection and determine in how many shards
> we should divide it.
> We can estimate quite exactly the size of the collection, but we don't
> know what the best approach for sharding is,
> even if we know the size and the amount of queries and updates.
> Is there any documentation or a kind of design guidelines for sharding a
> collection in SolrCloud?
>
>
> Thanks & regards,
> Norman Lenzner
It's hard to tell - I think you want to start with an idea of how many docs you can fit on a single node. This can vary wildly depending on many factors. Generally you have to do some testing with your particular config and data. You can search the mailing lists and perhaps dig up a little info, but there is really no replacement for running some tests with real data.

Then you have to factor in your growth rate - resharding is naturally a relatively expensive operation. Once you have an idea of how many docs per machine seems comfortable, figure out how many machines you need given your estimated doc growth rate and perhaps some padding.

You might not get it right, but if you expect the possibility of a lot of growth, erring on the side of more shards is the safer choice.

- Mark Miller
lucidimagination.com
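
The arithmetic described above (docs per node, growth rate, padding) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, its parameters, the compound-growth model and the example numbers are all hypothetical, and the docs-per-node figure has to come from your own tests with real data.

```python
import math


def estimate_shards(current_docs, docs_per_node, annual_growth_rate,
                    years, padding=0.2):
    """Rough shard-count estimate (hypothetical helper, not part of Solr).

    current_docs:       number of documents expected at launch
    docs_per_node:      comfortable docs per node, measured in your own tests
    annual_growth_rate: assumed yearly growth, e.g. 0.5 for 50%
    years:              planning horizon before you are willing to reshard
    padding:            extra headroom as a fraction, e.g. 0.2 for 20%
    """
    if docs_per_node <= 0:
        raise ValueError("docs_per_node must be positive")

    # Assumption: growth compounds yearly over the planning horizon.
    projected_docs = current_docs * (1 + annual_growth_rate) ** years

    # Add headroom so the estimate errs on the side of more shards.
    padded_docs = projected_docs * (1 + padding)

    # Round up: a partially filled shard still needs a node.
    return max(1, math.ceil(padded_docs / docs_per_node))


if __name__ == "__main__":
    # Made-up numbers: 50M docs today, 20M docs per node judged
    # comfortable, 50% growth per year, two-year horizon, 20% padding.
    print(estimate_shards(50_000_000, 20_000_000, 0.5, 2, 0.2))
```

With these made-up inputs the estimate comes out at 7 shards. Rounding up and adding padding both push the result toward more shards, which fits the point that resharding later is the expensive path.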