Hmmm, are you sure SolrCloud fits your needs? You say that you think everything will fit on one shard and that you're worried about bulk updates. In that case I'd think regular (non-cloud) Solr master/slave replication might be a better fit. Using SolrCloud, and all that goes with it, for a single shard is certainly possible, but I question whether it's your best option here...
Of course, if NRT is a requirement, then SolrCloud is a much better option. With a typical master/slave setup, since your bulk updates happen on a separate machine (the master), having multiple slaves that poll the master at a given interval seems like it would work, but you'd have to be able to tolerate, say, 5-10 minutes of latency...

Best
Erick

On Wed, Jun 13, 2012 at 7:47 AM, <lenz...@gfi.ihk.de> wrote:
> Mark Miller <markrmil...@gmail.com> wrote on 12.06.2012 19:19:01:
>>
>> On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:
>>
>> > Hello,
>> >
>> > we tested SolrCloud in a setup with one collection, two shards and one
>> > replica per shard, and it works quite well with some example data.
>> > Now we plan to set up our own collection and determine how many shards
>> > we should divide it into.
>> > We can estimate the size of the collection quite exactly, but we don't
>> > know what the best approach to sharding is,
>> > even if we know the size and the volume of queries and updates.
>> > Is there any documentation or some kind of design guidelines for
>> > sharding a collection in SolrCloud?
>> >
>> > Thanks & regards,
>> > Norman Lenzner
>>
>> It's hard to tell - I think you want to start with an idea of how
>> many docs you can fit on a single node. This can vary wildly
>> depending on many factors. Generally you have to do some testing
>> with your particular config and data. You can search the mailing
>> lists and perhaps dig up a little info, but there is really no
>> replacement for running some tests with real data.
>>
>> Then you have to plan for your growth rate - resharding is naturally
>> a relatively expensive operation. Once you have an idea of how many
>> docs per machine seems comfortable, figure out how many
>> machines you need given your estimated doc growth rate and perhaps
>> some padding. You might not get it right, but if you expect the
>> possibility of a lot of growth, erring on the side of more shards is
>> obviously better.
>>
>> - Mark Miller
>> lucidimagination.com
>>
>
> Hello and thanks for your reply,
>
> We will run some tests to determine the size of our collection, but I
> think there won't be any need for a second shard at all. The problem is
> not the size or the growth of the docs, but a fairly high update
> frequency. So, if we have many bulk updates, is it reasonable to
> distribute the update load across multiple shards?
>
> Thanks & regards,
> Norman Lenzner
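Mark's sizing advice in the thread (measure how many docs fit comfortably on one node, then plan for growth plus some padding) boils down to a back-of-the-envelope calculation. The following is a minimal sketch of that arithmetic, not anything Solr provides. The function name `shards_needed`, the compound-growth model, and the default 20% padding are all hypothetical assumptions, and `docs_per_node` has to come from your own load tests, as Mark says.

```python
import math


def shards_needed(current_docs: int,
                  annual_growth_rate: float,
                  years: float,
                  docs_per_node: int,
                  padding: float = 0.2) -> int:
    """Hypothetical helper: estimate how many shards to create up front.

    Assumes compound yearly growth and a fixed fractional padding; both
    are illustrative planning assumptions, not Solr settings.
    """
    if docs_per_node <= 0:
        raise ValueError("docs_per_node must be positive")
    # Project the document count forward with compound growth.
    projected = current_docs * (1 + annual_growth_rate) ** years
    # Add headroom, since resharding later is relatively expensive.
    padded = projected * (1 + padding)
    # Round up: a partially filled shard still needs its own node.
    return max(1, math.ceil(padded / docs_per_node))


if __name__ == "__main__":
    # 10M docs growing 50% a year for 2 years, ~20M docs per node (made-up numbers).
    print(shards_needed(10_000_000, 0.5, 2, 20_000_000))  # → 2
```

This only sizes by document count. It says nothing about the update-throughput question Norman raises, which would need its own testing with a realistic bulk-update load.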