From my experience on a high-end server (256 GB memory, 40-core CPU), testing collection counts with one shard and two replicas each, the maximum that would work is 3,000 cores (1,500 collections). I'd recommend much less (perhaps half of that), depending on your startup-time requirements. (Though I have settled on a 6,000-collection maximum with some patching; see SOLR-7191.) Beyond that, you could create multiple clouds and create each new collection in the least-used cloud.
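A minimal sketch of that "pick the least-used cloud" step, assuming you track a collection count per cloud yourself. The names `pick_cloud`, `estimated_heap_mb`, and the cloud labels are hypothetical and not part of Solr; the 1,500 ceiling and the roughly 6 MB of heap per empty collection are the figures from this message.

```python
from typing import Dict, Optional

# Figures taken from this message; tune them for your own hardware.
MAX_COLLECTIONS_PER_CLOUD = 1500  # tested ceiling; consider about half for faster startup
HEAP_MB_PER_COLLECTION = 6        # rough heap overhead of an empty collection


def pick_cloud(collection_counts: Dict[str, int],
               limit: int = MAX_COLLECTIONS_PER_CLOUD) -> Optional[str]:
    """Return the cloud with the fewest collections, or None if all are full."""
    # Only clouds still below the limit are candidates.
    candidates = {name: n for name, n in collection_counts.items() if n < limit}
    if not candidates:
        return None  # every cloud is at capacity: time to start a new one
    return min(candidates, key=candidates.get)


def estimated_heap_mb(num_collections: int) -> int:
    """Rough heap needed for empty collections, before any documents."""
    return num_collections * HEAP_MB_PER_COLLECTION
```

For example, `pick_cloud({"cloud-a": 1200, "cloud-b": 300})` picks `"cloud-b"`, and 1,500 empty collections come to about 9,000 MB of heap by this estimate.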
Regarding memory usage, I'd pencil in 6 MB of Java heap overhead per collection (with no docs).

On 25 March 2015 at 13:46, Ian Rose <ianr...@fullstory.com> wrote:

> First off thanks everyone for the very useful replies thus far.
>
> Shawn - thanks for the list of items to check. #1 and #2 should be fine
> for us and I'll check our ulimit for #3.
>
> To add a bit of clarification, we are indeed using SolrCloud. Our current
> setup is to create a new collection for each customer. For now we allow
> SolrCloud to decide for itself where to locate the initial shard(s) but in
> time we expect to refine this such that our system will automatically
> choose the least loaded nodes according to some metric(s).
>
> > Having more than one business entity controlling the configuration of a
> > single (Solr) server is a recipe for disaster. Solr works well if there is
> > an architect for the system.
>
> Jack, can you explain a bit what you mean here? It looks like Toke caught
> your meaning but I'm afraid it missed me. What do you mean by "business
> entity"? Is your concern that with automatic creation of collections they
> will be distributed willy-nilly across the cluster, leading to uneven load
> across nodes? If it is relevant, the schema and solrconfig are controlled
> entirely by me and are the same for all collections. Thus theoretically we
> could actually just use one single collection for all of our customers
> (adding a 'customer:<whatever>' type fq to all queries) but since we never
> need to query across customers it seemed more performant (as well as safer
> - less chance of accidentally leaking data across customers) to use
> separate collections.
>
> > Better to give each tenant a separate Solr instance that you spin up and
> > spin down based on demand.
>
> Regarding this, if by tenant you mean "customer", this is not viable for us
> from a cost perspective. As I mentioned initially, many of our customers
> are very small so dedicating an entire machine to each of them would not be
> economical (or efficient). Or perhaps I am not understanding what your
> definition of "tenant" is?
>
> Cheers,
> Ian
>
> On Tue, Mar 24, 2015 at 4:51 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
> wrote:
>
> > Jack Krupansky [jack.krupan...@gmail.com] wrote:
> > > I'm sure that I am quite unqualified to describe his hypothetical setup. I
> > > mean, he's the one using the term multi-tenancy, so it's for him to be
> > > clear.
> >
> > It was my understanding that Ian used them interchangeably, but of course
> > Ian is the only one that knows.
> >
> > > For me, it's a question of who has control over the config and schema and
> > > collection creation. Having more than one business entity controlling the
> > > configuration of a single (Solr) server is a recipe for disaster.
> >
> > Thank you. Now your post makes a lot more sense. I will not argue against
> > that.
> >
> > - Toke Eskildsen

--
Damien Kamerman