In my own work, the risk to the business if every client lost access to search at once is so great that we would never consider putting everything in one collection. You should certainly put that question to the business stakeholders before you decide.
For that reason, I might recommend that each of the multiple collections suggested above by Erick could also be on a separate SolrCloud (or single Solr instance) so that no single failure can ever take down every tenant's ability to search -- only those on that particular SolrCloud...

On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> There's no one right answer here. I've also seen a hybrid approach
> where there are multiple collections each of which has some
> number of tenants resident. Eventually, you need to think of some
> kind of partitioning, my rough number of documents for a single core
> is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).
>
> All that said, you may also be interested in the "transient cores"
> option, see:
> https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
> and the transient and transientCacheSize (this latter in solr.xml). Note
> that this is stand-alone only so you can't move that concept to
> SolrCloud if you eventually go there.
>
> Best,
> Erick
>
> On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha <kdcha...@gmail.com> wrote:
> > Dear Solr Members,
> >
> > We are using SolrCloud as the search provider of a multi-tenant cloud based
> > application. We have one schema for all the tenants. The indexes will have
> > large number(millions) of documents.
> >
> > As of our research, we have two options,
> >
> > - One large collection for all the tenants and use Composite-ID routing
> > - Collection per tenant
> >
> > The below mail says,
> >
> > https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c5324cd4b.2020...@protulae.com%3E
> >
> > SolrCloud is *more scalable in terms of index size*. Plus you get
> > redundancy which can't be underestimated in a hosted solution.
> >
> > AND
> >
> > The issue is management. 1000s of cores/collections require a level of
> > automation. On the other hand, having a single core/collection means if
> > you make one change to the schema or solrconfig, it affects everyone.
> >
> > Based on the above facts we think One large collection will be the way to
> > go.
> >
> > Questions:
> >
> > 1. Is that the right way to go?
> > 2. Will it be a hassle when we need to do reindexing?
> > 3. What is the chance of entire collection crash? (in that case all
> > tenants will be affected and reindexing will be painful.
> >
> > Thank you in advance for your kind opinion.
> >
> > Best Regards,
> > Chamil
> >
> > --
> > http://kavimalla.blgospot.com
> > http://kdchamil.blogspot.com
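
P.S. To make the hybrid layout concrete, here is a rough planning sketch in Python. It is only an illustration under stated assumptions: it calls no Solr API, the collection names and cluster labels are hypothetical, and the 50M budget is simply Erick's rule of thumb from the thread, which you should tune for your own documents. It packs tenants into shared collections up to that budget, then spreads the collections round-robin over separate clusters so that one cluster failure touches only some tenants.

```python
from dataclasses import dataclass, field

# Rough per-core document budget quoted in the thread (a rule of thumb,
# not a hard limit); treat it as an assumption to tune.
DOCS_PER_COLLECTION = 50_000_000


@dataclass
class Collection:
    name: str       # hypothetical collection name, e.g. "tenants_0"
    cluster: str    # hypothetical label for a SolrCloud or standalone Solr
    tenants: list = field(default_factory=list)
    docs: int = 0


def plan_layout(tenant_docs, clusters, budget=DOCS_PER_COLLECTION):
    """Greedily pack tenants into collections and spread them over clusters.

    tenant_docs: dict of tenant id -> estimated document count
    clusters:    list of cluster labels
    A tenant larger than the budget simply gets a collection of its own.
    """
    collections = []
    # Place the largest tenants first; ties are broken by tenant id.
    ordered = sorted(tenant_docs.items(), key=lambda kv: (-kv[1], kv[0]))
    for tenant, docs in ordered:
        # First existing collection with enough room, if any.
        target = next((c for c in collections if c.docs + docs <= budget), None)
        if target is None:
            # Round-robin new collections over clusters to limit blast radius.
            cluster = clusters[len(collections) % len(clusters)]
            target = Collection(f"tenants_{len(collections)}", cluster)
            collections.append(target)
        target.tenants.append(tenant)
        target.docs += docs
    return collections


def affected_tenants(collections, failed_cluster):
    """Tenants that would lose search if the given cluster went down."""
    return sorted(
        t for c in collections if c.cluster == failed_cluster for t in c.tenants
    )
```

With four tenants of 40M, 30M, 20M and 10M documents and two clusters, this yields two collections of two tenants each, one per cluster, so losing either cluster affects half of the tenants rather than all of them.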