I would try a single cluster with multiple collections, one per client. How many clients do you have? If it is 10K, that means 10K collections, and you will then need to work out how many documents each one holds, their sizes, etc. to nail down the number of machines and their memory/CPU requirements. Going with a single collection is not really a multi-tenant setup, and it is a poor fit when clients have different schemas.
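As a rough illustration of the one-collection-per-client idea, here is a minimal sketch that builds Collections API CREATE requests, one per tenant. The base URL, the `client_<id>` naming scheme, the config set name and the shard/replica counts are all assumptions made up for the example, not values from this thread; the real numbers depend on the sizing exercise described above.

```python
from urllib.parse import urlencode

# Assumed base URL, for illustration only.
SOLR_BASE = "http://localhost:8983/solr"


def collection_name(client_id):
    """Hypothetical naming scheme: one collection per client."""
    return f"client_{client_id}"


def create_collection_url(client_id, config_name, num_shards=1, replication_factor=2):
    """Build a Collections API CREATE request for one tenant's collection.

    The shard and replica counts are placeholder defaults, not recommendations.
    """
    params = {
        "action": "CREATE",
        "name": collection_name(client_id),
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # A per-client config set is what lets schemas differ between tenants.
        "collection.configName": config_name,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    for client in ("acme", "globex"):
        print(create_collection_url(client, config_name=f"{client}_conf"))
```

Because each collection names its own config set, a schema change for one client only requires reindexing that client's collection.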
Thanks,
Susheel

On Tue, Jun 13, 2017 at 12:35 PM, Zisis T. <zist...@runbox.com> wrote:

> I'm trying to set up a multi-tenant Solr cluster (v6.5.1) which must meet
> the following requirements. The tenants are different customers with
> similar types of data.
>
> * Ability to query per client but also across all clients
> * Don't want to hit all shards for all types of requests (per client,
>   across clients)
> * Don't want to have everything under a single multi-sharded collection,
>   to avoid a SPOF and maintenance headaches (e.g. a schema change will
>   force an all-client reindexing; a single huge backup/restore)
> * Ability to semi-support different schemas
>
> Based on the above I ruled out the following setups:
> * Single multi-sharded collection for all clients and all its variations
>   (e.g. multiple clients in a single shard)
> * One collection per client
>
> My preference lies in a setup like the following:
> * Create a limited # of collections
> * Split the clients among the collections created above based on some
>   criteria (size, content-type)
> * Client-specific requests will be limited to a single collection
> * Cross-client requests will target a limited # of collections (using
>   &collection=col_1,col_2,col_3)
>
> The approach above meets the requirements posted above, but the issue
> that is blocking me is Distributed IDF not working properly across
> collections. (Check comment #3, bullet #2 of
> http://lucene.472066.n3.nabble.com/Distributed-IDF-in-inter-collections-distributed-queries-td4317519.html)
>
> -> Do you see anything wrong with my assumptions/approach above? Are there
>    any alternatives besides having separate clusters for the search across
>    clients and the individual clients?
> -> Is it safe to go with a single collection? If it is, I still need to
>    handle the possible different schemas per client somehow.
> -> Is there a way to enforce local stats when querying a single collection
>    and use global stats only when querying across collections? (see link
>    above)
>
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multi-tenant-setup-tp4340377.html
> Sent from the Solr - User mailing list archive at Nabble.com.
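
For reference, the grouped-collections layout proposed in the quoted message could be routed roughly as follows. This is only a sketch under assumptions: the client-to-collection mapping, the base URL and the `client_id` filter field are hypothetical; only the `collection=col_1,col_2,col_3` parameter comes from the message itself. It shows the request routing only and does nothing about the distributed IDF behaviour across collections.

```python
from urllib.parse import urlencode

# Assumed base URL, for illustration only.
SOLR_BASE = "http://localhost:8983/solr"

# Hypothetical assignment of clients to a limited number of collections,
# e.g. grouped by size or content type.
CLIENT_TO_COLLECTION = {
    "client_a": "col_1",
    "client_b": "col_1",
    "client_c": "col_2",
    "client_d": "col_3",
}


def per_client_url(client, query):
    """Query only the collection that holds this client's documents.

    Assumes each document carries a (hypothetical) client_id field, since
    several clients share one collection in this layout.
    """
    collection = CLIENT_TO_COLLECTION[client]
    params = {"q": query, "fq": f"client_id:{client}"}
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params, safe=',:')}"


def cross_client_url(query, clients=None):
    """Query across clients by listing only the collections that are needed."""
    selected = clients if clients is not None else CLIENT_TO_COLLECTION
    collections = sorted({CLIENT_TO_COLLECTION[c] for c in selected})
    params = {"q": query, "collection": ",".join(collections)}
    # The request is sent to one collection and fans out to the listed ones.
    return f"{SOLR_BASE}/{collections[0]}/select?{urlencode(params, safe=',:')}"


if __name__ == "__main__":
    print(per_client_url("client_c", "solr"))
    print(cross_client_url("solr"))
    print(cross_client_url("solr", ["client_c", "client_d"]))
```

A client-specific request touches a single collection, while a cross-client request lists only the collections that actually hold the selected clients.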