Hi Chamil,
One thing to consider is relevance, especially if the tenants' domains
are different (e.g. one is tech and the other pharmacy). If you go with one
collection and use the same field (e.g. desc) for all tenants, you will get
a single set of term statistics for that field, which could skew result
ordering if you order by score (e.g. the word 'cream' might be rare in the
tech tenant but could become frequent overall because of a large pharmacy
tenant, which lowers its weight in tech searches).
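To put rough numbers on the 'cream' example, here is a small Python sketch. It assumes Lucene's BM25 IDF formula (classic TF-IDF shows the same effect), uses made-up document counts, and ignores per-shard statistics and distributed IDF.

```python
import math


def bm25_idf(doc_freq: int, num_docs: int) -> float:
    """BM25-style IDF as used by Lucene: log(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical per-tenant counts: total docs and docs whose 'desc' contains 'cream'.
TECH_DOCS, TECH_CREAM_DF = 100_000, 10
PHARMA_DOCS, PHARMA_CREAM_DF = 2_000_000, 200_000

# Tech tenant in its own collection: only its own statistics count.
idf_tech_only = bm25_idf(TECH_CREAM_DF, TECH_DOCS)

# Tech and pharmacy tenants sharing one collection (and one 'desc' field):
# document frequencies and document counts are pooled.
idf_shared = bm25_idf(TECH_CREAM_DF + PHARMA_CREAM_DF, TECH_DOCS + PHARMA_DOCS)

print(f"idf('cream') for tech, own collection:    {idf_tech_only:.2f}")
print(f"idf('cream') for tech, shared collection: {idf_shared:.2f}")
```

With these made-up numbers the shared-collection IDF is roughly four times lower, so a match on 'cream' counts for much less when ranking the tech tenant's results.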
On the other hand, having a large number of collections can also be
problematic. You can address that by splitting tenants across multiple
clusters, or by having dedicated collections for large tenants and grouping
smaller tenants by domain.
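A minimal Python sketch of that hybrid layout: dedicated collections for large tenants, one shared collection per domain for the rest. The size threshold, tenant names and the `tenant_`/`shared_` naming scheme are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical threshold: tenants with at least this many docs get their own collection.
LARGE_TENANT_DOCS = 5_000_000


def assign_collections(tenants: List[Tuple[str, str, int]]) -> Dict[str, List[str]]:
    """Map collection name -> list of tenant ids.

    Each tenant is (tenant_id, domain, doc_count). Small tenants are grouped
    by domain, so term statistics are only pooled between tenants with
    similar vocabularies.
    """
    collections: Dict[str, List[str]] = defaultdict(list)
    for tenant_id, domain, doc_count in tenants:
        if doc_count >= LARGE_TENANT_DOCS:
            collections[f"tenant_{tenant_id}"].append(tenant_id)
        else:
            collections[f"shared_{domain}"].append(tenant_id)
    return dict(collections)


if __name__ == "__main__":
    tenants = [
        ("acme", "tech", 800_000),
        ("globex", "tech", 1_200_000),
        ("medco", "pharmacy", 12_000_000),
        ("pillbox", "pharmacy", 300_000),
    ]
    for name, members in assign_collections(tenants).items():
        print(name, members)
```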
Make sure that you route by tenant id if you use a multi-tenant
collection.
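A sketch of what routing by tenant id can look like with the compositeId router, using only the Python standard library to build document ids and a query URL. The base URL, collection name and the `tenant_id` field are hypothetical, and tenant ids are assumed to be simple alphanumeric strings. `_route_` limits the request to the shard(s) holding that tenant; the filter query is still needed because other tenants can live on the same shard.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name, for illustration only.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "tenants"


def composite_id(tenant_id: str, doc_id: str) -> str:
    # With the compositeId router, the part before "!" is the shard key,
    # so all documents of one tenant hash to the same shard.
    return f"{tenant_id}!{doc_id}"


def tenant_query_url(tenant_id: str, user_query: str) -> str:
    params = {
        "q": user_query,
        # Send the request only to the shard(s) that hold this tenant's docs.
        "_route_": f"{tenant_id}!",
        # Routing alone does not exclude other tenants sharing that shard,
        # so also filter on a (hypothetical) tenant_id field.
        "fq": f"tenant_id:{tenant_id}",
    }
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"
```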
HTH,
Emir
On 28.08.2016 07:02, Chamil Jeewantha wrote:
Thank you everyone for your great support.
I will update you with our final approach.
Best regards,
Chamil
On Aug 28, 2016 01:34, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
In my own work, the risk to the business of every single client losing
access to search at once is so great that we would never consider putting
everything in one collection. You should certainly ask that question of the
business stakeholders before you decide.
For that reason, I might recommend putting each of the multiple collections
Erick suggested on a separate SolrCloud (or single Solr instance), so that
no single failure can ever take down every tenant's ability to search, only
that of the tenants on that particular SolrCloud.
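One way to spread collections over several independent SolrClouds, sketched in Python. The cluster names are hypothetical and hash-based placement is just one option; an explicit tenant-to-cluster table is probably easier to operate, since modulo hashing moves collections around when a cluster is added.

```python
import hashlib
from typing import List

# Hypothetical cluster names; each stands for an independent SolrCloud.
CLUSTERS = ["solrcloud-a", "solrcloud-b", "solrcloud-c"]


def cluster_for(collection: str, clusters: List[str] = CLUSTERS) -> str:
    # Use a stable hash (not Python's per-process salted hash()) so a
    # collection always maps to the same cluster across runs.
    digest = hashlib.sha256(collection.encode("utf-8")).digest()
    return clusters[int.from_bytes(digest[:8], "big") % len(clusters)]


def blast_radius(collections: List[str], failed_cluster: str) -> List[str]:
    # Collections (and therefore tenants) that lose search if one cluster is down.
    return [c for c in collections if cluster_for(c) == failed_cluster]
```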
On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson <erickerick...@gmail.com> wrote:
There's no one right answer here. I've also seen a hybrid approach
where there are multiple collections, each of which has some
number of tenants resident. Eventually you need to think about some
kind of partitioning; my rough rule of thumb for the number of documents
in a single core is 50M (NOTE: I've seen anywhere between 10M and 300M
docs fit in a core).
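Turning that rule of thumb into a shard count is simple arithmetic. A Python sketch, where the 50M default is the rough figure above (not a Solr limit) and the `growth` headroom factor is a hypothetical parameter:

```python
import math
from typing import Iterable

# Rough rule of thumb from above; real capacity varies widely (10M-300M docs per core).
DOCS_PER_CORE = 50_000_000


def shards_needed(tenant_doc_counts: Iterable[int],
                  docs_per_core: int = DOCS_PER_CORE,
                  growth: float = 1.0) -> int:
    """Estimate shards for a collection holding the given tenants.

    `growth` is a hypothetical headroom multiplier (e.g. 1.5 for 50% growth).
    """
    projected = sum(tenant_doc_counts) * growth
    return max(1, math.ceil(projected / docs_per_core))
```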
All that said, you may also be interested in the "transient cores"
option, see: https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
and in particular the transient property and transientCacheSize (the latter
set in solr.xml). Note that this is stand-alone only, so you can't carry that
concept over to SolrCloud if you eventually go there.
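For the stand-alone transient-cores route, each core's core.properties sets `transient=true`, often together with `loadOnStartup=false`. A Python sketch that writes such a file per tenant core; the core and config set names are hypothetical, and `transientCacheSize` still has to be set separately in solr.xml.

```python
from pathlib import Path


def core_properties(core_name: str, config_set: str) -> str:
    # transient=true      -> the core may be unloaded when the transient cache is full
    # loadOnStartup=false -> the core is loaded lazily on its first request
    lines = [
        f"name={core_name}",
        f"configSet={config_set}",
        "transient=true",
        "loadOnStartup=false",
    ]
    return "\n".join(lines) + "\n"


def write_core(solr_home: Path, core_name: str, config_set: str) -> Path:
    # Core discovery looks for a core.properties file in each core directory.
    core_dir = solr_home / core_name
    core_dir.mkdir(parents=True, exist_ok=True)
    path = core_dir / "core.properties"
    path.write_text(core_properties(core_name, config_set), encoding="utf-8")
    return path
```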
Best,
Erick
On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha <kdcha...@gmail.com> wrote:
Dear Solr Members,
We are using SolrCloud as the search provider of a multi-tenant, cloud-based
application. We have one schema for all the tenants. The indexes will have
a large number (millions) of documents.
Based on our research, we have two options:
- One large collection for all the tenants, using compositeId routing
- One collection per tenant
The mail below says:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c5324cd4b.2020...@protulae.com%3E
SolrCloud is *more scalable in terms of index size*. Plus you get
redundancy which can't be underestimated in a hosted solution.
AND
The issue is management. 1000s of cores/collections require a level of
automation. On the other hand, having a single core/collection means if
you make one change to the schema or solrconfig, it affects everyone.
Based on the above, we think one large collection will be the way to go.
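The level of automation the quoted mail mentions for 1000s of collections mostly means scripting the Collections API. A Python sketch that only builds the CREATE request URLs (it does not send them); the base URL, `tenant_` naming, shard/replica counts and config set name are hypothetical. Every collection points at the same uploaded config set, matching the one-schema-for-all setup, but a schema change still needs a RELOAD of each collection.

```python
from typing import List
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical


def create_collection_url(name: str, config_name: str,
                          num_shards: int = 1, replication_factor: int = 2) -> str:
    # Collections API CREATE request; parameter values here are illustrative.
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def provisioning_plan(tenant_ids: List[str], config_name: str = "shared_schema") -> List[str]:
    # One CREATE call per tenant, all sharing one config set (one schema).
    return [create_collection_url(f"tenant_{t}", config_name) for t in tenant_ids]
```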
Questions:
1. Is that the right way to go?
2. Will it be a hassle when we need to do reindexing?
3. What is the chance of the entire collection crashing? (In that case all
tenants will be affected and reindexing will be painful.)
Thank you in advance for your kind opinion.
Best Regards,
Chamil
--
http://kavimalla.blgospot.com
http://kdchamil.blogspot.com
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/