Lajos,
 
Thanks a lot for your insightful thoughts.
 
-----------------------------
The most important thing you have to lock down is whether there is a need to 
customize the schema/solrconfig for each tenant. If there is, then having 
individual cores per tenant is going to be a stronger argument.
-----------------------------
Yes, this is what I am thinking about, for security, performance, and 
flexibility.
 
---------------------------
Finally, I would (in general) argue for cloud-based implementations to give you 
data redundancy ...
---------------------------
Do you mean using multi-sharding to have multiple replicas of cores 
(corresponding to tenants) across nodes?
 
Shushuai
 


________________________________
From: Lajos <la...@protulae.com>
To: solr-user@lucene.apache.org 
Sent: Saturday, March 15, 2014 5:37 AM
Subject: Re: Best practice to support multi-tenant with Solr


Hi Shushuai,

Just a few thoughts.

I would guess that most people would argue for implementing 
multi-tenancy within your core (via some unique filter ID) or collection 
(via document routing) because of the headache of managing individual 
cores at the scale you are talking about.
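
A minimal sketch of the document-routing option, assuming the collection uses SolrCloud's compositeId router, where a document ID of the form `tenant!doc` sends all of a tenant's documents to the same shard. The helper names are hypothetical, and the `_route_` request parameter should be checked against the Solr version in use:

```python
from urllib.parse import urlencode


def routed_doc_id(tenant_id: str, doc_id: str) -> str:
    """Build a compositeId-style key of the form '<tenant>!<doc>'."""
    if "!" in tenant_id:
        # '!' is the separator, so it cannot appear in the tenant prefix.
        raise ValueError("tenant id must not contain '!'")
    return f"{tenant_id}!{doc_id}"


def routed_query_params(tenant_id: str, user_query: str) -> str:
    """Query string asking Solr to search only the shard for this tenant."""
    return urlencode({"q": user_query, "_route_": f"{tenant_id}!"})
```

Routing only narrows which shard is searched. Other tenants may hash to the same shard, so a tenant filter is still needed on every query.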

There are disadvantages the other way too: having a core/collection 
support multiple tenants does affect scoring, since TF-IDF is calculated 
across the index, and can open up security implications that you have to 
address (i.e. making sure a malicious query cannot get another tenant's 
documents).
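
One way to address the security point, sketched under assumptions: the application layer, not the end user, appends a filter query on a tenant field to every request. The field name `tenant_id` and the helper below are hypothetical; the `{!term}` query parser is used so the tenant value is matched as a literal term rather than parsed as query syntax:

```python
from urllib.parse import urlencode

TENANT_FIELD = "tenant_id"  # hypothetical field name; adjust to your schema


def tenant_scoped_params(user_query: str, tenant_id: str) -> str:
    """Return a Solr query string restricted to one tenant's documents.

    The user-supplied text goes only into 'q'. The tenant restriction is a
    separate 'fq' parameter added server-side, so results are intersected
    with the tenant filter regardless of what 'q' contains.
    """
    params = [
        ("q", user_query),
        ("fq", "{!term f=%s}%s" % (TENANT_FIELD, tenant_id)),
    ]
    return urlencode(params)
```

The tenant ID here should come from the authenticated session. If clients can reach Solr directly or supply their own `fq`, this filter gives no protection.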

The most important thing you have to lock down is whether there is a 
need to customize the schema/solrconfig for each tenant. If there is, 
then having individual cores per tenant is going to be a stronger 
argument. If I were to guess, based on my own multi-tenant 
experience, you'll have some high-end tenants who need their own 
cores/collections, and a larger number that can all share a 
configuration. It's like any kind of hosted solution: the cheapest 
version is one-size-fits-all and involves the minimum of management 
overhead, while the higher-end versions are more expensive and require more 
management.

My own preference is for a blended environment. While the management of 
individual cores/collections is not to be taken lightly, I've done it in 
a variety of hosting situations and it all comes down to smart 
management and the intelligent use of administrative scripts. I've 
developed my own set of tools over the years and they work quite well.

Finally, I would (in general) argue for cloud-based implementations to 
give you data redundancy, but that decision would require more information.

HTH,

Lajos Moczar


theconsultantcto.com
Enterprise Lucene/Solr




On 14/03/2014 23:10, shushuai zhu wrote:
> Hi,
>
> I am looking into Solr 4.7 for best practice of multi-tenancy support. Our 
> use cases require support of thousands of tenants (say 10,000) and the 
> incoming data rate could be more than 10k documents per second. I did some 
> research and found people talked about scaling tenants at all four levels:
>
> Solr Cloud
> Collection
> Shard
> Core
>
> I am listing them plus some quoted comments from the links.
>
> 1) Solr Cloud and Collection
>
> http://find.searchhub.org/document/c7caa34d807a8a1b#c7caa34d807a8a1b
>
> -----------
> Are you trying to do "multi-tenant"? If so, you should be talking
>      "multi-cluster" where you externally manage your "tenants",
>      assigning them to clusters, but keeping tenants per cluster down in
>      the dozens/hundreds, and "archiving" inactive tenants and spinning
>      up (and down) clusters as inactive tenants become active or fall
>      into inactivity. But keeping 1,000 or more tenants active in a
>      single cluster as separate collections is... a no-go.
> -----------
>
> 2) Shard
>
> http://searchhub.org/2013/06/13/solr-cloud-document-routing/
>
> -----------
> Document routing can be used to achieve a more efficient
>      multi-tenant environment. This can be done by making the tenant id
>      the shard key, which would group all documents from the same tenant
>      on the same shard.
> -----------
>
> 3) Core
>
> http://find.searchhub.org/document/4312991db2dd90e9#4312991db2dd90e9
>
> -----------
> Every multitenant situation is going to be different, but at the
>      extreme a single core per tenant is the cleanest and provides the
>      best separation, optimal performance, and supports full tf-idf
>      relevancy of document fields for each tenant.
> -----------
>
> http://find.searchhub.org/document/fc5b734fba135e83#fc5b734fba135e83
>
> -----------
> Well, we try to use Solr to run a multi-tenant index/search
>      service.  We assigns each client a different core with their own
>      config and schema. It would be good for us if we can just let the
>      customer to be able to create cores with their own schema and
>      config.
> -----------
>
> I also saw slides talking about scaling time along Collection: timed
>      collections (slides 50 ~ 58)
>
> http://www.slideshare.net/sematext/solr-for-indexing-and-searching-logs
>
> According to these, I am thinking about the following approach:
>
> In a single Solr Cloud, the multi-tenant support is at Core level
>      (one or more cores per tenant), and for better performance, we will
>      create a collection every day. When a tenant grows too big, we will
>      migrate it from this Solr Cloud to a new Solr Cloud.
>
> Any potential issues with this approach? Is there a better approach
>      based on your experience?
>
> A few questions related to proposed approach:
>
> 1) When a core is replicated to multiple nodes via multiple shards,
>      the query submitted against a particular core (tenant) should be
>      executed distributed, right?
> 2) What is the best way to move a core from one Solr Cloud to
>      another?
> 3) If we create one collection per day and want to keep data for
>      three years for example, is it OK to have so many collections? If
>      yes, is it cheap to maintain the collection alias for easy querying?
>
> Thanks.
>
> Shushuai
>
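
On question 3 in the quoted message, a rough sketch of how an alias over daily collections could be maintained with the Collections API `CREATEALIAS` action. The collection naming scheme (`<prefix>_YYYYMMDD`), the base URL, and the helper names are assumptions for illustration; this only builds the request URL and says nothing about how many collections a cluster can comfortably hold:

```python
from datetime import date, timedelta
from typing import List
from urllib.parse import urlencode


def daily_collections(prefix: str, last_day: date, days: int) -> List[str]:
    """Names of the daily collections for the 'days' days ending at last_day."""
    return [
        f"{prefix}_{(last_day - timedelta(days=i)):%Y%m%d}"
        for i in range(days)
    ]


def create_alias_url(base_url: str, alias: str, collections: List[str]) -> str:
    """Build a Collections API URL that points 'alias' at the given collections."""
    query = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return f"{base_url}/admin/collections?{query}"
```

A scheduled job could call this once a day with the new window and issue the resulting request, so that clients keep querying a single alias name.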
