Re: SolrCloud and Join Queries

Otis Gospodnetic Fri, 04 Jan 2013 21:07:32 -0800

Hi,

I think things will work for Hassan as he described them.  The key is not
to shard in his case, that's all.
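
For what it's worth, here is a rough sketch of the "don't shard" setup: one
collection per customer, each created with a single shard through the
Collections API. The host, port, collection name and replica count are
placeholders I made up, and the helper only builds the request URL without
contacting Solr. Check the parameter names against your Solr version.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, customer, replicas=2):
    """Build a Collections API CREATE URL for a one-shard customer collection.

    base_url, customer and replicas are illustrative values, not taken
    from the thread.
    """
    params = {
        "action": "CREATE",
        "name": customer,
        "numShards": 1,  # a single shard keeps all of a customer's docs in one index
        "replicationFactor": replicas,  # copies of that one shard, for fault tolerance
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(create_collection_url("http://localhost:8983/solr", "customer_a"))
```

Calling it once per customer gives each customer their own collection, which
is also where the per-collection cost mentioned below comes from.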


Hassan, yes, 1-2M docs is small. But beware of creating a crazy
number (e.g. thousands) of collections per server, as each collection has
some cost.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Fri, Jan 4, 2013 at 5:28 AM, Per Steffensen <st...@designware.dk> wrote:

> On 1/4/13 9:21 AM, Hassan wrote:
>
>> Hi,
>>
>> I am considering SolrCloud for our applications but I have run into the
>> limitation of not being able to use Join Queries in distributed searches.
>> Our requirements are the following:
>> - SolrCloud will serve many applications, where each application's "index"
>> is separate from the others. Each application is really a customer
>> deployment, and we need to isolate customers' data from each other.
>> - Join queries are required. Queries will only look at one customer at a
>> time.
>> - Since the data volume for each customer is small by Solr/Lucene standards
>> (1-2 million documents is small, right?
>>
> Yes
>
>> ), we are really interested in the replication aspect of SolrCloud more
>> than distributed search.
>>
>> I am considering the following SolrCloud design with questions:
>> - Start SolrCloud with 1 shard only. This should allow join queries to
>> work correctly since all documents will be available in the same shard
>> (index). Is this a correct assumption?
>> - Each customer will have its own collection in the SolrCloud.
>>
> You can't have only one shard and several collections. A collection
> consists of a number of shards, and each shard "belongs" to exactly one
> collection, so two different collections never share a shard. Shard is
> "below" collection in the concept hierarchy, so to speak.
>
>> Do collections provide me with data isolation between customers?
>>
> Yes?
> Depends on what you mean by "isolation". Since different collections
> use different shards, each shard basically has its own Lucene index
> (a set of Lucene indices if you use replication), and distinct Lucene
> indices typically live in different disk folders, you get "isolation" in
> the sense that data for different customers is stored in different disk
> folders.
>
>> - Adding more nodes as replicas of the single shard to achieve
>> replication and fault tolerance.
>>
>> Thank you,
>> Hs
>>
> Not sure I understand completely what you want to achieve, but you might
> want to have a collection per customer. One shard per collection = one
> shard per customer = (as long as we do not consider replication) one Lucene
> index per customer = one data disk folder per customer. You should be able
> to do join queries inside the specific customer's shard.
>
> Regards, Per Steffensen
>
>

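To make the last point concrete, a join inside one customer's single-shard
collection could look roughly like the sketch below. It uses Solr's
`{!join from=... to=...}` query syntax. The collection name, field names and
inner query are made up for illustration, and the helper only builds the
request URL without contacting Solr.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def join_query_url(base_url, collection, from_field, to_field, inner_query):
    """Build a /select URL that runs a join query against one collection.

    All argument values used below are hypothetical examples.
    """
    # Docs matching inner_query are joined from from_field to to_field.
    q = f"{{!join from={from_field} to={to_field}}}{inner_query}"
    return f"{base_url}/{collection}/select?{urlencode({'q': q, 'wt': 'json'})}"


if __name__ == "__main__":
    url = join_query_url(
        "http://localhost:8983/solr", "customer_a",
        "order_customer_id", "id", "status:open",
    )
    print(url)
    # Decode the q parameter again to show the query string Solr would receive.
    print(parse_qs(urlsplit(url).query)["q"][0])
```

Because the request goes to a single customer's collection and that
collection has only one shard, both sides of the join live in the same index.
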