On 1/4/13 9:21 AM, Hassan wrote:
Hi,

I am considering SolrCloud for our applications but I have run into the limitation of not being able to use Join Queries in distributed searches.
Our requirements are the following:
- SolrCloud will serve many applications, where each application's "index" is separate from the others. Each application is really a customer deployment, and we need to isolate each customer's data from the others.
- Join queries are required. Queries will only look at one customer at a time.
- Since the data volume for each customer is small by Solr/Lucene standards (1-2 million documents is small, right?), we are more interested in the replication aspect of SolrCloud than in distributed search.
Yes, 1-2 million documents is small.

I am considering the following SolrCloud design with questions:
- Start SolrCloud with only 1 shard. This should allow join queries to work correctly, since all documents will be available in the same shard (index). Is this a correct assumption?
- Each customer will have its own collection in the SolrCloud.
You can't have only one shard and several collections. A collection consists of a number of shards, but a shard "belongs" to exactly one collection, so two different collections never share a shard. Shard is "below" collection in the concept hierarchy, so to speak.
Do collections provide me with data isolation between customers?
Yes?
It depends on what you mean by "isolation". Different collections always use different shards, each shard basically has its own Lucene index (a set of Lucene indices, one per replica, if you use replication), and distinct Lucene indices are typically persisted in different folders on disk. So you will get "isolation" in the sense that data for different customers is stored in different disk folders.
- Add more nodes as replicas of the single shard to achieve replication and fault tolerance.
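The design above (one collection per customer, a single shard, extra nodes as replicas) can be sketched as a Collections API CREATE request. This is a minimal sketch, assuming a Solr base URL of the form `http://host:8983/solr`; the `customer_<name>` naming scheme and the helper name `create_collection_url` are hypothetical, and the code only builds the URL rather than sending it.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base: str, customer: str, replicas: int = 2) -> str:
    """Build a Collections API CREATE URL for one customer's collection.

    Hypothetical helper: the collection name is derived from the customer,
    and numShards=1 keeps all of that customer's documents in one shard
    (one Lucene index per replica), so join queries stay within one index.
    """
    params = {
        "action": "CREATE",
        "name": f"customer_{customer}",   # hypothetical naming scheme
        "numShards": 1,                   # a single shard per collection/customer
        "replicationFactor": replicas,    # copies of that shard spread across nodes
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    for name in ["acme", "globex"]:
        print(create_collection_url("http://localhost:8983/solr", name))
```

Raising `replicationFactor` (or adding nodes that host replicas of the shard) is what provides the fault tolerance Hassan is after, while `numShards=1` avoids the distributed-join limitation.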

Thank you,
Hs
I'm not sure I completely understand what you want to achieve, but you might want to have a collection per customer. One shard per collection = one shard per customer = (ignoring replication) one Lucene index per customer = one data folder on disk per customer. You should be able to do join queries inside a specific customer's shard.
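A join restricted to one customer's single-shard collection could look like the following. This is a sketch, assuming Solr's `{!join from=... to=...}` query parser; the field names (`parent_id`, `id`), the inner query, and the helpers `join_query` and `select_url` are hypothetical and only build the request URL.

```python
from urllib.parse import urlencode


def join_query(from_field: str, to_field: str, inner_query: str) -> str:
    """Return a Solr join query string: {!join from=F to=T}inner.

    Documents matching inner_query are collected, their from_field values
    are read, and documents whose to_field matches those values are returned.
    """
    return f"{{!join from={from_field} to={to_field}}}{inner_query}"


def select_url(solr_base: str, collection: str, q: str) -> str:
    """Build a /select URL against one customer's collection (hypothetical names)."""
    params = {"q": q, "wt": "json"}
    return f"{solr_base.rstrip('/')}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    q = join_query("parent_id", "id", "type:order")
    print(q)
    print(select_url("http://localhost:8983/solr", "customer_acme", q))
```

Because each customer's collection has a single shard, the join only ever needs documents from that one index, which matches the reasoning in the thread about avoiding joins in distributed searches.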

Regards, Per Steffensen
