Re: SolrCloud and Join Queries

Hassan Sat, 05 Jan 2013 05:32:22 -0800

Thanks Per and Otis,

It is much clearer now, but I have a question about adding new Solr nodes
and collections. I have a dedicated ZooKeeper instance. Let's say I have
uploaded my configuration to ZooKeeper using "zkcli" and named it, say,
"configuration1". Now I want to create a new SolrCloud from scratch with
two Solr nodes. I need to create a new collection (with one shard) called
"customer1" using the configuration name "configuration1". I have tried
different ways using the Collections API and zkcli linkconfig/downconfig,
but I cannot get it to work. The collection is only available on one node.
The example "collection1" works as expected, where one node has the leader
shard and the other node has the replica. See the cloud tree:
http://imageshack.us/f/706/selection008p.png/

What is the correct way to dynamically add collections to already
existing nodes and new nodes?
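
For reference, a request of the kind described above can be sketched as
follows. This is only a sketch under stated assumptions: the base address
http://localhost:8983/solr is hypothetical, the helper name
collection_create_url is made up for illustration, and it assumes the
Collections API CREATE action in the release being used accepts the
numShards, replicationFactor and collection.configName parameters
(parameter names and semantics changed between early 4.x releases, so
check the documentation for your version). The code only builds the URL;
it does not contact Solr.

```python
from urllib.parse import urlencode


def collection_create_url(solr_base, name, config_name,
                          num_shards=1, replication_factor=2):
    """Build a Collections API CREATE URL (assumed parameter names)."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
        # Name under which the configuration was uploaded to ZooKeeper.
        ("collection.configName", config_name),
    ]
    return solr_base.rstrip("/") + "/admin/collections?" + urlencode(params)


# Hypothetical host and port; any live node in the cloud could be used.
url = collection_create_url("http://localhost:8983/solr",
                            "customer1", "configuration1")
print(url)
```

The request would be sent to one node only. How replicationFactor is
counted may differ by release, so the resulting layout is worth checking
in the cloud tree afterwards.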


Thank you,
Hs

On 05/01/13 09:07, Otis Gospodnetic wrote:

Hi,

I think things will work for Hassan as he described them.  The key is not
to shard in his case, that's all.

Hassan, yes, 1-2M docs is small. But beware of creating a crazy
number (e.g. thousands) of collections per server, as each collection has
some cost.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Jan 4, 2013 at 5:28 AM, Per Steffensen <st...@designware.dk> wrote:

On 1/4/13 9:21 AM, Hassan wrote:

Hi,

I am considering SolrCloud for our applications but I have run into the
limitation of not being able to use Join Queries in distributed searches.
Our requirements are the following:
- SolrCloud will serve many applications, where each application's "index"
is separate from the others. Each application is really a customer
deployment, and we need to isolate customers' data from each other.
- Join queries are required. Queries will only look at one customer at a
time.
- Since the data volume for each customer is small by Solr/Lucene standards
(1-2 million documents is small, right?

Yes

  ), we are really interested in the replication aspect of SolrCloud more
than distributed search.

I am considering the following SolrCloud design with questions:
- Start SolrCloud with 1 shard only. This should allow join queries to
work correctly, since all documents will be available in the same shard
(index). Is this a correct assumption?
- Each customer will have its own collection in the SolrCloud.

You can't have only one shard and several collections. A collection
consists of a number of shards, but a shard "belongs" to a collection, so
two different collections do not use the same shard. Shard is "below"
collection in the concept hierarchy, so to speak.
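
The hierarchy described above can be pictured with a small model. This is
a sketch for illustration only, not Solr code: the class names, the
"shard1" name and the node names are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    # Names of the nodes that host a copy (replica) of this shard.
    replicas: list = field(default_factory=list)


@dataclass
class Collection:
    name: str
    # Shards owned by this collection and by no other collection.
    shards: list = field(default_factory=list)


def one_shard_collection(name, nodes):
    """Create a collection with a single shard replicated on every node."""
    return Collection(name, [Shard("shard1", list(nodes))])


nodes = ["node1", "node2"]  # hypothetical node names
customers = [one_shard_collection(c, nodes)
             for c in ("customer1", "customer2")]

for c in customers:
    for s in c.shards:
        print(c.name, s.name, s.replicas)
```

Both collections have a shard called "shard1", but these are two separate
shards, each with its own index on each node.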

  Do collections provide me with data isolation between customers?
Yes?
Depends on what you mean by "isolation". Since different collections
have different shards, and each shard basically has its own Lucene index
(a set of Lucene indices if you use replication), and distinct Lucene
indices typically persist in different disk folders, you will get
"isolation" of data in the sense that data for different customers will
be stored in different disk folders.

  - Adding more nodes as replicas of the single shard to achieve
replication and fault tolerance.

Thank you,
Hs

Not sure I understand completely what you want to achieve, but you might
want to have a collection per customer. One shard per collection = one
shard per customer = (as long as we do not consider replication) one Lucene
index per customer = one data disk folder per customer. You should be able
to do join queries inside the specific customer's shard.

Regards, Per Steffensen
