Hi, I think things will work for Hassan as he described them. The key is not to shard in his case, that's all.
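To make "not sharding" concrete, here is a rough sketch (not tested against a live cluster) that only builds the request URLs for a one-shard collection per customer and a join query against that collection. The host/port, the collection name "customer_a", and the field names parent_id, id and type are made up for illustration; the CREATE action with numShards and replicationFactor, and the {!join from=... to=...} query syntax, are the standard Solr ones.

```python
from urllib.parse import urlencode

# Assumed base URL of one Solr node; adjust for your own cluster.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(customer, replicas=2):
    """URL for the Collections API CREATE call: one shard, N replicas."""
    params = {
        "action": "CREATE",
        "name": customer,               # one collection per customer
        "numShards": 1,                 # single shard, so joins see all docs
        "replicationFactor": replicas,  # copies of that one shard
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def join_query_url(customer, from_field, to_field, inner_query):
    """URL for a join query run against a single customer's collection."""
    q = f"{{!join from={from_field} to={to_field}}}{inner_query}"
    return f"{SOLR_BASE}/{customer}/select?{urlencode({'q': q})}"


if __name__ == "__main__":
    # Hypothetical customer and field names, for illustration only.
    print(create_collection_url("customer_a"))
    print(join_query_url("customer_a", "parent_id", "id", "type:order"))
```

Since every query goes to exactly one customer's collection, and that collection has a single shard, the join never has to cross shards; the replicas only add redundancy.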
Hassan, yes, 1-2M docs is small. But beware of creating a crazy number (e.g. thousands) of collections per server, as each collection has some cost.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Fri, Jan 4, 2013 at 5:28 AM, Per Steffensen <st...@designware.dk> wrote:

> On 1/4/13 9:21 AM, Hassan wrote:
>
>> Hi,
>>
>> I am considering SolrCloud for our applications but I have run into the
>> limitation of not being able to use Join Queries in distributed searches.
>> Our requirements are the following:
>> - SolrCloud will serve many applications where each application "index"
>> is separate from other application. Each application really is customer
>> deployment and we need to isolate customers data from each other
>> -Join queries are required. Queries will only look at one customer at a
>> time.
>> - Since data volume for each customer is small in Solr/Lucene standards,
>> (1-2 Million document is small, right?
>>
> Yes
>
> ), we are really interested in the replication aspect of SolrCloud more
>> than distributed search.
>>
>> I am considering the following SolrCloud design with questions:
>> - Start SolrCloud with 1 shard only. This should allow join queries to
>> work correctly since all documents will be available in the same shard
>> (index). is this a correct assumption?
>> - Each customer will have its own collection in the SolrCloud.
>>
> You cant have only one shard and several collections. A collections
> consists of a number of shards, but a shards "belong" to a collection, so
> two different collections do not use the same shard. Shard is "below"
> collection in the concept-hierarchy so to speak.
>
> Do collections provide me with data isolation between customers?
>>
> Yes?
> Depends on what you mean with "isolation". Since different collections
> enforce different shards, and each shard basically has its own lucene index
> (set of lucene indices if you use replication), and distinct lucene indices
> typically persist in different disk-folders, you will get "isolation" of
> data in the way that data for different customers will be stored in
> different disk-folders.
>
> - Adding more nodes as replicas of the single shard to achieve
>> replication and fault tolerance.
>>
>> Thank you,
>> Hs
>>
> Not sure I understand completely what you want to achieve, but you might
> want to have a collection per customer. One shard per collection = one
> shard per customer = (as long as we do not consider replication) one lucene
> index per customer = one data-disk-folder per customer. You should be able
> to do join queries inside the specific customers shard.
>
> Regards, Per Steffensen
>