At this data size, don't worry at _all_ about duplicating content. A single Solr node easily holds 20M docs. 50M is common and 250M is not unheard of.
My bold claim is: you can freely duplicate the data to your heart's content and you'll never notice it. In fact, you can put it all in a single collection with some kind of "site" field to distinguish which is which and when you want to restrict results to a specific site, just use an fq clause.

HTH,
Erick

On Wed, Apr 15, 2015 at 1:40 PM, vsriram30 <vsrira...@gmail.com> wrote:
> Hi All,
>
> Consider this scenario : I am having around 100K content and I want to
> launch 5 sites with that content. For example, around 50K content for site1,
> 40K content for site2, 30K for site3, 20K for site4, and 10K for site5.
>
> As seen from this example, these sites have few overlapping content and non
> overlapping content as well. In this case say if a content page is present
> in all site1, site2 and site3 out of 50 fields per content page, say 30
> fields remain common between site1 and site2, 25 fields common between site1
> and site3 and 20 fields between site2 and site3, in this case, my aim is to
> prevent duplication as much as possible without getting too much reduction
> in QPS. Hence I consider the following options,
>
> Option 1: Just maintain individual copy of duplicated content for each site
> and overwrite site specific information while indexing for those sites.
> Pros:
> Better QPS as no query time joins are involved.
> Cons:
> Duplication of common fields for common content across sites.
>
> Option 2: Maintain just a single copy of common fields per content across
> all overlapping sites and separate site specific information for that
> content and do a merge while serving using joins.
> In this approach, for joins I looked at Block join provided by solr and
> looks like it may not be a good fit for my case as if one site specific info
> changes, I don't want to index the entire block containing other sites as
> well.
> Is there any better way to tackle this making sure we are not occupying so
> much space and at the same time not reducing the QPS too much?
>
> Thanks,
> Sriram
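
For illustration, here is a minimal Python sketch of the single-collection approach suggested in the reply: one duplicated document per site, each tagged with a "site" field, and an fq clause to restrict a search to one site. It only builds the documents and the query URL and does not contact Solr. The host, the collection name "content", the id scheme, and the helper names are all assumptions made for this example, not anything from the thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Solr location and collection name, assumed for this sketch.
SOLR_BASE = "http://localhost:8983/solr/content"


def make_site_doc(content_id, site, fields):
    """Build the per-site copy of a content page, tagged with its site."""
    doc = dict(fields)                   # copy, so the shared fields are not mutated
    doc["id"] = f"{content_id}_{site}"   # hypothetical unique-key scheme
    doc["site"] = site                   # the "site" field the filter runs against
    return doc


def site_query_url(user_query, site, rows=10):
    """Build a /select URL whose fq clause restricts results to one site."""
    params = {"q": user_query, "fq": f"site:{site}", "rows": rows}
    return f"{SOLR_BASE}/select?{urlencode(params)}"


if __name__ == "__main__":
    common = {"title": "Example page"}
    print(make_site_doc("page42", "site1", common))
    print(site_query_url("title:example", "site1"))
```

Since fq results are cached separately from the main query, a small fixed set of site filters like this should be cheap to reuse across requests. Site names containing spaces or special characters would need escaping, which this sketch does not handle.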