At this data size, don't worry at _all_ about duplicating content. A
single Solr node easily holds 20M docs. 50M is common and 250M is not
unheard of.

My bold claim is: you can freely duplicate the data to your heart's
content and you'll never notice it.
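
As a rough sanity check on that claim, here is a small sketch using the per-site counts from the original message quoted below. It assumes the worst case, where every site keeps its own full copy of every document, and compares the total against the 20M-doc figure mentioned above. The variable names are purely illustrative.

```python
# Per-site document counts, taken from the original question.
site_docs = {
    "site1": 50_000,
    "site2": 40_000,
    "site3": 30_000,
    "site4": 20_000,
    "site5": 10_000,
}

# Worst case: no sharing at all, each site stores its own copy.
total_docs = sum(site_docs.values())

# The "easily holds" single-node figure quoted above.
node_capacity = 20_000_000

fraction = total_docs / node_capacity
print(f"{total_docs} docs, {fraction:.2%} of a 20M-doc node")
# prints "150000 docs, 0.75% of a 20M-doc node"
```

Even with full duplication, the index stays under 1% of what a single node comfortably handles.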

In fact, you can put it all in a single collection with some kind of
"site" field to distinguish which is which. When you want to restrict
results to a specific site, just use an fq clause.
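
To make the fq idea concrete, here is a minimal sketch that builds a select URL restricted to one site. The host, the collection name "content", the field name "site", and the helper build_site_query are all assumptions for illustration; only the q, fq, and wt request parameters are standard Solr.

```python
from urllib.parse import urlencode


def build_site_query(base_url, collection, user_query, site):
    """Build a Solr select URL whose results are limited to one site.

    The "site" field name is hypothetical; use whatever field
    your schema defines for this purpose.
    """
    params = [
        ("q", user_query),        # the main (scored) query
        ("fq", f"site:{site}"),   # filter query: restricts, does not affect scoring
        ("wt", "json"),           # response format
    ]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Assumed local Solr instance and collection name.
url = build_site_query("http://localhost:8983/solr", "content", "title:solr", "site1")
print(url)
# prints "http://localhost:8983/solr/content/select?q=title%3Asolr&fq=site%3Asite1&wt=json"
```

Since filter queries are cached separately from the main query, a per-site filter like this is usually cheap to repeat.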

HTH,
Erick

On Wed, Apr 15, 2015 at 1:40 PM, vsriram30 <vsrira...@gmail.com> wrote:
> Hi All,
>
> Consider this scenario: I have around 100K content items and I want to
> launch 5 sites with that content. For example, around 50K items for site1,
> 40K for site2, 30K for site3, 20K for site4, and 10K for site5.
>
> As this example shows, the sites have some overlapping content and some
> non-overlapping content. Say a content page is present in site1, site2,
> and site3. Out of 50 fields per content page, say 30 fields are common
> between site1 and site2, 25 between site1 and site3, and 20 between site2
> and site3. My aim is to prevent duplication as much as possible without
> too much reduction in QPS. Hence I am considering the following options:
>
> Option 1: Just maintain individual copy of duplicated content for each site
> and overwrite site specific information while indexing for those sites.
> Pros:
> Better QPS as no query time joins are involved.
> Cons:
> Duplication of common fields for common content across sites.
>
> Option 2: Maintain just a single copy of common fields per content across
> all overlapping sites and separate site specific information for that
> content and do a merge while serving using joins.
> For this approach I looked at the block join provided by Solr, but it
> looks like it may not be a good fit for my case: if one site's specific
> info changes, I don't want to reindex the entire block containing the
> other sites as well.
> Is there a better way to tackle this, so that we don't occupy too much
> space and at the same time don't reduce the QPS too much?
>
> Thanks,
> Sriram
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-index-design-for-this-use-case-tp4200000.html
> Sent from the Solr - User mailing list archive at Nabble.com.
