Hi All,

Consider this scenario: I have around 100K content pages and I want to
launch 5 sites with that content, for example around 50K pages for site1,
40K for site2, 30K for site3, 20K for site4, and 10K for site5.

As the example shows, the sites have both overlapping and non-overlapping
content. Say a content page is present in site1, site2 and site3, and out of
its 50 fields, 30 are common between site1 and site2, 25 between site1 and
site3, and 20 between site2 and site3. My aim is to avoid duplicating this
shared data as much as possible without losing too much QPS, so I am
considering the following options:

Option 1: Maintain an individual copy of the duplicated content for each
site and overwrite the site-specific fields while indexing for each site.
Pros:
Better QPS as no query-time joins are involved.
Cons:
Duplication of common fields for common content across sites.
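
To make Option 1 concrete, here is a rough SolrJ sketch of indexing one full
copy of a content page per site; the field names, ids and URLs are just my
own assumptions for illustration, not something fixed in my schema:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import java.util.Map;

    public class Option1Indexer {
        public static void main(String[] args) throws Exception {
            SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/content").build();

            // Common fields shared by every site that carries this content page.
            Map<String, String> commonFields = Map.of(
                    "title", "Some shared title",
                    "body",  "Shared body text");

            // One full copy per site; the common fields are duplicated each time.
            for (String site : new String[]{"site1", "site2", "site3"}) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "content-123_" + site);   // unique id per site copy
                doc.addField("site_id", site);
                commonFields.forEach(doc::addField);
                // Site-specific value; setField would replace a common field if the site overrides it.
                doc.setField("url", "https://" + site + ".example.com/content-123");
                client.add(doc);
            }
            client.commit();
            client.close();
        }
    }

Serving is then a plain per-site filter (fq=site_id:site2), so no join cost
at query time, at the price of storing the common fields three times.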

Option 2: Maintain a single copy of the common fields per content page
across all overlapping sites, keep the site-specific information for that
content separately, and merge the two at serving time using joins.
For the joins I looked at the Block Join support provided by Solr, but it
does not look like a good fit for my case: if the site-specific info for one
site changes, I don't want to re-index the entire block containing the other
sites as well.
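
For reference, this is roughly how Option 2 would look with Block Join in
SolrJ: one parent document holding the common fields plus one small child
per site holding the site-specific fields. The doc_type/site_id scheme and
field names are again only my assumptions. Since Solr indexes the parent and
its children as one contiguous block, changing a single site's child still
means re-sending the whole block, which is exactly my concern above:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class Option2BlockJoin {
        public static void main(String[] args) throws Exception {
            SolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/content").build();

            // Parent: the single shared copy of the common fields.
            SolrInputDocument parent = new SolrInputDocument();
            parent.addField("id", "content-123");
            parent.addField("doc_type", "content");
            parent.addField("title", "Some shared title");
            parent.addField("body", "Shared body text");

            // Children: one small doc of site-specific fields per overlapping site.
            for (String site : new String[]{"site1", "site2", "site3"}) {
                SolrInputDocument child = new SolrInputDocument();
                child.addField("id", "content-123_" + site);
                child.addField("doc_type", "site_info");
                child.addField("site_id", site);
                child.addField("url", "https://" + site + ".example.com/content-123");
                parent.addChildDocument(child);
            }
            client.add(parent);   // the whole block has to be sent together on every update
            client.commit();

            // Query side: return parent (common) docs whose block has a site2 child.
            SolrQuery q = new SolrQuery("{!parent which='doc_type:content'}site_id:site2");
            client.query(q);
            client.close();
        }
    }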
Is there a better way to tackle this that avoids occupying so much space
while not reducing QPS too much?

Thanks,
Sriram


