On 11/13/2017 12:33 PM, Shamik Bandopadhyay wrote:
>     I'm looking for some input on design considerations for defining
> collections in a SolrCloud cluster. Right now, our cluster consists of two
> collections in a 2 shard / 2 replica mode. Each collection has a dedicated
> set of sources that don't overlap, which made it an easy decision.
> Recently, we got a requirement to index a bunch of new sources that are
> region based. The search results for each region need to come
> from its specific source as well as sources from one of our existing
> collections. Here's an example of our existing collections and their
> corresponding source(s).

You haven't defined in *ANY* way exactly what a "source" is or how that
data actually gets into Solr.  Without that information, it'll be
difficult to even understand your requirements.

If I assume that the config and schema are going to be identical for
all of the data sources, then I can give you this information:

If you set up each source as a collection in your SolrCloud, you can
create collection aliases that let you query multiple collections with
one query.  Whether or not this will work correctly will depend on a few
factors, but most of all whether or not all the data is using the same
(or extremely similar) Solr config/schema.
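
As a rough sketch of what that could look like, the snippet below builds
the Collections API request that creates an alias (action=CREATEALIAS)
and a query URL that goes through the alias.  The host, the alias name,
and the collection names are all made up for illustration, and the
helper functions are hypothetical, not part of any Solr client.  It only
builds the URLs; sending them is left to whatever HTTP client you use.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API URL that points one alias at several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # Solr expects a comma-separated list of collection names.
        "collections": ",".join(collections),
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def select_url(base_url, alias, query):
    """Build a query URL; the alias is used exactly like a collection name."""
    return f"{base_url.rstrip('/')}/{alias}/select?{urlencode({'q': query})}"


# Hypothetical names: one existing collection plus one small regional one.
alias_request = create_alias_url(
    "http://localhost:8983/solr", "region_emea", ["source_c", "geo_emea"]
)
query_request = select_url("http://localhost:8983/solr", "region_emea", "text:solr")
```

With one alias per region, the client only needs to know the alias name,
and the list of collections behind it can be changed later without
touching the client.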

> The other consideration is the hardware design. Right now, both shards and
> their replicas run on their dedicated instance. With two collections, we
> sometimes run into OOM scenarios, so I'm a little bit worried about adding
> more collections. Does the best practice (I know it's subjective) in
> scenarios like this call for a dedicated Solr cluster per collection? From
> index size perspective, Source_C, Source_D and Source_E combine to close to 10
> million documents with 60GB volume size. Each geo based source is small and
> won't exceed 500k documents.

10 million documents producing 60GB of index data means that the
documents are relatively large, but aren't super huge -- or that the
data in them is duplicated several times.  For contrast, I have an index
where each shard has about 30 million docs, and each of those shards is
36GB in size.  The entire index has six of these large shards and one
tiny hot shard.
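
To put numbers on that comparison, here is the arithmetic as a small
sketch.  It assumes decimal gigabytes (10^9 bytes) and uses only the
document counts and sizes mentioned above; the function name is just for
illustration.  An average like this says nothing about which fields are
stored or indexed, so treat it as a rough comparison only.

```python
def avg_bytes_per_doc(index_size_gb, doc_count):
    """Average on-disk index bytes per document, assuming 1 GB = 10**9 bytes."""
    return index_size_gb * 10**9 / doc_count


# The index described in the question: about 60GB for about 10 million docs.
your_index = avg_bytes_per_doc(60, 10_000_000)

# One of my large shards: about 36GB for about 30 million docs.
my_shard = avg_bytes_per_doc(36, 30_000_000)

# How many times larger the average document footprint is.
ratio = your_index / my_shard
```

That works out to roughly 6KB of index per document on your side versus
roughly 1.2KB on mine, about a factor of five.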

I always get a little anxious when somebody wants best practice
information about Solr configurations and hardware.  Any recommendation
that we make will be COMPLETELY wrong for some use cases, indexes,
and/or query patterns.  Solr configurations and hardware must be
tailored specifically for the use case, index data, and query patterns
that actually exist.  Typically, this means that you have to actually
set up a full system and try it to make any determinations about how
much hardware you need.

https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Regarding your hardware sizing, the only general advice I can give you
is this:  Good performance usually ends up requiring significantly more
RAM than users plan on.

Thanks,
Shawn
