As a general rule, there are only two dimensions along which Solr scales to large
numbers: a large number of documents and a moderate number of nodes (shards and
replicas). All other parameters should be kept relatively small, like
dozens or low hundreds. Even shards and replicas should probably be kept down
to that same guidance of dozens or low hundreds.

Tens of millions of documents should be no problem. I recommend 100 million
as the rough limit of documents per node. Of course, it all depends on your
particular data model, data, hardware, and network, so that number
could be smaller or larger.

The main guidance has always been to simply do a proof-of-concept
implementation to test with your particular data model and data values.
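
As one possible starting point, a proof of concept of the kind Arnon describes
can be scripted against the Collections API. The sketch below is not from the
original thread; it assumes a SolrCloud cluster reachable at localhost:8983 and
a configset named "_default" (adjust both for your setup). It creates
collections in a loop and reports the first failure, which is typically where
some resource limit (file handles, ZooKeeper znodes, threads, heap) is hit.

    # Hypothetical proof-of-concept sketch: create many collections via the
    # Solr Collections API and report where failures start.
    # Assumes SolrCloud at localhost:8983 and an uploaded configset "_default".
    import requests

    SOLR = "http://localhost:8983/solr"
    TARGET = 5000           # how many collections to attempt
    NUM_SHARDS = 1
    REPLICATION_FACTOR = 2  # one replica on the second node

    for i in range(TARGET):
        name = f"test_coll_{i}"
        resp = requests.get(
            f"{SOLR}/admin/collections",
            params={
                "action": "CREATE",
                "name": name,
                "numShards": NUM_SHARDS,
                "replicationFactor": REPLICATION_FACTOR,
                "collection.configName": "_default",
                "wt": "json",
            },
            timeout=120,
        )
        body = resp.json()
        if resp.status_code != 200 or "error" in body:
            # The first failure usually indicates which resource limit was hit.
            print(f"Failed at collection #{i} ({name}): {body.get('error')}")
            break
    else:
        print(f"Created all {TARGET} collections")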

-- Jack Krupansky

On Sun, Jun 14, 2015 at 7:31 AM, Arnon Yogev <arn...@il.ibm.com> wrote:

> We're running some tests on Solr and would like to have a deeper
> understanding of its limitations.
>
> Specifically, we have tens of millions of documents (say 50M) and are
> comparing several "#collections X #docs_per_collection" configurations.
> For example, we could have a single collection with 50M docs, or 5000
> collections with 10K docs each.
> When trying to create the 5000 collections, we start getting frequent
> errors after 1000-1500 collections have been created. Feels like some
> limit has been reached.
> These tests are done on a single node + an additional node for replica.
>
> Can someone elaborate on what could limit Solr to a high number of
> collections (if at all)?
> i.e. if we wanted to have 5K or 10K (or 100K) collections, is there
> anything in Solr that can prevent it? Where would it break?
>
> Thanks,
> Arnon
