As a general rule, there are only two ways that Solr scales to large numbers: a large number of documents and a moderate number of nodes (shards and replicas). All other parameters should be kept relatively small, on the order of dozens or low hundreds. Even shards and replicas should probably be kept down to that same guidance of dozens or low hundreds.
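To make that concrete, here is a minimal sketch (not from the thread) of a Collections API CREATE call that keeps shard and replica counts modest. It assumes a SolrCloud node at localhost:8983 and a configset already uploaded under the name "myConfig"; the collection name and counts are made up for illustration, so adjust them for your own cluster.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateCollection {
    public static void main(String[] args) throws Exception {
        // Collections API CREATE with modest shard/replica counts (dozens at most, not thousands).
        String url = "http://localhost:8983/solr/admin/collections"
                + "?action=CREATE"
                + "&name=mycollection"               // hypothetical collection name
                + "&numShards=8"
                + "&replicationFactor=2"
                + "&collection.configName=myConfig"; // assumes this configset exists in ZooKeeper

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // A 200 status with "status":0 in the response header indicates success.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}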
Tens of millions of documents should be no problem. I recommend 100 million as a rough limit on documents per node. Of course, it all depends on your particular data model, data, hardware, and network, so that number could be smaller or larger. The main guidance has always been to simply do a proof-of-concept implementation to test with your particular data model and data values.

-- Jack Krupansky

On Sun, Jun 14, 2015 at 7:31 AM, Arnon Yogev <arn...@il.ibm.com> wrote:

> We're running some tests on Solr and would like to have a deeper
> understanding of its limitations.
>
> Specifically, we have tens of millions of documents (say 50M) and are
> comparing several "#collections X #docs_per_collection" configurations.
> For example, we could have a single collection with 50M docs or 5000
> collections with 10K docs each.
> When trying to create the 5000 collections, we start getting frequent
> errors after 1000-1500 collections have been created. It feels like some
> limit has been reached.
> These tests are done on a single node + an additional node for a replica.
>
> Can someone elaborate on what could limit Solr to a high number of
> collections (if at all)?
> i.e., if we wanted to have 5K or 10K (or 100K) collections, is there
> anything in Solr that would prevent it? Where would it break?
>
> Thanks,
> Arnon
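For a rough proof-of-concept probe along the lines Jack suggests and the test Arnon describes, a sketch like the following could be used to see where collection creation starts failing. It assumes a local SolrCloud node at localhost:8983 and a configset named "myConfig"; the collection name prefix, target count, and error check are illustrative only, and the exact failure signature may vary by Solr version.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CollectionLimitProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        int target = 5000; // how many collections to attempt, as in the test above

        for (int i = 0; i < target; i++) {
            String name = URLEncoder.encode("testcoll_" + i, StandardCharsets.UTF_8);
            String url = "http://localhost:8983/solr/admin/collections?action=CREATE"
                    + "&name=" + name
                    + "&numShards=1&replicationFactor=2"
                    + "&collection.configName=myConfig";
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());

            // Stop at the first failure to see roughly where creation starts breaking down.
            // (The exact error shape depends on the Solr version; this check is only a heuristic.)
            if (resp.statusCode() != 200 || resp.body().contains("\"error\"")) {
                System.out.println("Collection creation started failing at #" + i);
                System.out.println(resp.body());
                return;
            }
            if (i % 100 == 0) {
                System.out.println("Created " + i + " collections so far");
            }
        }
        System.out.println("All " + target + " collections created");
    }
}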