To my knowledge there's nothing built into Solr to limit the number of collections. There's nothing explicitly in place to handle many hundreds of collections either, so you're really in uncharted, certainly untested waters. Anecdotally, we've heard of the problem you're describing.

You say you start seeing errors. What are they? OOMs? Deadlocks?

If you are _not_ in SolrCloud, there's the "lots of cores" solution; see:
http://wiki.apache.org/solr/LotsOfCores. Pay attention to the warning at the
top: NOT FOR SOLRCLOUD! Also note that the "lots of cores" option really is
built for the pattern where any particular core is searched only
sporadically. Indexing Dropbox files is a good example: a user may sign on
and search her documents just a few times a day, for a few minutes at a
time. Because cores are loaded/unloaded on demand, supporting many hundreds
of simultaneous users would cause a lot of core loading/unloading and hurt
performance.
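
The mechanics, as I recall them -- the wiki page above is authoritative, so
treat this as a rough sketch -- are that each on-demand core is marked
transient and skipped at startup in its core.properties, while solr.xml caps
how many transient cores stay loaded at once:

    # core.properties for each on-demand core (core name is made up)
    name=dropbox_user_1234
    transient=true
    loadOnStartup=false

    <!-- solr.xml: keep at most 64 transient cores loaded; least-recently
         used cores beyond that are closed and reopened on demand -->
    <solr>
      <int name="transientCacheSize">64</int>
    </solr>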

Best,
Erick

On Sun, Jun 14, 2015 at 8:00 AM, Shai Erera <ser...@gmail.com> wrote:
> Thanks, Jack, for your response. But I think Arnon's question was
> different.
>
> If you need to index 10,000 different collections of documents in Solr
> (say a collection denotes someone's Dropbox files), then you have two
> options: index all collections in one Solr collection, adding a
> collectionID field to each document and a matching filter to each query,
> or index each user's private collection in a separate Solr collection
> [sketches of both appear at the end of this message].
>
> One advantage of the latter is that you don't need to add a collectionID
> filter to each query. Also, from a security/privacy standpoint (and a
> search-quality one), a user can only ever search what he has access to --
> e.g. he cannot get a spelling correction for words he never saw in his
> own documents, nor document suggestions (even though the 'context' feature
> in some of Lucene's suggesters allows one to do that too). From a quality
> standpoint, you don't mix term statistics across users' documents, etc.
>
> So from a single node's point of view, you can either index 100M documents
> in one index (collection, shard, replica -- whatever -- a single Solr
> core), or in 10,000 such cores. From a node-capacity perspective the two
> are the same -- the same number of documents is indexed overall, the same
> query workload is served, etc.
>
> So the question is purely about Solr and its collections management -- is
> there anything in that process that can prevent one from managing
> thousands of collections on a single node, or within a single SolrCloud
> instance? If so, what is it -- is it the ZK watchers? Is there a thread
> per collection at work? Something else?
>
> Shai
>
> On Sun, Jun 14, 2015 at 5:21 PM, Jack Krupansky
> <jack.krupan...@gmail.com> wrote:
>
>> As a general rule, there are only two ways that Solr scales to large
>> numbers: a large number of documents and a moderate number of nodes
>> (shards and replicas). All other parameters should be kept relatively
>> small, like dozens or low hundreds. Even shards and replicas should
>> probably be kept down to that same guidance of dozens or low hundreds.
>>
>> Tens of millions of documents should be no problem. I recommend 100
>> million as the rough limit of documents per node. Of course, it all
>> depends on your particular data model, data, hardware, and network, so
>> that number could be smaller or larger.
>>
>> The main guidance has always been to simply do a proof-of-concept
>> implementation to test for your particular data model and data values.
>>
>> -- Jack Krupansky
>>
>> On Sun, Jun 14, 2015 at 7:31 AM, Arnon Yogev <arn...@il.ibm.com> wrote:
>>
>> > We're running some tests on Solr and would like to have a deeper
>> > understanding of its limitations.
>> >
>> > Specifically, we have tens of millions of documents (say 50M) and are
>> > comparing several "#collections X #docs_per_collection" configurations.
>> > For example, we could have a single collection with 50M docs, or 5,000
>> > collections with 10K docs each.
>> > When trying to create the 5,000 collections [see the loop sketched at
>> > the end of this message], we start getting frequent errors after
>> > 1,000-1,500 collections have been created. It feels like some limit
>> > has been reached.
>> > These tests are done on a single node, plus an additional node for
>> > replicas.
>> >
>> > Can someone elaborate on what could limit Solr to a high number of
>> > collections (if at all)? I.e., if we wanted to have 5K or 10K (or 100K)
>> > collections, is there anything in Solr that would prevent it? Where
>> > would it break?
>> >
>> > Thanks,
>> > Arnon
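
P.S. To make Shai's first option concrete, here's a minimal SolrJ sketch of
the one-big-collection pattern. The collection name ("files"), field name
("collectionID"), and user ID are all made up, and it assumes a recent SolrJ
client:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PerUserSearch {
      public static void main(String[] args) throws Exception {
        // One big collection; every document carries a collectionID
        // field identifying the user who owns it.
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/files").build()) {
          SolrQuery q = new SolrQuery("quarterly report");
          // The per-user filter the single-collection design forces onto
          // every request. As an fq it doesn't affect scoring and gets
          // cached in the filter cache.
          q.addFilterQuery("collectionID:user_1234");
          QueryResponse rsp = client.query(q);
          System.out.println(rsp.getResults().getNumFound() + " hits");
        }
      }
    }

Note that Shai's term-statistics caveat still applies here: scores are
computed against the whole index, not against one user's slice of it.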
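
And a loop like the one below reproduces Arnon's test -- one shard and two
replicas per collection to match his single-node-plus-replica setup. The
base URL and config set name are made up:

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class ManyCollections {
      public static void main(String[] args) throws Exception {
        // The Collections API is cluster-wide, so any node's base URL
        // will do.
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr").build()) {
          for (int i = 0; i < 5000; i++) {
            CollectionAdminRequest
                .createCollection("coll_" + i, "myconfig", 1, 2)
                .process(client);
            if (i % 100 == 0) {
              System.out.println("created " + i + " collections so far");
            }
          }
        }
      }
    }

If the cluster hits the same wall, failures should start showing up in this
loop's responses somewhere in the 1,000-1,500 range Arnon reports.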