On 7/27/2015 9:16 AM, Olivier wrote:
> I have a SolrCloud cluster with 3 nodes: 3 shards per node and a
> replication factor of 3.
> The number of collections is around 1000. All the collections use the
> same Zookeeper configuration.
> So when I create each collection, the configuration is pulled from ZK
> and the configuration files are stored in the JVM.
> I thought that if the configuration was the same for each collection,
> the impact on the JVM would be insignificant because the configuration
> would be loaded only once. But that is not the case: for each collection
> created, the JVM size increases because the configuration is loaded
> again. Am I correct?
> 
> If the configuration folder is small, there is no real problem: at less
> than 500 KB, 1000 collections x 500 KB means a JVM impact of about
> 500 MB.
> But we manage a lot of languages with several dictionaries, so our
> configuration folder is about 6 MB. The JVM impact is now very
> significant: it can exceed 6 GB (1000 x 6 MB).
> 
> So I would like to hear from people who also run a cluster with a large
> number of collections. Do I have to change some settings to handle this
> case better? What can I do to optimize this behaviour?
> For now, we have just increased the RAM size per node to 16 GB, but we
> plan to increase the number of collections.

I noticed severe issues when dealing with many collections, and that was
with a simple config and completely empty indexes.  A complex config and
actual index data would make it run that much more slowly.

https://issues.apache.org/jira/browse/SOLR-7191

Memory usage for the config wasn't even considered when I was working on
reporting that issue.

SolrCloud is highly optimized to work well when there are a relatively
small number of collections.  I think there is work that we can do which
will optimize operations to the point where thousands of collections
will work well, especially if they all share the same config/schema ...
but this is likely to be a fair amount of work, which will only benefit
a handful of users who are pushing the boundaries of what Solr can do.
In the open source world, a problem like that doesn't normally receive a
lot of developer attention, and we rely much more on help from the
community, specifically from knowledgeable users who are having the
problem and know enough to try and fix it.
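
To make the shared-config idea concrete, here is a minimal sketch (plain
Java against the Collections API; the base URL, configset name, and
collection naming scheme are all hypothetical) of creating many
collections that reference one configset already uploaded to ZK, so ZK
itself stores the config files only once:

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch only: every collection references the same named configset,
    // so ZooKeeper holds a single copy of the 6 MB of config files.
    public class CreateSharedConfigCollections {
        public static void main(String[] args) throws IOException {
            String solrBase = "http://localhost:8983/solr";  // hypothetical node
            String configName = "shared_config";             // hypothetical configset
            for (int i = 0; i < 1000; i++) {
                String url = solrBase + "/admin/collections?action=CREATE"
                        + "&name=collection" + i
                        + "&numShards=3&replicationFactor=3"
                        + "&collection.configName=" + configName;
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(url).openConnection();
                int status = conn.getResponseCode();  // sends the request
                conn.disconnect();
                if (status != 200) {
                    throw new IOException("CREATE failed for collection" + i);
                }
            }
        }
    }

Note that this only deduplicates the config on the ZK side; whether each
core then holds its own parsed copy on the heap is exactly the behaviour
you are asking about.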

FYI -- 16GB of RAM per machine is quite small for Solr, particularly
when pushing the envelope.  My Solr machines are maxed at 64GB, and I
frequently wish I could install more.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

One possible solution to your dilemma is simply to add more machines and
spread your collections out so that each machine's memory requirements
go down.
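
If you do spread out, the CREATE action also accepts a createNodeSet
parameter, which lets you pin each collection to a subset of nodes.  A
hedged sketch, with hypothetical node addresses (use the node_name values
from your own cluster state, in host:port_solr form):

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch: pin one collection to two specific nodes out of a larger
    // cluster, so its memory footprint lands only on those machines.
    public class CreatePinnedCollection {
        public static void main(String[] args) throws IOException {
            String url = "http://localhost:8983/solr/admin/collections"
                    + "?action=CREATE&name=collection42"
                    + "&numShards=3&replicationFactor=2"
                    + "&collection.configName=shared_config"
                    + "&createNodeSet=server1:8983_solr,server2:8983_solr";
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(url).openConnection();
            System.out.println("CREATE returned HTTP " + conn.getResponseCode());
            conn.disconnect();
        }
    }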

Thanks,
Shawn
