Re: AW: Scaling to large Number of Collections

Jack Krupansky Sun, 31 Aug 2014 09:06:45 -0700

You close with two great questions for the community!

We have a similar issue over in Apache Cassandra database land (thousands of tables).

There is no immediate, easy, great answer, other than the kinds of "workarounds" being suggested.


-- Jack Krupansky

-----Original Message-----
From: Christoph Schmidt

Sent: Sunday, August 31, 2014 11:44 AM
To: solr-user@lucene.apache.org
Subject: AW: Scaling to large Number of Collections

Each collection has 2 replicas and no sharding; the collections are not that big.

No, unfortunately they are not independent. There are collections with customer documents (some thousand customers) and product collections. One customer has at least one customer collection and from 1 to a few hundred products. The combination of these collections is used to drive the search of a Liferay portal. Each customer has its own Liferay portal.

We could split the cluster into several clusters by customer, but then we would have to duplicate the product collections in each Solr cluster.

Will Solr go in the direction of "large number of collections"? And the question is, what is a "large number"?


Best
Christoph

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Sunday, August 31, 2014 2:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Scaling to large Number of Collections

How are the 5 servers arranged in terms of shards and replicas? 5 shards with 1 replica each, 1 shard with 5 replicas, 2 shards with 2 and 3 replicas, or... what?

How big is each collection? The key strength of SolrCloud is scaling large collections via shards, NOT scaling large numbers of collections. If you have large numbers of collections, maybe they should be divided into separate clusters, especially if they are independent.


Is this a multi-tenancy situation or a single humongous app?

In any case, "large numbers of collections in a single SolrCloud cluster" is not a supported scenario at this time. Certainly suggestions for future enhancement can be made though.


-- Jack Krupansky

-----Original Message-----
From: Christoph Schmidt
Sent: Sunday, August 31, 2014 4:04 AM
To: solr-user@lucene.apache.org
Subject: Scaling to large Number of Collections

We see at least two problems when scaling to a large number of collections. I would like to ask the community whether they are known and perhaps already addressed in development:

We have a SolrCloud running with the following numbers:
- 5 servers (each 24 CPUs, 128 RAM)
- 13,000 collections with 25,000 SolrCores in the cloud

The cloud is working fine, but we see two problems if we want to scale further:

1. Resource consumption of native system threads

We see that each collection opens at least two threads: one for ZooKeeper (coreZkRegister-1-thread-5154) and one for the searcher (searcherExecutor-28357-thread-1).

We will run into "OutOfMemoryError: unable to create new native thread". Maybe the architecture could be changed here to use thread pools?
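To make the thread concern concrete, here is a rough sketch in Python (Solr itself is Java, so this only illustrates the idea and is not Solr code). It assumes the 25,000 cores are spread evenly over the 5 servers and that each core holds exactly the two threads observed above; the function names and the per-core task are hypothetical. The second function shows the thread pool idea: many per-core tasks share a small, bounded set of worker threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Figures taken from the message above; the even spread across servers is an assumption.
SERVERS = 5
SOLR_CORES = 25_000
THREADS_PER_CORE = 2  # one coreZkRegister thread plus one searcherExecutor thread


def dedicated_threads_per_server(cores=SOLR_CORES, servers=SERVERS,
                                 per_core=THREADS_PER_CORE):
    """Lower bound on native threads per server when every core owns its threads."""
    return (cores // servers) * per_core


def run_with_shared_pool(core_names, pool_size):
    """Run one task per core on a bounded pool.

    Returns the task results and the set of distinct worker thread names used.
    """
    used = set()
    lock = threading.Lock()

    def task(name):
        # Hypothetical stand-in for per-core work (for example, warming a searcher).
        with lock:
            used.add(threading.current_thread().name)
        return name

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(task, core_names))
    return results, used


if __name__ == "__main__":
    print("dedicated threads per server:", dedicated_threads_per_server())
    cores = [f"core_{i}" for i in range(5000)]
    results, used = run_with_shared_pool(cores, pool_size=16)
    print("tasks run:", len(results), "worker threads used:", len(used))
```

Under these assumptions the dedicated-thread model needs at least 10,000 native threads per server, while the pooled variant never uses more threads than the chosen pool size, whatever the number of cores.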

2. The shutdown and startup of one server in the SolrCloud takes 2 hours, so a rolling restart takes about 10 hours. To me the problem seems to be that leader election is "linear": the Overseer proceeds core by core. The organisation of the cloud is not done in parallel or distributed. Is this already addressed by https://issues.apache.org/jira/browse/SOLR-5473, or is more needed?
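As a back-of-the-envelope check of these numbers, the following Python sketch works through the arithmetic. It assumes about 5,000 cores per server (25,000 cores spread evenly over 5 servers) and that restart time grows linearly with the core count. The parallel estimate is purely idealized and says nothing about what SOLR-5473 actually changes.

```python
def rolling_restart_hours(servers, hours_per_server):
    """Total time when servers are restarted one after another."""
    return servers * hours_per_server


def seconds_per_core(hours_per_server, cores_per_server):
    """Average time spent per core if processing is strictly core by core."""
    return hours_per_server * 3600 / cores_per_server


def idealized_parallel_hours(hours_per_server, workers):
    """Best case if per-core work could be split evenly over `workers` (an assumption)."""
    return hours_per_server / workers


if __name__ == "__main__":
    print(rolling_restart_hours(5, 2))        # hours for the whole cluster
    print(seconds_per_core(2, 5000))          # seconds per core on one server
    print(idealized_parallel_hours(2, 8))     # hours per server with 8 workers
```

With the figures from the message, a strictly sequential restart spends roughly 1.4 seconds per core, which is how 5,000 cores add up to 2 hours per server.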


Thanks for discussion and help
Christoph
_______________________________________________

Dr. Christoph Schmidt | Geschäftsführer

P +49-89-523041-72
M +49-171-1419367
Skype: cs_moresophy
christoph.schm...@moresophy.de
www.moresophy.com

moresophy GmbH | Fraunhoferstrasse 15 | 82152 München-Martinsried
