Each collection has 2 replicas and no sharding. The collections are not that big.
No, unfortunately they are not independent. There are collections with customer documents (several thousand customers) and product collections. Each customer has at least one customer collection and between one and several hundred products. The combination of these collections drives the search of a Liferay portal. Each customer has its own Liferay portal.

We could split the cluster into several clusters by customer, but then we would have to duplicate the product collections in each Solr cluster.

Will Solr go in the direction of "large number of collections"? And the question is, what is a "large number"?

Best
Christoph

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Sunday, 31 August 2014 14:09
To: solr-user@lucene.apache.org
Subject: Re: Scaling to large Number of Collections

How are the 5 servers arranged in terms of shards and replicas? 5 shards with 1 replica each, 1 shard with 5 replicas, 2 shards with 2 and 3 replicas, or... what?

How big is each collection?

The key strength of SolrCloud is scaling large collections via shards, NOT scaling large numbers of collections. If you have large numbers of collections, maybe they should be divided into separate clusters, especially if they are independent.

Is this a multi-tenancy situation or a single humongous app?

In any case, "large numbers of collections in a single SolrCloud cluster" is not a supported scenario at this time. Certainly suggestions for future enhancement can be made though.

-- Jack Krupansky

-----Original Message-----
From: Christoph Schmidt
Sent: Sunday, August 31, 2014 4:04 AM
To: solr-user@lucene.apache.org
Subject: Scaling to large Number of Collections

We see at least two problems when scaling to a large number of collections.
I would like to ask the community whether they are known and maybe already addressed in development.

We have a SolrCloud running with the following numbers:
- 5 servers (each with 24 CPUs, 128 RAM)
- 13,000 collections with 25,000 SolrCores in the cloud

The cloud is working fine, but we see two problems if we want to scale further:

1. Resource consumption of native system threads
   We see that each collection opens at least two threads: one for ZooKeeper (coreZkRegister-1-thread-5154) and one for the searcher (searcherExecutor-28357-thread-1). We will run into "OutOfMemoryError: unable to create new native thread". Maybe the architecture could be changed here to use thread pools?

2. The shutdown and the startup of one server in the SolrCloud takes 2 hours, so a rolling restart takes about 10 hours. To me the problem seems to be that leader election is "linear": the Overseer works through the cores one by one. The organisation of the cloud is not done in parallel or distributed.
   Is this already addressed by https://issues.apache.org/jira/browse/SOLR-5473 or is more needed?

Thanks for discussion and help
Christoph

_______________________________________________
Dr. Christoph Schmidt | Managing Director
P +49-89-523041-72
M +49-171-1419367
Skype: cs_moresophy
christoph.schm...@moresophy.de<mailto:heiko.be...@moresophy.de>
www.moresophy.com<http://www.moresophy.com/>
moresophy GmbH | Fraunhoferstrasse 15 | 82152 München-Martinsried
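
As a rough check of the figures in the original message, the per-server thread load and the rolling restart time can be estimated as follows. This is only a sketch: it assumes the 25,000 cores are spread evenly over the 5 servers and that the two observed threads exist once per core (the message states "per collection", so this is an assumption), and the function names are made up for illustration.

```python
def threads_per_server(total_cores: int, servers: int, threads_per_core: int) -> int:
    """Lower-bound estimate of native threads per server.

    Assumes cores are spread evenly across servers (an assumption,
    not something stated in the thread).
    """
    cores_per_server = total_cores // servers
    return cores_per_server * threads_per_core


def rolling_restart_hours(servers: int, hours_per_server: float) -> float:
    """Total time for a rolling restart when servers are restarted one at a time."""
    return servers * hours_per_server


if __name__ == "__main__":
    # Figures from the message: 25,000 cores, 5 servers, at least 2 threads each,
    # and 2 hours to restart a single server.
    print("threads per server (at least):", threads_per_server(25_000, 5, 2))
    print("rolling restart (hours):", rolling_restart_hours(5, 2))
```

Under these assumptions each server carries at least 10,000 such threads before any request handling starts, and a one-at-a-time restart of all 5 servers matches the roughly 10 hours reported.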