I have a cluster (12 nodes) with 664 collections, each with 12 shards and a replication factor of 2.
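As a rough back-of-the-envelope check, those figures work out to the replica counts below. The even spread of replicas across nodes is an assumption (placement is not stated in the thread), and the helper name is only illustrative.

```python
def replica_count(collections, shards, replication_factor):
    """Total number of replicas (Solr cores) across the whole cluster."""
    return collections * shards * replication_factor

# The 12-node cluster described above: 664 collections, 12 shards, RF 2.
large_total = replica_count(664, 12, 2)
# Assumption: replicas are spread evenly over the 12 nodes.
large_per_node = large_total / 12

# The 4-node cluster from the original question: 14 collections, 2 shards, 2 replicas.
small_total = replica_count(14, 2, 2)
small_per_node = small_total / 4

print(large_total, large_per_node)
print(small_total, small_per_node)
```

Every one of those replicas publishes state changes through ZooKeeper, so the difference in scale between the two clusters matters for what follows.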
The main bottleneck will be ZooKeeper: it is too easy to overflow the Overseer queue when a node is ejected due to a GC pause. Another problem is that the time to restart a node will increase from seconds to minutes.

The tradeoff is not easy; it depends on the number of machines, the volume of data, the hardware, and so on.

--

/Yago Riveiro

On 8 Aug 2017 20:27 +0100, Webster Homer <webster.ho...@sial.com>, wrote:
> Yes we do see replicas go into recovery.
>
> Most of our clouds are hosted in the google cloud. So flaky networks are
> probably not an issue, though firewalls to the clouds can be
>
> On Tue, Aug 8, 2017 at 2:14 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > So in total you have 56 replicas, correct? This shouldn't be a
> > problem, we've seen many more replicas than that. Many many many.
> >
> > Do you ever see any replicas go into recovery? One common problem is
> > that GC exceeds the timeouts for, say, Zookeeper to contact nodes and
> > they'll cycle through recovery. Have you captured the GC logs and seen
> > if you have large stop-the-world GC pauses?
> >
> > In short, what you've described should be easily handled. My guess is
> > GC pauses, I/O contention and/or flaky networks....
> >
> > Best,
> > Erick
> >
> > On Tue, Aug 8, 2017 at 11:35 AM, Webster Homer <webster.ho...@sial.com>
> > wrote:
> > > We have a Solrcloud environment that has 4 solr nodes and a 3 node
> > > Zookeeper ensemble. All of the collections are configured to have 2 shards
> > > with 2 replicas. In this environment we have 14 different collections. Some
> > > of these collections are hardly touched, others have a fairly heavy search
> > > and update load.
> > > 1 collection has near real time updates every minute and constant
> > > searches, but it is not very large
> > > another has a fairly constant search load with updates of a few records
> > > every 15 minutes.
> > > 6 collections are search heavy but update light (1 full
> > > load per week with daily partials)
> > >
> > > Updates to production cloud are via CDCR from an "authoring" cloud which
> > > replicates to two production clouds.
> > > We often see issues with replicas not being updated, and tlogs accumulating.
> > >
> > > We have autoCommit and autoSoftCommit set on all our collections, and CDCR
> > > logs disabled. We are running Solr 6.2
> > >
> > > We also run into errors saying that "no live solr Servers available to
> > > service the request" but all nodes appear healthy. So I've been wondering
> > > if we just have too many collections for the number of nodes.
> > >
> > > Are there tell tale diagnostics that could determine if the servers are
> > > over loaded?
> > >
> > > Are there any guidelines for number of collections vs number of nodes in a
> > > solrcloud?
> > >
> > > We run our zookeepers via supervisord, and all of this is behind firewalls.
> > > So the Zookeeper JMX interface is useless. How do we get good diagnostics
> > > from Zookeeper? I know that sometimes problems go away when we restart the
> > > Zookeepers and the solr nodes.
> > >
> > > Thanks
> > >
> > > --
> > >
> > >
> > > This message and any attachment are confidential and may be privileged or
> > > otherwise protected from disclosure. If you are not the intended recipient,
> > > you must not copy this message or attachment or disclose the contents to
> > > any other person. If you have received this transmission in error, please
> > > notify the sender immediately and delete the message and any attachment
> > > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not accept liability for any omissions or errors in this
> > > message which may arise as a result of E-Mail-transmission or for damages
> > > resulting from any unauthorized changes of the content of this message and
> > > any attachment thereto.
> > > Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not guarantee that this message is free of viruses and does
> > > not accept liability for any damages caused by any virus transmitted
> > > therewith.
> > >
> > > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > > Spanish and Portuguese versions of this disclaimer.
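
On the GC question raised in the thread, one rough way to check a GC log for long stop-the-world pauses might look like the sketch below. It assumes a Java 8 style log written with -XX:+PrintGCApplicationStoppedTime; the 15-second threshold is only a placeholder for whatever zkClientTimeout the cluster actually uses, and the function name and sample lines are made up for illustration.

```python
import re

# Matches the line the JVM writes when -XX:+PrintGCApplicationStoppedTime is
# enabled (Java 8 log format; other JVM versions log pauses differently).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9]+\.[0-9]+) seconds"
)

def long_pauses(lines, threshold_seconds=15.0):
    """Return (line_number, pause_seconds) for pauses at or above the threshold.

    threshold_seconds is an assumed value; set it to your own ZooKeeper
    client timeout.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match and float(match.group(1)) >= threshold_seconds:
            hits.append((number, float(match.group(1))))
    return hits

# Made-up sample lines, not taken from any real cluster.
sample = [
    "2017-08-08T11:35:00.100+0000: 10.001: Total time for which application "
    "threads were stopped: 0.0123456 seconds, Stopping threads took: 0.0000789 seconds",
    "2017-08-08T11:36:10.500+0000: 80.401: Total time for which application "
    "threads were stopped: 17.5000000 seconds, Stopping threads took: 0.0001000 seconds",
]
print(long_pauses(sample))
```

For a real log, something like `long_pauses(open(path))` should work, where `path` points at the node's GC log. Pauses close to or above the ZooKeeper session timeout are the ones likely to push replicas into recovery.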