Yes, we do see replicas go into recovery. Most of our clouds are hosted in Google Cloud, so flaky networks are probably not an issue, though firewalls to the clouds can be.
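
On the GC question in the message quoted below, here is a rough sketch of one way to scan a GC log for long stop-the-world pauses. It assumes the JVM was started with -XX:+PrintGCApplicationStoppedTime (Java 8 style logging), which writes "Total time for which application threads were stopped" lines. The function name and the 5 second default threshold are made up for illustration; the threshold that matters is whatever zkClientTimeout is set to on the Solr nodes.

```python
import re

# Line format written by Java 8 when -XX:+PrintGCApplicationStoppedTime is set.
# Assumption: the GC log uses this format; other JVM versions or flags differ.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_s=5.0):
    """Return (line_number, pause_seconds) for each pause >= threshold_s.

    threshold_s is an illustrative default, not a Solr setting; pick a value
    relative to the ZooKeeper client timeout configured for your nodes.
    """
    hits = []
    for n, line in enumerate(lines, start=1):
        m = STOPPED.search(line)
        if m:
            secs = float(m.group(1))
            if secs >= threshold_s:
                hits.append((n, secs))
    return hits
```

To use it, open the GC log and pass the file object straight in, for example long_pauses(open(path), threshold_s=10.0). Pauses that approach or exceed the ZooKeeper session timeout would be consistent with replicas cycling through recovery.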

On Tue, Aug 8, 2017 at 2:14 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> So in total you have 56 replicas, correct? This shouldn't be a
> problem, we've seen many more replicas than that. Many many many.
>
> Do you ever see any replicas go into recovery? One common problem is
> that GC exceeds the timeouts for, say, Zookeeper to contact nodes and
> they'll cycle through recovery. Have you captured the GC logs and seen
> if you have large stop-the-world GC pauses?
>
> In short, what you've described should be easily handled. My guess is
> GC pauses, I/O contention and/or flaky networks....
>
> Best,
> Erick
>
> On Tue, Aug 8, 2017 at 11:35 AM, Webster Homer <webster.ho...@sial.com>
> wrote:
> > We have a SolrCloud environment that has 4 Solr nodes and a 3 node
> > Zookeeper ensemble. All of the collections are configured to have 2 shards
> > with 2 replicas. In this environment we have 14 different collections. Some
> > of these collections are hardly touched; others have a fairly heavy search
> > and update load.
> > 1 collection has near real time updates every minute and constant
> > searches, but it is not very large.
> > Another has a fairly constant search load with updates of a few records
> > every 15 minutes. 6 collections are search heavy but update light (1 full
> > load per week with daily partials).
> >
> > Updates to the production cloud are via CDCR from an "authoring" cloud which
> > replicates to two production clouds.
> > We often see issues with replicas not being updated, and tlogs accumulating.
> >
> > We have autoCommit and autoSoftCommit set on all our collections, and CDCR
> > logs disabled. We are running Solr 6.2.
> >
> > We also run into errors saying that "no live solr Servers available to
> > service the request" but all nodes appear healthy. So I've been wondering
> > if we just have too many collections for the number of nodes.
> >
> > Are there telltale diagnostics that could determine if the servers are
> > overloaded?
> >
> > Are there any guidelines for number of collections vs number of nodes in a
> > SolrCloud?
> >
> > We run our Zookeepers via supervisord, and all of this is behind firewalls.
> > So the Zookeeper JMX interface is useless. How do we get good diagnostics
> > from Zookeeper? I know that sometimes problems go away when we restart the
> > Zookeepers and the Solr nodes.
> >
> > Thanks
> >
> > --
> >
> > This message and any attachment are confidential and may be privileged or
> > otherwise protected from disclosure. If you are not the intended recipient,
> > you must not copy this message or attachment or disclose the contents to
> > any other person. If you have received this transmission in error, please
> > notify the sender immediately and delete the message and any attachment
> > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > subsidiaries do not accept liability for any omissions or errors in this
> > message which may arise as a result of E-Mail-transmission or for damages
> > resulting from any unauthorized changes of the content of this message and
> > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > subsidiaries do not guarantee that this message is free of viruses and does
> > not accept liability for any damages caused by any virus transmitted
> > therewith.
> >
> > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > Spanish and Portuguese versions of this disclaimer.