I have a cluster (12 nodes) with 664 collections, 12 shards each and a
replication factor of 2.
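
For scale, the replica counts implied by these figures can be worked out
with a quick sketch. It assumes that a replication factor of 2 means two
copies of every shard and that replicas are spread evenly across the nodes;
the function name is only illustrative. The second call uses the numbers
from the original question quoted below (14 collections, 2 shards,
2 replicas, 4 nodes).

```python
def replica_counts(collections, shards, replication_factor, nodes):
    """Return (total replicas, average replicas per node).

    Assumes every collection has the same shard count and replication
    factor, and that replicas are spread evenly over the nodes.
    """
    total = collections * shards * replication_factor
    return total, total / nodes


# The 12-node cluster described in this reply.
print(replica_counts(664, 12, 2, 12))  # (15936, 1328.0)

# The 4-node cluster from the original question.
print(replica_counts(14, 2, 2, 4))     # (56, 14.0)
```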

The main bottleneck will be ZooKeeper: it is too easy to overflow the
Overseer queue when a node drops out due to a GC pause. Another problem is
that the time to restart a node will increase from seconds to minutes.

The tradeoff is not easy; it depends on the number of machines, the volume
of data, the hardware and so on.
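
On the question further down the thread about getting diagnostics out of
ZooKeeper without JMX: one option is the four-letter-word commands (for
example mntr, available since ZooKeeper 3.4), which are answered on the
normal client port that Solr already connects to. Below is a minimal Python
sketch, assuming the commands are enabled on the ensemble (newer ZooKeeper
releases require them to be whitelisted). The function names are
illustrative and the host in the usage comment is hypothetical.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Parse the tab-separated key/value lines of an mntr reply into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines that are not key<TAB>value
            stats[key.strip()] = value.strip()
    return stats


# Usage (hypothetical host name, default client port):
#   stats = parse_mntr(four_letter_word("zk1.example.com", 2181, "mntr"))
#   print(stats.get("zk_outstanding_requests"), stats.get("zk_avg_latency"))
```

Fields such as zk_outstanding_requests and zk_avg_latency are reasonable
ones to watch when checking whether the ensemble is keeping up.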

--

/Yago Riveiro

On 8 Aug 2017 20:27 +0100, Webster Homer <webster.ho...@sial.com>, wrote:
> Yes we do see replicas go into recovery.
>
> Most of our clouds are hosted in the Google Cloud, so flaky networks are
> probably not an issue, though firewalls to the clouds can be.
>
> On Tue, Aug 8, 2017 at 2:14 PM, Erick Erickson <erickerick...@gmail.com
> wrote:
>
> > So in total you have 56 replicas, correct? This shouldn't be a
> > problem, we've seen many more replicas than that. Many many many.
> >
> > Do you ever see any replicas go into recovery? One common problem is
> > that GC exceeds the timeouts for, say, Zookeeper to contact nodes and
> > they'll cycle through recovery. Have you captured the GC logs and seen
> > if you have large stop-the-world GC pauses?
> >
> > In short, what you've described should be easily handled. My guess is
> > GC pauses, I/O contention and/or flaky networks....
> >
> > Best,
> > Erick
> >
> > On Tue, Aug 8, 2017 at 11:35 AM, Webster Homer <webster.ho...@sial.com
> > wrote:
> > > We have a SolrCloud environment that has 4 Solr nodes and a 3 node
> > > ZooKeeper ensemble. All of the collections are configured to have 2
> > > shards with 2 replicas. In this environment we have 14 different
> > > collections. Some of these collections are hardly touched; others have
> > > a fairly heavy search and update load.
> > > 1 collection has near real time updates every minute and constant
> > > searches, but it is not very large.
> > > Another has a fairly constant search load with updates of a few records
> > > every 15 minutes. 6 collections are search heavy but update light (1 full
> > > load per week with daily partials).
> > >
> > > Updates to production cloud are via CDCR from an "authoring" cloud which
> > > replicates to two production clouds.
> > > We often see issues with replicas not being updated, and tlogs
> > > accumulating.
> > >
> > > We have autoCommit and autoSoftCommit set on all our collections, and
> > > CDCR logs disabled. We are running Solr 6.2.
> > >
> > > We also run into errors saying that "no live solr Servers available to
> > > service the request" but all nodes appear healthy. So I've been
> > > wondering if we just have too many collections for the number of nodes.
> > >
> > > Are there telltale diagnostics that could determine if the servers are
> > > overloaded?
> > >
> > > Are there any guidelines for number of collections vs number of nodes in
> > > a solrcloud?
> > >
> > > We run our zookeepers via supervisord, and all of this is behind
> > > firewalls. So the Zookeeper JMX interface is useless. How do we get good
> > > diagnostics from Zookeeper? I know that sometimes problems go away when
> > > we restart the Zookeepers and the solr nodes.
> > >
> > > Thanks
> > >
> > > --
> > >
> > >
> > > This message and any attachment are confidential and may be privileged or
> > > otherwise protected from disclosure. If you are not the intended
> > recipient,
> > > you must not copy this message or attachment or disclose the contents to
> > > any other person. If you have received this transmission in error, please
> > > notify the sender immediately and delete the message and any attachment
> > > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not accept liability for any omissions or errors in this
> > > message which may arise as a result of E-Mail-transmission or for damages
> > > resulting from any unauthorized changes of the content of this message
> > and
> > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not guarantee that this message is free of viruses and
> > does
> > > not accept liability for any damages caused by any virus transmitted
> > > therewith.
> > >
> > > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > > Spanish and Portuguese versions of this disclaimer.
> >
>
