Dave,

There are settings like MAX_CONNECTIONS and
MAX_CONNECTIONS_PER_HOST which control the number of connections.
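
If you mean the inter-node request settings in solr.xml, a rough sketch of
where they would go is below (this assumes the stock HttpShardHandlerFactory;
the values are only illustrative, so check the names and defaults against
your Solr version):

  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <!-- illustrative limits only; size these for your cluster -->
    <int name="maxConnections">10000</int>
    <int name="maxConnectionsPerHost">100</int>
  </shardHandlerFactory>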

Are you leaving the connection to ZooKeeper open after you establish it?
Are you using the singleton pattern?

2016-12-28 14:14 GMT-03:00 Dave Seltzer <dselt...@tveyes.com>:

> Hi Erick,
>
> I'll dig in on these timeout settings and see how changes affect behavior.
>
> One interesting aspect is that we're hardly indexing any content at the
> moment; the rate of ingress is something like 10 to 20 documents per day.
>
> So my guess is that ZK is simply deciding that these servers are dead
> because their responses are so sluggish.
>
> You've mentioned lots of timeouts, but are there any settings which control
> the number of available threads? Or is this something which is largely
> handled automagically?
>
> Many thanks!
>
> -Dave
>
> On Wed, Dec 28, 2016 at 11:56 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Dave:
> >
> > There are at least 4 timeouts (not even including ZK) that can
> > be relevant, defined in solr.xml:
> > socketTimeout
> > connTimeout
> > distribUpdateConnTimeout
> > distribUpdateSoTimeout
> >
> > Plus the ZK timeout
> > zkClientTimeout
> >
> > Plus the ZK configurations.
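> >
> > As a rough sketch, those settings live in solr.xml along these lines
> > (the values shown are only illustrative defaults; check them against
> > the solr.xml that shipped with your version):
> >
> >   <solr>
> >     <solrcloud>
> >       <!-- ZK session timeout and the distributed-update timeouts -->
> >       <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
> >       <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
> >       <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
> >     </solrcloud>
> >     <!-- timeouts for inter-node search requests -->
> >     <shardHandlerFactory name="shardHandlerFactory"
> >                          class="HttpShardHandlerFactory">
> >       <int name="socketTimeout">${socketTimeout:600000}</int>
> >       <int name="connTimeout">${connTimeout:60000}</int>
> >     </shardHandlerFactory>
> >   </solr>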
> >
> > So it would help narrow down what's going on if we knew why the nodes
> > dropped out. There are indeed a lot of messages dumped, but somewhere
> > in the logs there should be a root cause.
> >
> > You might see Leader Initiated Recovery (LIR), which can indicate that
> > an update operation from the leader took too long; the timeouts above
> > can be adjusted in that case.
> >
> > You might see evidence that ZK couldn't get a response from Solr for
> > "too long" and decided the node was gone.
> >
> > You might see...
> >
> > One thing I'd look at very closely is GC activity. One of the culprits
> > I've seen for this behavior is a very long stop-the-world GC pause that
> > leads ZK to think the node is dead and trips this whole chain.
> > Depending on the timeouts, "very long" might be only a few seconds.
> >
> > Not entirely helpful, but until you pinpoint why the node goes into
> > recovery it's all just throwing darts at the wall. The GC logs and the
> > Solr logs might give some insight into the root cause.
> >
> > Best,
> > Erick
> >
> > On Wed, Dec 28, 2016 at 8:26 AM, Dave Seltzer <dselt...@tveyes.com>
> > wrote:
> > > Hello Everyone,
> > >
> > > I'm working on a SolrCloud cluster which is used in a hash-matching
> > > application.
> > >
> > > For performance reasons we've opted to batch-execute hash matching
> > > queries. This means that a single query will contain many nested
> > > queries. As you might expect, these queries take a while to execute.
> > > (On the order of 5 to 10 seconds.)
> > >
> > > I've noticed that Solr will act erratically when we send too many
> > > long-running queries. Specifically, heavily loaded servers will
> > > repeatedly fall out of the cluster and then recover. My theory is
> > > that there's some limit on the number of concurrent connections and
> > > that client queries are crowding out the ZooKeeper-related requests...
> > > but I'm not sure. I've increased zkClientTimeout to combat this.
> > >
> > > My question is: What configuration settings should I be looking at
> > > in order to make sure I'm maximizing the ability of Solr to handle
> > > concurrent requests?
> > >
> > > Many thanks!
> > >
> > > -Dave
> >
>
