We did some basic load testing on our 7.1.0 and 7.2.1 clusters, and the
results came out all right.
We saw read latencies improve by about 30% between 6.6.0 and 7.1.0.
We then saw a performance degradation of about 10% between 7.1.0 and
7.2.1 in many metrics.
Overall, though, 7.2.1 still seems better than 6.6.0.
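
As a rough sanity check of the "still better than 6.6.0" conclusion, the
two changes can be compounded. This is only a sketch: it assumes the 30%
figure means read latency dropped by 30%, and that the 10% degradation is
a 10% rise in that same read latency relative to 7.1.0. Neither is stated
precisely above, and the function name is made up for illustration.

```python
def relative_latency(changes):
    """Compound successive fractional latency changes into a single ratio.

    Each entry is a fractional change relative to the previous version,
    e.g. -0.30 for "30% lower latency" or 0.10 for "10% higher latency".
    """
    ratio = 1.0
    for change in changes:
        ratio *= 1.0 + change
    return ratio


# Assumed reading of the numbers: 6.6.0 -> 7.1.0 is -30%, 7.1.0 -> 7.2.1 is +10%.
ratio = relative_latency([-0.30, 0.10])
net_change_pct = (ratio - 1.0) * 100
print(f"7.2.1 read latency vs 6.6.0: {ratio:.2f}x ({net_change_pct:+.0f}%)")
```

Under those assumptions 7.2.1 would still sit at roughly three quarters of
the 6.6.0 read latency, which matches the overall impression.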

I will also check the logs for errors, but the nodes stayed responsive
for the full 23+ hours of the load test.

Disclaimer: we do not yet test facets, pivots, or block joins. We plan to
add those features to our load-testing tool sometime this year.

Thanks
SG


On Wed, Jan 31, 2018 at 3:12 AM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Ah thanks, i just submitted a patch fixing it.
>
> Anyway, in the end it appears this is not the problem we are seeing as our
> timeouts were already at 30 seconds.
>
> All i know is that at some point nodes start to lose ZK connections due to
> timeouts (logs say so, but all within 30 seconds), the logs are flooded
> with those messages:
> o.a.z.ClientCnxn Client session timed out, have not heard from server in
> 10359ms for sessionid 0x160f9e723c12122
> o.a.z.ClientCnxn Unable to reconnect to ZooKeeper service, session
> 0x60f9e7234f05bb has expired
>
> Then there is a doubling in heap usage and nodes become unresponsive, die
> etc.
>
> We also see those messages in other collections, but not so frequently and
> they don't cause failure in those less loaded clusters.
>
> Ideas?
>
> Thanks,
> Markus
>
> -----Original message-----
> > From:Michael Braun <n3c...@gmail.com>
> > Sent: Monday 29th January 2018 21:09
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> >
> > Believe this is reported in
> > https://issues.apache.org/jira/browse/SOLR-10471
> >
> >
> > On Mon, Jan 29, 2018 at 2:55 PM, Markus Jelsma <markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello SG,
> > >
> > > The default in solr.in.sh is commented so it defaults to the value set in
> > > bin/solr, which is fifteen seconds. Just uncomment the setting in
> > > solr.in.sh and your timeout will be thirty seconds.
> > >
> > > For Solr itself to really default to thirty seconds, Solr's bin/solr needs
> > > to be patched to use the correct value.
> > >
> > > Regards,
> > > Markus
> > >
> > > -----Original message-----
> > > > From:S G <sg.online.em...@gmail.com>
> > > > Sent: Monday 29th January 2018 20:15
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > > >
> > > > Hi Markus,
> > > >
> > > > We are in the process of upgrading our clusters to 7.2.1 and I am not
> > > sure
> > > > I quite follow the conversation here.
> > > > Is there a simple workaround to set the ZK_CLIENT_TIMEOUT to a higher
> > > value
> > > > in the config (and it's just a default value being wrong/overridden
> > > > somewhere)?
> > > > Or is it more severe in the sense that any config set for
> > > ZK_CLIENT_TIMEOUT
> > > > by the user is just ignored completely by Solr in 7.2.1 ?
> > > >
> > > > Thanks
> > > > SG
> > > >
> > > >
> > > > On Mon, Jan 29, 2018 at 3:09 AM, Markus Jelsma <
> > > markus.jel...@openindex.io>
> > > > wrote:
> > > >
> > > > > Ok, i applied the patch and it is clear the timeout is 15000. Solr.xml
> > > > > says 30000 if ZK_CLIENT_TIMEOUT is not set, which is by default unset in
> > > > > solr.in.sh, but set in bin/solr to 15000. So it seems Solr's default is
> > > > > still 15000, not 30000.
> > > > >
> > > > > But, back to my topic. I see we explicitly set it in solr.in.sh to
> > > 30000.
> > > > > To be sure, i applied your patch to a production machine, all our
> > > > > collections run with 30000. So how would that explain this log
> line?
> > > > >
> > > > > o.a.z.ClientCnxn Client session timed out, have not heard from
> server
> > > in
> > > > > 22130ms
> > > > >
> > > > > We also see these with smaller values, seven seconds. And, is this
> > > > > actually an indicator of the problems we have?
> > > > >
> > > > > Any ideas?
> > > > >
> > > > > Many thanks,
> > > > > Markus
> > > > >
> > > > >
> > > > > -----Original message-----
> > > > > > From:Markus Jelsma <markus.jel...@openindex.io>
> > > > > > Sent: Saturday 27th January 2018 10:03
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: RE: 7.2.1 cluster dies within minutes after restart
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I grepped for it yesterday and found nothing but 30000 in the
> > > settings,
> > > > > but judging from the weird time out value, you may be right. Let me
> > > apply
> > > > > your patch early next week and check for spurious warnings.
> > > > > >
> > > > > > Another noteworthy observation for those working on cloud stability and
> > > > > > recovery: whenever this happens, some nodes are also absolutely sure to run
> > > > > > OOM. The leaders usually live longest, the replicas don't; their heap
> > > > > > usage peaks every time, consistently.
> > > > > >
> > > > > > Thanks,
> > > > > > Markus
> > > > > >
> > > > > > -----Original message-----
> > > > > > > From:Shawn Heisey <apa...@elyograg.org>
> > > > > > > Sent: Saturday 27th January 2018 0:49
> > > > > > > To: solr-user@lucene.apache.org
> > > > > > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > > > > > >
> > > > > > > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > > > > > > o.a.z.ClientCnxn Client session timed out, have not heard
> from
> > > > > server in 22130ms (although zkClientTimeOut is 30000).
> > > > > > >
> > > > > > > Are you absolutely certain that there is a setting for
> > > zkClientTimeout
> > > > > > > that is actually getting applied?  The default value in Solr's
> > > example
> > > > > > > configs is 30 seconds, but the internal default in the code
> (when
> > > no
> > > > > > > configuration is found) is still 15.  I have confirmed this in
> the
> > > > > code.
> > > > > > >
> > > > > > > Looks like SolrCloud doesn't log the values it's using for
> things
> > > like
> > > > > > > zkClientTimeout.  I think it should.
> > > > > > >
> > > > > > > https://issues.apache.org/jira/browse/SOLR-11915
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Shawn
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
