Re: Problem XY - X = SolrCloud 4.8 replicas down, Y = SolrCloud upgrade to a new version

Erick Erickson Thu, 02 Jul 2015 08:40:12 -0700

Vincenzo:

First and foremost, figure out why you're having 20 second GC pauses. For
indexes like you're describing, this is unusual. How big is the heap
you allocate to the JVM?


Check your ZooKeeper timeout. In earlier versions of SolrCloud it defaulted to
15 seconds, and nodes would go into leader election for no obvious reason.
Lengthening the timeout to 30-60 seconds seemed to help a lot of people.
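
One way to see how often GC pauses exceed the ZooKeeper timeout is to scan
the GC log for long wall-clock times. The sketch below is only an
illustration, not something from the thread: it assumes the log contains
HotSpot-style "[Times: user=... sys=..., real=... secs]" entries (as written
with -XX:+PrintGCDetails), and the name long_pauses is hypothetical. CMS
concurrent phases log "Times" entries too and do not stop the application, so
any hit still needs to be checked by eye.

```python
import re

# Wall-clock part of a HotSpot "[Times: user=... sys=..., real=... secs]" entry.
# Assumption: the GC log uses this format.
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines, threshold_secs):
    """Return (line_number, seconds) for each entry whose real time
    is at least threshold_secs. Line numbers start at 1."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in REAL_TIME.finditer(line):
            seconds = float(match.group(1))
            if seconds >= threshold_secs:
                hits.append((number, seconds))
    return hits
```

Calling long_pauses(open("gc.log"), 15) would list the entries that took at
least as long as the old 15-second default; "gc.log" is a placeholder path.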

The disks should be largely irrelevant to the origin of, or cure for, this problem...

Here's a good article on why you want to allocate "just enough" heap
for your app. Of course, "just enough" can be interesting to actually
define:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Thu, Jul 2, 2015 at 5:45 AM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> Hi All,
>
> In recent months my SolrCloud clusters have sometimes (once or twice a
> week) had a few replicas go down.
> Usually all the replicas go down on the same node.
> I'm unable to understand why a 3-node cluster with 8 cores/32 GB and high
> performance disks has this problem. The main index is small, about 1.5 M
> documents with very little text inside.
> I don't know if having 3 shards with 3 replicas is too much; to me it seems
> a fair level of high availability, but anyway this should not compromise the
> cluster stability.
> All the queries take under a second, so it is responsive.
>
> A few months ago I began to think the problem was related to an old and
> buggy version of SolrCloud that we have to upgrade.
> But reading on this list about the classic XY problem I changed my mind;
> maybe there is a much better solution.
>
> Last night I had, again, a couple of replicas go down around 1:07 AM; this is
> the SolrCloud log file:
>
> http://pastebin.com/raw.php?i=bCHnqnXD
>
> At the end of the exception list there are a few "cancelElection did not find
> election node to remove" errors, and this morning I found the replicas down.
>
> Looking at the GC log file I found that at the same moment there is a GC that
> takes about 20 seconds. I'm now using the CMS (ConcurrentMarkSweep) collector,
> taken from Shawn Heisey's suggestions:
> https://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector
>
>
> http://pastebin.com/raw.php?i=VuSrg4uz
>
> Finally, looking around in recent months I found this bug, which seems
> to me to be related to these problems.
> So I began to think that I need an upgrade; am I right? What do you
> think?
>
> https://issues.apache.org/jira/browse/SOLR-6159
>
> Any help is much appreciated.
>
> Thanks,
> Vincenzo
