Hi Erick,

Thanks for your answer.

We use Java 8 and allocate a heap of up to 16 GB:

 -Xms2g -Xmx16g

There are 1.5M docs and the index is about 16 GB on disk.

Let me also say that during the day we have a lot of small updates, from 1k to
50k docs each time, and we do a full update of all documents during the
night. The 20-second GC pause happened during this full update.
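If it helps with diagnosis, I could enable GC logging during the nightly window. Something like these Java 8 HotSpot flags (the log path is just an example) should show exactly which collection phase causes the 20-second pause:

```shell
# Java 8 HotSpot GC logging options (log path is an example)
-verbose:gc \
-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps \
-XX:+PrintGCApplicationStoppedTime \
-Xloggc:/var/log/solr/gc.log
```

PrintGCApplicationStoppedTime in particular reports total stop-the-world time, which should line up with the replica-down events if GC is the cause.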

I haven't read Uwe's post completely, just because it was too long; all I
got was that I should use MMapDirectory.
But I still haven't been able to roll this change out to production:
after the change it is not clear whether we only need to restart the
core/node or whether a full reindex must be done.
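For reference, if I understand Uwe's post correctly, the change would be something like this in solrconfig.xml (this is my sketch, please correct me if it's wrong). My understanding is that the directory implementation only changes how the existing index files are read, so a core reload or node restart should be enough and no reindex should be needed:

```xml
<!-- solrconfig.xml: memory-map index files instead of reading them
     through heap buffers. If I remember correctly, the default factory
     already delegates to MMapDirectory on 64-bit JVMs, so this may even
     be redundant. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.MMapDirectoryFactory"/>
```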

Thanks for your time; I'll read Uwe's post very carefully.


On Thu, Jul 2, 2015 at 5:39 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Vincenzo:
>
> First and foremost, figure out why you're having 20 second GC pauses. For
> indexes like you're describing, this is unusual. How big is the heap
> you allocate to the JVM?
>
> Check your Zookeeper timeout. In earlier versions of SolrCloud it
> defaulted to
> 15 seconds. Going into leader election would happen for no obvious reason,
> and lengthening it to 30-60 seconds seemed to help a lot of people.
>
> The disks should be largely irrelevant to the origin or cure for this
> problem...
>
> Here's a good article on why you want to allocate "just enough" heap
> for your app. Of course, "just enough" can be interesting to actually
> define:
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Best,
> Erick
>
> On Thu, Jul 2, 2015 at 5:45 AM, Vincenzo D'Amore <v.dam...@gmail.com>
> wrote:
> > Hi All,
> >
> > In recent months my SolrCloud clusters sometimes (once or twice a
> > week) have a few replicas down.
> > Usually all the replicas go down on the same node.
> > I'm unable to understand why a 3-node cluster with 8 cores/32 GB per node
> > and high-performance disks has this problem. The main index is small,
> > about 1.5 M documents with very little text inside.
> > I don't know if having 3 shards with 3 replicas is too much; to me it
> seems
> > fair for high availability, but in any case this should not compromise
> > cluster stability.
> > All queries complete in under a second, so the cluster is responsive.
> >
> > A few months ago I began to think the problem was related to an old and
> > buggy version of SolrCloud that we need to upgrade.
> > But reading on this list about the classic XY problem I changed my mind;
> > maybe there is a much better solution.
> >
> > Last night I again had a couple of replicas down, around 1:07 AM; this
> is
> > the SolrCloud log file:
> >
> > http://pastebin.com/raw.php?i=bCHnqnXD
> >
> > At the end of the exception list there are a few "cancelElection did not
> > find election node to remove" errors, and this morning I found the replicas
> down.
> >
> > Looking at the GC log file, I found that at the same moment there is a GC
> > pause that takes about 20 seconds. I'm currently using the CMS
> > (ConcurrentMarkSweep) collector, following Shawn Heisey's suggestions:
> >
> https://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector
> >
> >
> > http://pastebin.com/raw.php?i=VuSrg4uz
> >
> > Finally, looking around over the last few months I found this bug, which
> > seems to me to be related to this problem.
> > So I began to think that I need an upgrade. Am I right? What do you
> > think?
> >
> > https://issues.apache.org/jira/browse/SOLR-6159
> >
> > Any help is very appreciated.
> >
> > Thanks,
> > Vincenzo
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
