Ah, OK. Whew! Because I was wondering how you were running at _all_ if all
the memory was allocated to the JVM ;).

What is your Zookeeper timeout? The original default was 15 seconds, and that
has caused exactly this kind of problem. Here's the scenario: you send a bunch
of docs at the server, and eventually you hit a stop-the-world GC that takes
longer than the Zookeeper timeout. So ZK thinks the node is down and initiates
recovery. Eventually you hit this on all the replicas.
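
In case it's useful, the client-side session timeout is the zkClientTimeout
value in solr.xml (or the -DzkClientTimeout system property). Here's a minimal
sketch of the newer-style solr.xml fragment with a placeholder value; keep in
mind that ZooKeeper's own maxSessionTimeout (20 x tickTime by default) caps
whatever you set here:

<solr>
  <solrcloud>
    <!-- illustrative value only; raise it above your worst-case GC pause -->
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>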

Sometimes I've seen situations where the answer is giving a bit more memory
to the JVM, say 2-4G more in your case. The theory here (and this is a shot
in the dark) is that your peak JVM requirements are close to your 12G, so the
garbage collector spends enormous amounts of time reclaiming a small bit of
memory, runs for some fraction of a second, and does it again. Adding more
memory to the JVM lets the parallel collectors work without so many
stop-the-world GC pauses.

So what I'd do is turn on GC logging (probably on the replicas) and look for
very long GC pauses. Mark Miller put together a blog on this here:
https://lucidworks.com/blog/garbage-collection-bootcamp-1-0/

See the "getting a view into garbage collection" section. The smoking gun here
is if you see full GC pauses that are longer than the ZK timeout.
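
To get that view, something along these lines on the Solr JVM's command line
is typical (HotSpot 7/8-era flags; the log path is just a placeholder):

-verbose:gc -Xloggc:/var/log/solr/solr_gc.log -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime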

90M docs in 4 hours across 10 shards works out to roughly 6,250 docs/sec in
aggregate, or only 625/sec or so per shard. I've seen sustained indexing rates
significantly above this; YMMV of course, a lot depends on the size of the
docs.

What version of Solr, BTW? And when you say you fire up a bunch of indexers,
I'm assuming these are SolrJ clients that use CloudSolrServer?
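
If so, the usual pattern looks roughly like this (a minimal sketch against the
SolrJ 4.x API; the ZooKeeper ensemble string, collection name, field names and
batch size are placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // Placeholder ZooKeeper ensemble; the client reads cluster state from ZK.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("search2"); // the offline collection during full indexing

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      batch.add(doc);
    }
    server.add(batch);   // send docs in batches rather than one add() per doc
    // no explicit commit(); let the server-side autoCommit/autoSoftCommit handle it
    server.shutdown();
  }
}

Batching the adds and leaving commits to the server-side autoCommit settings
usually keeps the update path a lot happier than committing from the clients.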

Best,
Erick


On Sun, Jan 25, 2015 at 4:10 PM, Vijay Sekhri <sekhrivi...@gmail.com> wrote:

> Thank you for the reply, Erick.
> I am sorry, I had the wrong information posted; I posted our DEV env
> configuration by mistake.
> After double-checking our stress and Prod Beta environments, where we found
> the original issue, I found all the searchers have around 50 GB of RAM
> available and two JVM instances running (on 2 different ports). Both
> instances have 12 GB allocated; the remaining 26 GB is available to the OS.
> The 1st instance on a host has the search1 collection (live collection) and
> the 2nd instance on the same host has the search2 collection (for full
> indexing).
>
> There is plenty of room for OS-related tasks. Our issue is not in any way
> related to OS starvation, as shown by our dashboards.
> We have been through
>
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> many times, but we have two modes of operation:
> a)  1st collection (live traffic) - heavy searches and medium indexing
> b)  2nd collection (not serving traffic) - very heavy indexing, no searches
>
> When our indexing finishes we swap the alias for these collections. So
> essentially we need a configuration that can support both use cases
> together. We have tried a lot of different configuration options and none
> of them seems to work. My suspicion is that SolrCloud is unable to keep up
> with the updates at the rate we are sending them while it is trying to keep
> all the replicas consistent.
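>
> (As an aside, the swap itself is just the Collections API CREATEALIAS call,
> something along these lines, where the host and the alias name "search" are
> placeholders:
>
> http://host:8983/solr/admin/collections?action=CREATEALIAS&name=search&collections=search2
>
> and re-running it pointed at search1 after the next full indexing run moves
> the alias back.)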
>
>
> On Sun, Jan 25, 2015 at 5:30 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Shawn directed you over here to the user list, but I see this note on
> > SOLR-7030:
> > "All our searchers have 12 GB of RAM available and have quad core Intel(R)
> > Xeon(R) CPU X5570 @ 2.93GHz. There is only one java process running i.e
> > jboss and solr in it . All 12 GB is available as heap for the java
> > process..."
> >
> > So you have 12G physical memory and have allocated 12G to the Java
> > process? This is an anti-pattern. If that's the case, your operating
> > system is being starved for memory, and the JVM is probably hitting a
> > state where it spends all of its time in stop-the-world garbage
> > collection; eventually it doesn't respond to Zookeeper's ping, so
> > Zookeeper thinks the node is down and puts it into recovery, where it
> > spends a lot of time doing... essentially nothing.
> >
> > About the hard and soft commits: I suspect these are entirely unrelated,
> > but here's a blog on what they do; you should pick the configuration that
> > supports your use case (i.e. how much latency can you stand between
> > indexing and being able to search?).
> >
> > https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >
> > Here's one very good reason you shouldn't starve your op system by
> > allocating all the physical memory to the JVM:
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> >
> > But your biggest problem is that you have far too much of your physical
> > memory allocated to the JVM. This will cause you endless problems; you
> > just need more physical memory on those boxes. It's _possible_ you could
> > get by with less memory for the JVM; counterintuitive as it seems, try 8G
> > or maybe even 6G. At some point you'll hit OOM errors, but that'll give
> > you a lower limit on what the JVM needs.
> >
> > Unless I've misinterpreted what you've written, though, I doubt you'll
> > get stable with that much memory allocated to the JVM.
> >
> > Best,
> > Erick
> >
> >
> >
> > On Sun, Jan 25, 2015 at 1:02 PM, Vijay Sekhri <sekhrivi...@gmail.com>
> > wrote:
> >
> > > We have a SolrCloud cluster with 10 shards and 4 replicas in each shard
> > > in our stress environment. In our prod environment we will have 10
> > > shards and 15 replicas in each shard. Our current commit settings are
> > > as follows:
> > >
> > >     <autoSoftCommit>
> > >         <maxDocs>500000</maxDocs>
> > >         <maxTime>180000</maxTime>
> > >     </autoSoftCommit>
> > >     <autoCommit>
> > >         <maxDocs>2000000</maxDocs>
> > >         <maxTime>180000</maxTime>
> > >         <openSearcher>false</openSearcher>
> > >     </autoCommit>
> > >
> > >
> > > We indexed roughly 90 million docs. We have two different ways to index
> > > documents:
> > > a) Full indexing. It takes 4 hours to index 90 million docs and the
> > > rate of docs coming to the searchers is around 6000 per second.
> > > b) Incremental indexing. It takes an hour to index the delta changes.
> > > Roughly there are 3 million changes and the rate of docs coming to the
> > > searchers is 2500 per second.
> > >
> > > We have two collections, search1 and search2. When we do full indexing,
> > > we do it in the search2 collection while search1 is serving live
> > > traffic. After it finishes we swap the collections using aliases, so
> > > that the search2 collection serves live traffic while search1 becomes
> > > available for the next full indexing run. When we do incremental
> > > indexing we do it in the search1 collection, which is serving live
> > > traffic.
> > >
> > > All our searchers have 12 GB of RAM available and have quad-core
> > > Intel(R) Xeon(R) CPU X5570 @ 2.93GHz. There is only one java process
> > > running, i.e. JBoss with Solr in it. All 12 GB is available as heap for
> > > the java process. We have observed that the heap usage of the java
> > > process averages around 8-10 GB. All searchers have a final index size
> > > of 9 GB, so in total there are 9 x 10 (shards) = 90 GB worth of index
> > > files.
> > >
> > > We have observed the following issue when we trigger indexing. About
> > > 10 minutes after we trigger indexing on 14 parallel hosts, the replicas
> > > go into recovery mode. This happens on all the shards. In about 20
> > > minutes more and more replicas start going into recovery mode, and
> > > after about half an hour all replicas except the leaders are in
> > > recovery mode. We cannot throttle the indexing load as that will
> > > increase our overall indexing time. So to overcome this issue, we
> > > remove all the replicas before we trigger the indexing and then add
> > > them back after the indexing finishes.
> > >
> > > We observe the same behavior of replicas going into recovery when we do
> > > incremental indexing. We cannot remove replicas during our incremental
> > > indexing because the collection is also serving live traffic. We tried
> > > to throttle our indexing speed, however the cluster still goes into
> > > recovery.
> > >
> > > If we leave the cluster as it is, it eventually recovers a while after
> > > the indexing finishes. But as it is serving live traffic we cannot have
> > > these replicas go into recovery mode, because our tests have shown it
> > > also degrades search performance.
> > >
> > > We have tried different commit settings like the following:
> > >
> > > a) No auto soft commit, no auto hard commit, and a commit triggered at
> > > the end of indexing
> > > b) No auto soft commit, yes auto hard commit, and a commit at the end
> > > of indexing
> > > c) Yes auto soft commit, no auto hard commit
> > > d) Yes auto soft commit, yes auto hard commit
> > > e) Different frequency settings for the commits above. Please NOTE that
> > > we have tried a 15 minute soft commit with a 30 minute hard commit,
> > > the same time settings for both, and a 30 minute soft commit with a
> > > one hour hard commit
> > >
> > > Unfortunately all of the above yield the same behavior: the replicas
> > > still go into recovery. We have increased the Zookeeper timeout from 30
> > > seconds to 5 minutes and the problem persists. Is there any setting
> > > that would fix this issue?
> > >
> > > --
> > > *********************************************
> > > Vijay Sekhri
> > > *********************************************
> > >
> >
>
>
>
> --
> *********************************************
> Vijay Sekhri
> *********************************************
>
