Re: OOM spreads to other replica's/HA when OOM

Bojan Vukojevic Mon, 18 Dec 2017 13:07:01 -0800

UNSUBSCRIBE

On Mon, Dec 18, 2017 at 12:57 PM Susheel Kumar <susheel2...@gmail.com>
wrote:


> Technically I agree with you, Shawn, on fixing the cause of the OOME. In
> fact, it is no longer an issue, but I was testing HA while planning for
> failures. At the same time, it's hard to convince business folks that HA
> wouldn't be there in case of an OOME.
>
> I think the best option is to enable timeAllowed for now.
>
> Thanks,
> Susheel
>
> On Mon, Dec 18, 2017 at 11:37 AM, Shawn Heisey <apa...@elyograg.org>
> wrote:
>
> > On 12/18/2017 9:01 AM, Susheel Kumar wrote:
> > > Any thoughts on how one can provide HA in these situations.
> >
> > As I have said already a couple of times today on other threads, there
> > are *exactly* two ways to deal with OOME.  No other solution is possible.
> >
> > 1) Configure the system to allow the process to access more of the
> > resource that it's running out of.  This is typically the solution that
> > people will utilize.  In your case, you would need to make the heap
> > larger.
> >
> > 2) Change the configuration or the environment so fewer resources are
> > required.
> >
> > OOME is special.  It is a problem that all the high availability steps
> > in the world cannot protect you from, for precisely the reasons that
> > Emir and I have described.  You must ensure that Solr is set up so there
> > are enough resources that OOME cannot occur.
> >
> > I can see a general argument for making it possible to configure or
> > disable any retry mechanism in SolrCloud, but that is not the solution
> > here.  It would most likely only *delay* the problem to a later query.
> > The OOME itself must be fixed, using one of the two solutions already
> > outlined.
> >
> > Thanks,
> > Shawn
> >
> >
>
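
For reference, a minimal sketch of what passing timeAllowed on a query could
look like. timeAllowed is a Solr query parameter given in milliseconds; when
the limit is hit, Solr returns whatever it has collected and flags the
response header with partialResults. The host, port, collection name, and the
helper names build_select_url and is_partial are assumptions made up for
illustration. Note that this only bounds how long a query may run; it does not
cap heap usage, so it is not a substitute for the two fixes Shawn describes.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, time_allowed_ms):
    """Build a /select URL that asks Solr to stop searching after
    time_allowed_ms milliseconds (hypothetical helper)."""
    params = {"q": query, "timeAllowed": int(time_allowed_ms)}
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


def is_partial(response):
    """Return True if a parsed Solr JSON response is marked as partial,
    which is how Solr signals that timeAllowed cut the search short."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


# Assumed local node and collection name, for illustration only.
url = build_select_url("http://localhost:8983/solr", "mycollection", "*:*", 5000)
```

A client would fetch the URL, parse the JSON body, and treat the result as
incomplete whenever is_partial returns True.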
