UNSUBSCRIBE

On Mon, Dec 18, 2017 at 12:57 PM Susheel Kumar <susheel2...@gmail.com> wrote:
> Technically I agree with you, Shawn, on fixing the cause of the OOME. In
> fact it is not an issue any more, but I was testing for HA when planning
> for any failures. At the same time, it's hard to convince business folks
> that HA wouldn't be there in case of OOME.
>
> I think the best option is to enable timeAllowed for now.
>
> Thanks,
> Susheel
>
> On Mon, Dec 18, 2017 at 11:37 AM, Shawn Heisey <apa...@elyograg.org>
> wrote:
>
> > On 12/18/2017 9:01 AM, Susheel Kumar wrote:
> > > Any thoughts on how one can provide HA in these situations.
> >
> > As I have said already a couple of times today on other threads, there
> > are *exactly* two ways to deal with OOME. No other solution is possible.
> >
> > 1) Configure the system to allow the process to access more of the
> > resource that it's running out of. This is typically the solution that
> > people will utilize. In your case, you would need to make the heap
> > larger.
> >
> > 2) Change the configuration or the environment so fewer resources are
> > required.
> >
> > OOME is special. It is a problem that all the high availability steps
> > in the world cannot protect you from, for precisely the reasons that
> > Emir and I have described. You must ensure that Solr is set up so there
> > are enough resources that OOME cannot occur.
> >
> > I can see a general argument for making it possible to configure or
> > disable any retry mechanism in SolrCloud, but that is not the solution
> > here. It would most likely only *delay* the problem to a later query.
> > The OOME itself must be fixed, using one of the two solutions already
> > outlined.
> >
> > Thanks,
> > Shawn
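
For anyone finding this thread later, here is a rough sketch of what "enable timeAllowed" could look like from a client. The host, port, collection name, and the 5000 ms budget are all made-up placeholders, not values from this thread. timeAllowed is given in milliseconds and, when the limit is hit, Solr flags the response with partialResults in the responseHeader. As Shawn points out above, this only limits how long a query runs; it does not fix the OOME itself.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, time_allowed_ms):
    """Build a Solr /select URL that caps search time via timeAllowed (ms)."""
    params = {
        "q": query,
        "timeAllowed": time_allowed_ms,  # time budget in milliseconds
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def is_partial(response_json):
    """Return True if Solr reported that it stopped early with partial results."""
    header = response_json.get("responseHeader", {})
    return bool(header.get("partialResults", False))


# Hypothetical host and collection, for illustration only.
url = build_select_url("http://localhost:8983/solr", "mycollection", "*:*", 5000)
print(url)
```

A client would fetch that URL, parse the JSON body, and check is_partial() before trusting the result count.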