Great points, Ryan. I agree, if this is a bug fix rather than a feature/improvement we should fix it.
On Thu, Jan 24, 2019 at 9:58 AM Ryan McMahon <rmcma...@pivotal.io> wrote: > Hi Alexander, > > Thanks for bring up those concerns. I think in this case the default of 1 > is potentially masking behavior that the user would definitely want to see > and react to (such as heap memory read outliers/spikes), so it is a bit > different than other cases where changing the default would be a > nice-to-have in 2.0. The user can still adjust their parameter to deal > with any bad reads they are observing, but it would be best if they did > that explicitly instead of us hiding the behavior by default silently. So > I would argue it is a fair change in this case in a minor release, as it > could be viewed as a "bug fix" i.e. masking the bad reads by default. > > Ryan > > On Thu, Jan 24, 2019 at 9:49 AM David Wisler <dwis...@pivotal.io> wrote: > > > I talked to him. He is fine with our new change. > > > > :-) He just wanted to have it on the table. > > > > So you missed out on this 1 minute discussion. haha > > > > On Thu, Jan 24, 2019 at 9:38 AM Ryan McMahon <rmcma...@pivotal.io> > wrote: > > > >> Hey David, > >> > >> Alexander responded to the dev list, but not sure you got a copy. > >> Basically he is asking if we can wait to change the default tolerance > until > >> Geode 2.0, since it is a behavior change and apparently there are > several > >> config parameters for which we'd like to change the default, but we > pushed > >> it to 2.0. I don't feel super strongly either way. Do you want to > reply > >> to this thread or talk to Alexander directly? > >> > >> Ryan > >> > >> ---------- Forwarded message --------- > >> From: Alexander Murmann <amurm...@apache.org> > >> Date: Wed, Jan 23, 2019 at 4:13 PM > >> Subject: Re: [Proposal] Change default gemfire.memoryEventTolerance from > >> 1 to 0 > >> To: <dev@geode.apache.org> > >> Cc: Ryan McMahon <rmcma...@pivotal.io> > >> > >> > >> Ryan, thank you so much for the great explanation of your proposal! > >> > >> This seems very sound and you and David got me convinced that it's the > >> right thing to change the default. To me the question now is one of > >> timing. > >> Is this something we can change in a minor release or do we have to wait > >> for Geode 2.0? I think we have quite a few defaults we'd like to update. > >> However, a user might have a system in prod that relies on defaults > being > >> a > >> certain way and upgrading to the next minor shouldn't require any work > on > >> their end to prevent any negative impact on their system. > >> > >> Thoughts? > >> > >> On Tue, Jan 22, 2019 at 12:33 PM David Wisler <dwis...@pivotal.io> > wrote: > >> > >> > I would add that, by changing the default to 0, we can then skip all > of > >> the > >> > "special" logic that almost no customers use. With a default of 1, > >> we go > >> > into this logic every time unnecessarily, even when customers have not > >> > explicitly told us to "tolerate" an eviction or critical state change. > >> I > >> > am in favor of this default change to 0, and also add that there are > no > >> > customers who would even realize such a change in behavior has > occurred. > >> > I would also suggest that tolerating 1 critical reading, delaying the > >> > subsequent behaviors in GemFire when above critical, could make us > more > >> > vulnerable to OOME's than would be the case by immediately > transitioning > >> > state. > >> > > >> > My 2 cents. Thanks for the email Ryan. > >> > > >> > > >> > > >> > On Tue, Jan 22, 2019 at 10:22 AM Ryan McMahon <rmcma...@pivotal.io> > >> wrote: > >> > > >> > > Hi all, > >> > > > >> > > I am currently fixing a bug > >> > > <https://issues.apache.org/jira/browse/GEODE-6304> with the > >> > > HeapMemoryMonitor event tolerance feature, and came across a > decision > >> > that > >> > > I thought would be more appropriate for the Geode dev list. > >> > > > >> > > For those familiar with the feature, we are proposing that the > default > >> > > gemfire.memoryEventTolerance config parameter value is changed from > 1 > >> to > >> > 0 > >> > > so state transitions from normal to eviction or critical occur > >> > immediately > >> > > after reading a single heap-used-bytes event above threshold. If > you > >> are > >> > > unfamiliar with the feature, read on. > >> > > > >> > > The memory event tolerance feature addresses issues with some JVM > >> distros > >> > > that result in sporadic, erroneously high heap-bytes-used readings. > >> The > >> > > feature was introduced to address this issue in the JRockit JVM, but > >> it > >> > has > >> > > been found that other JVM distros are susceptible to this problem as > >> > well. > >> > > > >> > > The feature prevents an "unexpected" state transition from a normal > >> state > >> > > to an eviction or critical state by requiring N (configurable) > >> > consecutive > >> > > heap-used-byte events above threshold before changing states. The > >> > current > >> > > default configuration is N = 5 for JRockit and N = 1 for all other > >> JVMs. > >> > > In a non-JRockit JVM, this configuration permits a single event > above > >> > > threshold WITHOUT causing a state transition. In other words, by > >> > default, > >> > > we allow for a single bad outlier heap-used-bytes reading without > >> going > >> > > into an eviction or critical state. > >> > > > >> > > As part of this bug fix (which involves a failure to reset the > >> tolerance > >> > > counter under some conditions), we opted to remove the special > >> handling > >> > for > >> > > JRockit because JRockit is no longer supported. After removing the > >> > JRockit > >> > > handling, we started re-evaluating if a default value of 1 is > >> appropriate > >> > > for all other JVMs. We are considering changing the default to 0, > so > >> > state > >> > > transitions would occur immediately if an event above the threshold > is > >> > > received. If a user is facing one of these problematic JVMs, they > can > >> > then > >> > > change the gemfire.memoryEventTolerance config parameter to increase > >> the > >> > > tolerance. Our concern is that the default today is potentially > >> masking > >> > > bad heap readings without the user ever knowing. > >> > > > >> > > To summarize, if we change the default from 1 to 0 it would > >> potentially > >> > be > >> > > a change in behavior in that we would no longer be masking a single > >> bad > >> > > heap-used-bytes reading i.e. no longer permitting a single outlier > >> > without > >> > > changing states. The user can then decide whether to configure a > >> > non-zero > >> > > tolerance to address the situation. Any thoughts on this change in > >> > > behavior? > >> > > > >> > > Thanks, > >> > > Ryan > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > >> > -- > >> > > >> > David Wisler | GemFire Support Product Manager | 503-810-7840 cell > >> > Support.Pivotal.io > >> > < > >> > > >> > http://www.google.com/url?q=http%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNGDBr_XSKC18wot5h3OkKoZ84Vn7Q > >> > > > >> > | Mon-Fri 8:00am to 5:00pm PST | 1-877-477-2269 > >> > [image: support] > >> > < > >> > > >> > https://www.google.com/url?q=https%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNEvwKLjzu29inKwy4jJjKsboqGMCg > >> > > > >> > [image: twitter] > >> > < > >> > > >> > https://www.google.com/url?q=https%3A%2F%2Ftwitter.com%2Fpivotal&sa=D&sntz=1&usg=AFQjCNG1FcqkH5ghKsSG6UkdeUzjSuDSHg > >> > > > >> > [image: linkedin] > >> > < > >> > > >> > https://www.google.com/url?q=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2F3048967&sa=D&sntz=1&usg=AFQjCNHOQGYmDYIQz06S3-vAuqzf8bN8Yw > >> > > > >> > [image: facebook] > >> > < > >> > > >> > https://www.google.com/url?q=https%3A%2F%2Fwww.facebook.com%2Fpivotalsoftware&sa=D&sntz=1&usg=AFQjCNFQnPFtec1Rp3lKf6MuY1jcbA8j2A > >> > > > >> > [image: google plus] <https://plus.google.com/+Pivotal> [image: > >> youtube] > >> > < > >> > https://www.youtube.com/playlist?list=PLAdzTan_eSPScpj2J50ErtzR9ANSzv3kl> > >> > > >> > > > > > > -- > > > > David Wisler | GemFire Support Product Manager | 503-810-7840 cell > > Support.Pivotal.io > > < > http://www.google.com/url?q=http%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNGDBr_XSKC18wot5h3OkKoZ84Vn7Q > > > > | Mon-Fri 8:00am to 5:00pm PST | 1-877-477-2269 > > [image: support] > > < > https://www.google.com/url?q=https%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNEvwKLjzu29inKwy4jJjKsboqGMCg > > > > [image: twitter] > > < > https://www.google.com/url?q=https%3A%2F%2Ftwitter.com%2Fpivotal&sa=D&sntz=1&usg=AFQjCNG1FcqkH5ghKsSG6UkdeUzjSuDSHg > > > > [image: linkedin] > > < > https://www.google.com/url?q=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2F3048967&sa=D&sntz=1&usg=AFQjCNHOQGYmDYIQz06S3-vAuqzf8bN8Yw > > > > [image: facebook] > > < > https://www.google.com/url?q=https%3A%2F%2Fwww.facebook.com%2Fpivotalsoftware&sa=D&sntz=1&usg=AFQjCNFQnPFtec1Rp3lKf6MuY1jcbA8j2A > > > > [image: google plus] <https://plus.google.com/+Pivotal> [image: > youtube] > > < > https://www.youtube.com/playlist?list=PLAdzTan_eSPScpj2J50ErtzR9ANSzv3kl> > > > > > > >