There appears to be consensus that this is a critical fix. The following commit has been brought into release/1.10.0 <https://github.com/apache/geode/tree/release/1.10.0> as the critical fix for GEODE-7079 <https://issues.apache.org/jira/browse/GEODE-7079>:
git cherry-pick -x 6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b <https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b> GEODE-7079 <https://issues.apache.org/jira/browse/GEODE-7079> has been changed to 'resolved in' 1.10.0. -Owen > On Aug 19, 2019, at 2:39 AM, Juan José Ramos <jra...@pivotal.io> wrote: > > Hello team, > > Just following up on my proposal to include GEODE-7079 in release 1.10.0, I > think there's enough consensus at this point (+1: 7, +0: 1), can we cherry > pick commit *6f4bbbd [1]* into the release branch?. > Best regards. > > > [1]: > https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b > > On Thu, Aug 15, 2019 at 9:17 PM Udo Kohlmeyer <u...@apache.com> wrote: > >> I'm changing my vote to +1 on this issue. >> >> The ONLY reason I'm changing my vote is to add to the cleanliness of the >> code of the release. I do 100% disagree with the continual scope creep >> that we have been incurring on this release branch. >> >> --Udo >> >> On 8/15/19 12:34 PM, Dan Smith wrote: >>> +1 to merging Juan's fix for GEODE-7079. I've seen systems taken down by >>> rapidly filling up the logs in the past, this does seem to be a critical >>> fix from the perspective of system stability. >>> >>> Also, this is the change, which doesn't seem particularly risky to me. >>> >>> - ConflationKey key = new >>> ConflationKey(gsEvent.getRegion().getFullPath(), >>> + ConflationKey key = new ConflationKey(gsEvent.getRegionPath(), >>> >>> -Dan >>> >>> On Thu, Aug 15, 2019 at 12:23 PM Udo Kohlmeyer <u...@apache.com> wrote: >>> >>>> Whilst I agree with "*finish* when we believe the quality of the release >>>> branch is sufficient", I disagree that we should have cut a branch and >>>> continue to patch that branch with non-critical fixes. i.e this issue >>>> has been around for a while and has no averse side effects. Issues like >>>> GEODE-7081, which is new due to a new commit, AND it has critical >>>> stability implications on the server, that I can agree we should include >>>> in a potential release branch. >>>> >>>> Otherwise we can ALWAYS argue that said release branch is not of >>>> "sufficient" quality, especially if there are numerous existing JIRA's >>>> pertaining to bugs already in the system. >>>> >>>> To quote Juan's original email: >>>> >>>> /"Note: *no events are lost (even without the fix)* but, if the region >>>> takes// >>>> //a while to recover, the logs for the member can grow pretty quickly >>>> due to// >>>> //the continuously thrown *NPEs.*"/ >>>> >>>> In addition to this, if there is a commit in a cut release branch, which >>>> is requiring us to continuously patching the release branch, in order to >>>> stabilize that feature/fix, maybe we should consider reverting that fix >>>> and release it at a later stage, when it is believed that this fix is >>>> more stable and have better, more comprehensive test coverage. >>>> >>>> So far, GEODE-7081, does not have me convinced that it is critical. OR >>>> maybe it is the latter of my options, where it is a stabilization commit >>>> to a new feature, which begs the question, should we have accepted the >>>> original feature commit if there are all manner of side effects which we >>>> are only discovering. >>>> >>>> --Udo >>>> >>>> On 8/15/19 11:08 AM, Anthony Baker wrote: >>>>> While we can’t fix *all known bugs*, I think where we do have a fix for >>>> an important issue we should think hard about the cost of not including >>>> that in a release. >>>>> IMO, the fixed time approach to releases means that we *start* the >>>> release effort (including stabilization and bug fixing if needed) on a >>>> known date and we *finish* when new believe the quality of the release >>>> branch is sufficient. Given the number of important fixes being >> requested, >>>> I’m not sure we are there yet. >>>>> I think the release branch concept has merit because it allows us to >>>> isolate ongoing work from the changes needed for a release. >>>>> +1 for including GEODE-7079. >>>>> >>>>> Anthony >>>>> >>>>> >>>>>> On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <ukohlme...@gmail.com> >>>> wrote: >>>>>> Seems everyone is in favor or including a /*non-critical*/ fix to an >>>> already cut branch of the a potential release... >>>>>> Am I missing something? >>>>>> >>>>>> Why cut a release at all... just have a perpetual cycle of fixes added >>>> to develop and users can chose what nightly snapshot build they would >> want >>>> to use.. >>>>>> I'm voting -1 on a non-critical issue, which is existing and worst >>>> effect is to fill logs will NPE logs... (yes, not something we want). >>>>>> I believed that we (as a Geode community) agreed that once a release >>>> has been cut, only critical issue fixes will be included. If we continue >>>> just continually adding to the ALREADY CUT 1.10 release, where do we >> stop >>>> and when do we release... >>>>>> --Udo >>>>>> >>>>>> On 8/15/19 10:19 AM, Nabarun Nag wrote: >>>>>>> +1 >>>>>>> >>>>>>> On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann < >>>> amurm...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>> Agreed to fixing this. It's impossible for a user to discover they >>>> hit an >>>>>>>> edge case that we fail to support till they are in prod and restart. >>>>>>>> >>>>>>>> On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <jra...@pivotal.io >>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hello Udo, >>>>>>>>> >>>>>>>>> Even if it is an existing issue I'd still consider it critical for >>>> those >>>>>>>>> cases on which there are unprocessed events on the persistent queue >>>>>>>> after a >>>>>>>>> restart and the region takes long to recover... you can actually >> see >>>>>>>>> millions of *NPEs* flooding the member's logs. >>>>>>>>> My two cents anyway, it's up to the community to make the final >>>> decision. >>>>>>>>> Cheers. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <u...@apache.com> >>>> wrote: >>>>>>>>>> Juan, >>>>>>>>>> >>>>>>>>>> From your explanation, it seems this issue is existing and not >>>>>>>>>> critical. Could we possibly hold this for 1.11? >>>>>>>>>> >>>>>>>>>> --Udo >>>>>>>>>> >>>>>>>>>> On 8/15/19 5:29 AM, Ju@N wrote: >>>>>>>>>>> Hello team, >>>>>>>>>>> >>>>>>>>>>> I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* >> in >>>>>>>>>> release >>>>>>>>>>> 1.10.0. >>>>>>>>>>> Long story short: a *NullPointerException* can be continuously >>>> thrown >>>>>>>>>>> and flood the member's logs if a serial event processor (either >>>>>>>>>>> *async-event-queue* or *gateway-sender*) starts processing events >>>>>>>> from >>>>>>>>> a >>>>>>>>>>> recovered persistent queue before the actual region to which it >> was >>>>>>>>>>> attached is fully operational. >>>>>>>>>>> Note: *no events are lost (even without the fix)* but, if the >>>> region >>>>>>>>>> takes >>>>>>>>>>> a while to recover, the logs for the member can grow pretty >>>> quickly >>>>>>>>> due >>>>>>>>>> to >>>>>>>>>>> the continuously thrown *NPEs.* >>>>>>>>>>> Best regards. >>>>>>>>>>> >>>>>>>>>>> [1]: >>>>>>>>>>> >>>> >> https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b >>>>>>>>>>> [2]: https://issues.apache.org/jira/browse/GEODE-7079 >>>>>>>>>>> >>>>>>>>> -- >>>>>>>>> Juan José Ramos Cassella >>>>>>>>> Senior Software Engineer >>>>>>>>> Email: jra...@pivotal.io >>>>>>>>> >> > > > -- > Juan José Ramos Cassella > Senior Software Engineer > Email: jra...@pivotal.io