Hello team, Just following up on my proposal to include GEODE-7079 in release 1.10.0, I think there's enough consensus at this point (+1: 7, +0: 1), can we cherry pick commit *6f4bbbd [1]* into the release branch?. Best regards.
[1]: https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b On Thu, Aug 15, 2019 at 9:17 PM Udo Kohlmeyer <u...@apache.com> wrote: > I'm changing my vote to +1 on this issue. > > The ONLY reason I'm changing my vote is to add to the cleanliness of the > code of the release. I do 100% disagree with the continual scope creep > that we have been incurring on this release branch. > > --Udo > > On 8/15/19 12:34 PM, Dan Smith wrote: > > +1 to merging Juan's fix for GEODE-7079. I've seen systems taken down by > > rapidly filling up the logs in the past, this does seem to be a critical > > fix from the perspective of system stability. > > > > Also, this is the change, which doesn't seem particularly risky to me. > > > > - ConflationKey key = new > > ConflationKey(gsEvent.getRegion().getFullPath(), > > + ConflationKey key = new ConflationKey(gsEvent.getRegionPath(), > > > > -Dan > > > > On Thu, Aug 15, 2019 at 12:23 PM Udo Kohlmeyer <u...@apache.com> wrote: > > > >> Whilst I agree with "*finish* when we believe the quality of the release > >> branch is sufficient", I disagree that we should have cut a branch and > >> continue to patch that branch with non-critical fixes. i.e this issue > >> has been around for a while and has no averse side effects. Issues like > >> GEODE-7081, which is new due to a new commit, AND it has critical > >> stability implications on the server, that I can agree we should include > >> in a potential release branch. > >> > >> Otherwise we can ALWAYS argue that said release branch is not of > >> "sufficient" quality, especially if there are numerous existing JIRA's > >> pertaining to bugs already in the system. > >> > >> To quote Juan's original email: > >> > >> /"Note: *no events are lost (even without the fix)* but, if the region > >> takes// > >> //a while to recover, the logs for the member can grow pretty quickly > >> due to// > >> //the continuously thrown *NPEs.*"/ > >> > >> In addition to this, if there is a commit in a cut release branch, which > >> is requiring us to continuously patching the release branch, in order to > >> stabilize that feature/fix, maybe we should consider reverting that fix > >> and release it at a later stage, when it is believed that this fix is > >> more stable and have better, more comprehensive test coverage. > >> > >> So far, GEODE-7081, does not have me convinced that it is critical. OR > >> maybe it is the latter of my options, where it is a stabilization commit > >> to a new feature, which begs the question, should we have accepted the > >> original feature commit if there are all manner of side effects which we > >> are only discovering. > >> > >> --Udo > >> > >> On 8/15/19 11:08 AM, Anthony Baker wrote: > >>> While we can’t fix *all known bugs*, I think where we do have a fix for > >> an important issue we should think hard about the cost of not including > >> that in a release. > >>> IMO, the fixed time approach to releases means that we *start* the > >> release effort (including stabilization and bug fixing if needed) on a > >> known date and we *finish* when new believe the quality of the release > >> branch is sufficient. Given the number of important fixes being > requested, > >> I’m not sure we are there yet. > >>> I think the release branch concept has merit because it allows us to > >> isolate ongoing work from the changes needed for a release. > >>> +1 for including GEODE-7079. > >>> > >>> Anthony > >>> > >>> > >>>> On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <ukohlme...@gmail.com> > >> wrote: > >>>> Seems everyone is in favor or including a /*non-critical*/ fix to an > >> already cut branch of the a potential release... > >>>> Am I missing something? > >>>> > >>>> Why cut a release at all... just have a perpetual cycle of fixes added > >> to develop and users can chose what nightly snapshot build they would > want > >> to use.. > >>>> I'm voting -1 on a non-critical issue, which is existing and worst > >> effect is to fill logs will NPE logs... (yes, not something we want). > >>>> I believed that we (as a Geode community) agreed that once a release > >> has been cut, only critical issue fixes will be included. If we continue > >> just continually adding to the ALREADY CUT 1.10 release, where do we > stop > >> and when do we release... > >>>> --Udo > >>>> > >>>> On 8/15/19 10:19 AM, Nabarun Nag wrote: > >>>>> +1 > >>>>> > >>>>> On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann < > >> amurm...@apache.org> > >>>>> wrote: > >>>>> > >>>>>> +1 > >>>>>> > >>>>>> Agreed to fixing this. It's impossible for a user to discover they > >> hit an > >>>>>> edge case that we fail to support till they are in prod and restart. > >>>>>> > >>>>>> On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <jra...@pivotal.io > > > >>>>>> wrote: > >>>>>> > >>>>>>> Hello Udo, > >>>>>>> > >>>>>>> Even if it is an existing issue I'd still consider it critical for > >> those > >>>>>>> cases on which there are unprocessed events on the persistent queue > >>>>>> after a > >>>>>>> restart and the region takes long to recover... you can actually > see > >>>>>>> millions of *NPEs* flooding the member's logs. > >>>>>>> My two cents anyway, it's up to the community to make the final > >> decision. > >>>>>>> Cheers. > >>>>>>> > >>>>>>> > >>>>>>> On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <u...@apache.com> > >> wrote: > >>>>>>>> Juan, > >>>>>>>> > >>>>>>>> From your explanation, it seems this issue is existing and not > >>>>>>>> critical. Could we possibly hold this for 1.11? > >>>>>>>> > >>>>>>>> --Udo > >>>>>>>> > >>>>>>>> On 8/15/19 5:29 AM, Ju@N wrote: > >>>>>>>>> Hello team, > >>>>>>>>> > >>>>>>>>> I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* > in > >>>>>>>> release > >>>>>>>>> 1.10.0. > >>>>>>>>> Long story short: a *NullPointerException* can be continuously > >> thrown > >>>>>>>>> and flood the member's logs if a serial event processor (either > >>>>>>>>> *async-event-queue* or *gateway-sender*) starts processing events > >>>>>> from > >>>>>>> a > >>>>>>>>> recovered persistent queue before the actual region to which it > was > >>>>>>>>> attached is fully operational. > >>>>>>>>> Note: *no events are lost (even without the fix)* but, if the > >> region > >>>>>>>> takes > >>>>>>>>> a while to recover, the logs for the member can grow pretty > >> quickly > >>>>>>> due > >>>>>>>> to > >>>>>>>>> the continuously thrown *NPEs.* > >>>>>>>>> Best regards. > >>>>>>>>> > >>>>>>>>> [1]: > >>>>>>>>> > >> > https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b > >>>>>>>>> [2]: https://issues.apache.org/jira/browse/GEODE-7079 > >>>>>>>>> > >>>>>>> -- > >>>>>>> Juan José Ramos Cassella > >>>>>>> Senior Software Engineer > >>>>>>> Email: jra...@pivotal.io > >>>>>>> > -- Juan José Ramos Cassella Senior Software Engineer Email: jra...@pivotal.io