Re: Proposal to Include GEODE-7079 in 1.10.0

Juan José Ramos Mon, 19 Aug 2019 02:40:46 -0700

Hello team,

Just following up on my proposal to include GEODE-7079 in release 1.10.0, I
think there's enough consensus at this point (+1: 7, +0: 1), can we cherry
pick commit *6f4bbbd [1]* into the release branch?.
Best regards.



[1]:
https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b

On Thu, Aug 15, 2019 at 9:17 PM Udo Kohlmeyer <u...@apache.com> wrote:

> I'm changing my vote to +1 on this issue.
>
> The ONLY reason I'm changing my vote is to add to the cleanliness of the
> code of the release. I do 100% disagree with the continual scope creep
> that we have been incurring on this release branch.
>
> --Udo
>
> On 8/15/19 12:34 PM, Dan Smith wrote:
> > +1 to merging Juan's fix for GEODE-7079. I've seen systems taken down by
> > rapidly filling up the logs in the past, this does seem to be a critical
> > fix from the perspective of system stability.
> >
> > Also, this is the change, which doesn't seem particularly risky to me.
> >
> > -          ConflationKey key = new
> > ConflationKey(gsEvent.getRegion().getFullPath(),
> > +          ConflationKey key = new ConflationKey(gsEvent.getRegionPath(),
> >
> > -Dan
> >
> > On Thu, Aug 15, 2019 at 12:23 PM Udo Kohlmeyer <u...@apache.com> wrote:
> >
> >> Whilst I agree with "*finish* when we believe the quality of the release
> >> branch is sufficient", I disagree that we should have cut a branch and
> >> continue to patch that branch with non-critical fixes. i.e this issue
> >> has been around for a while and has no averse side effects. Issues like
> >> GEODE-7081, which is new due to a new commit, AND it has critical
> >> stability implications on the server, that I can agree we should include
> >> in a potential release branch.
> >>
> >> Otherwise we can ALWAYS argue that said release branch is not of
> >> "sufficient" quality, especially if there are numerous existing JIRA's
> >> pertaining to bugs already in the system.
> >>
> >> To quote Juan's original email:
> >>
> >> /"Note: *no events are lost (even without the fix)* but, if the region
> >> takes//
> >> //a while to recover, the logs  for the member can grow pretty quickly
> >> due to//
> >> //the continuously thrown *NPEs.*"/
> >>
> >> In addition to this, if there is a commit in a cut release branch, which
> >> is requiring us to continuously patching the release branch, in order to
> >> stabilize that feature/fix, maybe we should consider reverting that fix
> >> and release it at a later stage, when it is believed that this fix is
> >> more stable and have better, more comprehensive test coverage.
> >>
> >> So far, GEODE-7081, does not have me convinced that it is critical. OR
> >> maybe it is the latter of my options, where it is a stabilization commit
> >> to a new feature, which begs the question, should we have accepted the
> >> original feature commit if there are all manner of side effects which we
> >> are only discovering.
> >>
> >> --Udo
> >>
> >> On 8/15/19 11:08 AM, Anthony Baker wrote:
> >>> While we can’t fix *all known bugs*, I think where we do have a fix for
> >> an important issue we should think hard about the cost of not including
> >> that in a release.
> >>> IMO, the fixed time approach to releases means that we *start* the
> >> release effort (including stabilization and bug fixing if needed) on a
> >> known date and we *finish* when new believe the quality of the release
> >> branch is sufficient.  Given the number of important fixes being
> requested,
> >> I’m not sure we are there yet.
> >>> I think the release branch concept has merit because it allows us to
> >> isolate ongoing work from the changes needed for a release.
> >>> +1 for including GEODE-7079.
> >>>
> >>> Anthony
> >>>
> >>>
> >>>> On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <ukohlme...@gmail.com>
> >> wrote:
> >>>> Seems everyone is in favor or including a /*non-critical*/ fix to an
> >> already cut branch of the a potential release...
> >>>> Am I missing something?
> >>>>
> >>>> Why cut a release at all... just have a perpetual cycle of fixes added
> >> to develop and users can chose what nightly snapshot build they would
> want
> >> to use..
> >>>> I'm voting -1 on a non-critical issue, which is existing and worst
> >> effect is to fill logs will NPE logs... (yes, not something we want).
> >>>> I believed that we (as a Geode community) agreed that once a release
> >> has been cut, only critical issue fixes will be included. If we continue
> >> just continually adding to the ALREADY CUT 1.10 release, where do we
> stop
> >> and when do we release...
> >>>> --Udo
> >>>>
> >>>> On 8/15/19 10:19 AM, Nabarun Nag wrote:
> >>>>> +1
> >>>>>
> >>>>> On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann <
> >> amurm...@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> Agreed to fixing this. It's impossible for a user to discover they
> >> hit an
> >>>>>> edge case that we fail to support till they are in prod and restart.
> >>>>>>
> >>>>>> On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <jra...@pivotal.io
> >
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello Udo,
> >>>>>>>
> >>>>>>> Even if it is an existing issue I'd still consider it critical for
> >> those
> >>>>>>> cases on which there are unprocessed events on the persistent queue
> >>>>>> after a
> >>>>>>> restart and the region takes long to recover... you can actually
> see
> >>>>>>> millions of *NPEs* flooding the member's logs.
> >>>>>>> My two cents anyway, it's up to the community to make the final
> >> decision.
> >>>>>>> Cheers.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <u...@apache.com>
> >> wrote:
> >>>>>>>> Juan,
> >>>>>>>>
> >>>>>>>>    From your explanation, it seems this issue is existing and not
> >>>>>>>> critical. Could we possibly hold this for 1.11?
> >>>>>>>>
> >>>>>>>> --Udo
> >>>>>>>>
> >>>>>>>> On 8/15/19 5:29 AM, Ju@N wrote:
> >>>>>>>>> Hello team,
> >>>>>>>>>
> >>>>>>>>> I'd like to propose including the *fix [1]* for *GEODE-7079 [2]*
> in
> >>>>>>>> release
> >>>>>>>>> 1.10.0.
> >>>>>>>>> Long story short: a *NullPointerException* can be continuously
> >> thrown
> >>>>>>>>> and flood the member's logs if a serial event processor (either
> >>>>>>>>> *async-event-queue* or *gateway-sender*) starts processing events
> >>>>>> from
> >>>>>>> a
> >>>>>>>>> recovered persistent queue before the actual region to which it
> was
> >>>>>>>>> attached is fully operational.
> >>>>>>>>> Note: *no events are lost (even without the fix)* but, if the
> >> region
> >>>>>>>> takes
> >>>>>>>>> a while to recover, the logs  for the member can grow pretty
> >> quickly
> >>>>>>> due
> >>>>>>>> to
> >>>>>>>>> the continuously thrown *NPEs.*
> >>>>>>>>> Best regards.
> >>>>>>>>>
> >>>>>>>>> [1]:
> >>>>>>>>>
> >>
> https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b
> >>>>>>>>> [2]: https://issues.apache.org/jira/browse/GEODE-7079
> >>>>>>>>>
> >>>>>>> --
> >>>>>>> Juan José Ramos Cassella
> >>>>>>> Senior Software Engineer
> >>>>>>> Email: jra...@pivotal.io
> >>>>>>>
>


-- 
Juan José Ramos Cassella
Senior Software Engineer
Email: jra...@pivotal.io

Re: Proposal to Include GEODE-7079 in 1.10.0

Reply via email to