+1 to merging Juan's fix for GEODE-7079. I've seen systems taken down by
rapidly filling up the logs in the past, this does seem to be a critical
fix from the perspective of system stability.

Also, this is the change, which doesn't seem particularly risky to me.

-          ConflationKey key = new
ConflationKey(gsEvent.getRegion().getFullPath(),
+          ConflationKey key = new ConflationKey(gsEvent.getRegionPath(),

-Dan

On Thu, Aug 15, 2019 at 12:23 PM Udo Kohlmeyer <u...@apache.com> wrote:

> Whilst I agree with "*finish* when we believe the quality of the release
> branch is sufficient", I disagree that we should have cut a branch and
> continue to patch that branch with non-critical fixes. i.e this issue
> has been around for a while and has no averse side effects. Issues like
> GEODE-7081, which is new due to a new commit, AND it has critical
> stability implications on the server, that I can agree we should include
> in a potential release branch.
>
> Otherwise we can ALWAYS argue that said release branch is not of
> "sufficient" quality, especially if there are numerous existing JIRA's
> pertaining to bugs already in the system.
>
> To quote Juan's original email:
>
> /"Note: *no events are lost (even without the fix)* but, if the region
> takes//
> //a while to recover, the logs  for the member can grow pretty quickly
> due to//
> //the continuously thrown *NPEs.*"/
>
> In addition to this, if there is a commit in a cut release branch, which
> is requiring us to continuously patching the release branch, in order to
> stabilize that feature/fix, maybe we should consider reverting that fix
> and release it at a later stage, when it is believed that this fix is
> more stable and have better, more comprehensive test coverage.
>
> So far, GEODE-7081, does not have me convinced that it is critical. OR
> maybe it is the latter of my options, where it is a stabilization commit
> to a new feature, which begs the question, should we have accepted the
> original feature commit if there are all manner of side effects which we
> are only discovering.
>
> --Udo
>
> On 8/15/19 11:08 AM, Anthony Baker wrote:
> > While we can’t fix *all known bugs*, I think where we do have a fix for
> an important issue we should think hard about the cost of not including
> that in a release.
> >
> > IMO, the fixed time approach to releases means that we *start* the
> release effort (including stabilization and bug fixing if needed) on a
> known date and we *finish* when new believe the quality of the release
> branch is sufficient.  Given the number of important fixes being requested,
> I’m not sure we are there yet.
> >
> > I think the release branch concept has merit because it allows us to
> isolate ongoing work from the changes needed for a release.
> >
> > +1 for including GEODE-7079.
> >
> > Anthony
> >
> >
> >> On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <ukohlme...@gmail.com>
> wrote:
> >>
> >> Seems everyone is in favor or including a /*non-critical*/ fix to an
> already cut branch of the a potential release...
> >>
> >> Am I missing something?
> >>
> >> Why cut a release at all... just have a perpetual cycle of fixes added
> to develop and users can chose what nightly snapshot build they would want
> to use..
> >>
> >> I'm voting -1 on a non-critical issue, which is existing and worst
> effect is to fill logs will NPE logs... (yes, not something we want).
> >>
> >> I believed that we (as a Geode community) agreed that once a release
> has been cut, only critical issue fixes will be included. If we continue
> just continually adding to the ALREADY CUT 1.10 release, where do we stop
> and when do we release...
> >>
> >> --Udo
> >>
> >> On 8/15/19 10:19 AM, Nabarun Nag wrote:
> >>> +1
> >>>
> >>> On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann <
> amurm...@apache.org>
> >>> wrote:
> >>>
> >>>> +1
> >>>>
> >>>> Agreed to fixing this. It's impossible for a user to discover they
> hit an
> >>>> edge case that we fail to support till they are in prod and restart.
> >>>>
> >>>> On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <jra...@pivotal.io>
> >>>> wrote:
> >>>>
> >>>>> Hello Udo,
> >>>>>
> >>>>> Even if it is an existing issue I'd still consider it critical for
> those
> >>>>> cases on which there are unprocessed events on the persistent queue
> >>>> after a
> >>>>> restart and the region takes long to recover... you can actually see
> >>>>> millions of *NPEs* flooding the member's logs.
> >>>>> My two cents anyway, it's up to the community to make the final
> decision.
> >>>>> Cheers.
> >>>>>
> >>>>>
> >>>>> On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <u...@apache.com>
> wrote:
> >>>>>
> >>>>>> Juan,
> >>>>>>
> >>>>>>   From your explanation, it seems this issue is existing and not
> >>>>>> critical. Could we possibly hold this for 1.11?
> >>>>>>
> >>>>>> --Udo
> >>>>>>
> >>>>>> On 8/15/19 5:29 AM, Ju@N wrote:
> >>>>>>> Hello team,
> >>>>>>>
> >>>>>>> I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in
> >>>>>> release
> >>>>>>> 1.10.0.
> >>>>>>> Long story short: a *NullPointerException* can be continuously
> thrown
> >>>>>>> and flood the member's logs if a serial event processor (either
> >>>>>>> *async-event-queue* or *gateway-sender*) starts processing events
> >>>> from
> >>>>> a
> >>>>>>> recovered persistent queue before the actual region to which it was
> >>>>>>> attached is fully operational.
> >>>>>>> Note: *no events are lost (even without the fix)* but, if the
> region
> >>>>>> takes
> >>>>>>> a while to recover, the logs  for the member can grow pretty
> quickly
> >>>>> due
> >>>>>> to
> >>>>>>> the continuously thrown *NPEs.*
> >>>>>>> Best regards.
> >>>>>>>
> >>>>>>> [1]:
> >>>>>>>
> >>>>
> https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b
> >>>>>>> [2]: https://issues.apache.org/jira/browse/GEODE-7079
> >>>>>>>
> >>>>> --
> >>>>> Juan José Ramos Cassella
> >>>>> Senior Software Engineer
> >>>>> Email: jra...@pivotal.io
> >>>>>
>

Reply via email to