Re: Proposal to Include GEODE-7079 in 1.10.0

Udo Kohlmeyer Thu, 15 Aug 2019 12:24:08 -0700

Whilst I agree with "*finish* when we believe the quality of the releasebranch is sufficient", I disagree that we should have cut a branch andcontinue to patch that branch with non-critical fixes. i.e this issuehas been around for a while and has no averse side effects. Issues likeGEODE-7081, which is new due to a new commit, AND it has criticalstability implications on the server, that I can agree we should includein a potential release branch.

Otherwise we can ALWAYS argue that said release branch is not of"sufficient" quality, especially if there are numerous existing JIRA'spertaining to bugs already in the system.


To quote Juan's original email:

/"Note: *no events are lost (even without the fix)* but, if the regiontakes////a while to recover, the logs for the member can grow pretty quicklydue to//

//the continuously thrown *NPEs.*"/

In addition to this, if there is a commit in a cut release branch, whichis requiring us to continuously patching the release branch, in order tostabilize that feature/fix, maybe we should consider reverting that fixand release it at a later stage, when it is believed that this fix ismore stable and have better, more comprehensive test coverage.

So far, GEODE-7081, does not have me convinced that it is critical. ORmaybe it is the latter of my options, where it is a stabilization committo a new feature, which begs the question, should we have accepted theoriginal feature commit if there are all manner of side effects which weare only discovering.


--Udo

On 8/15/19 11:08 AM, Anthony Baker wrote:

While we can’t fix *all known bugs*, I think where we do have a fix for an 
important issue we should think hard about the cost of not including that in a 
release.

IMO, the fixed time approach to releases means that we *start* the release 
effort (including stabilization and bug fixing if needed) on a known date and 
we *finish* when new believe the quality of the release branch is sufficient.  
Given the number of important fixes being requested, I’m not sure we are there 
yet.

I think the release branch concept has merit because it allows us to isolate 
ongoing work from the changes needed for a release.

+1 for including GEODE-7079.

Anthony

On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <[email protected]> wrote:

Seems everyone is in favor or including a /*non-critical*/ fix to an already 
cut branch of the a potential release...

Am I missing something?

Why cut a release at all... just have a perpetual cycle of fixes added to 
develop and users can chose what nightly snapshot build they would want to use..

I'm voting -1 on a non-critical issue, which is existing and worst effect is to 
fill logs will NPE logs... (yes, not something we want).

I believed that we (as a Geode community) agreed that once a release has been 
cut, only critical issue fixes will be included. If we continue just 
continually adding to the ALREADY CUT 1.10 release, where do we stop and when 
do we release...

--Udo

On 8/15/19 10:19 AM, Nabarun Nag wrote:

+1

On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann <[email protected]>
wrote:

+1

Agreed to fixing this. It's impossible for a user to discover they hit an
edge case that we fail to support till they are in prod and restart.

On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <[email protected]>
wrote:

Hello Udo,

Even if it is an existing issue I'd still consider it critical for those
cases on which there are unprocessed events on the persistent queue

after a

restart and the region takes long to recover... you can actually see
millions of *NPEs* flooding the member's logs.
My two cents anyway, it's up to the community to make the final decision.
Cheers.


On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <[email protected]> wrote:

Juan,

  From your explanation, it seems this issue is existing and not
critical. Could we possibly hold this for 1.11?

--Udo

On 8/15/19 5:29 AM, Ju@N wrote:

Hello team,

I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in

release

1.10.0.
Long story short: a *NullPointerException* can be continuously thrown
and flood the member's logs if a serial event processor (either
*async-event-queue* or *gateway-sender*) starts processing events

from

recovered persistent queue before the actual region to which it was
attached is fully operational.
Note: *no events are lost (even without the fix)* but, if the region

takes

a while to recover, the logs  for the member can grow pretty quickly

due

to

the continuously thrown *NPEs.*
Best regards.

[1]:

https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b

[2]: https://issues.apache.org/jira/browse/GEODE-7079

--
Juan José Ramos Cassella
Senior Software Engineer
Email: [email protected]

Re: Proposal to Include GEODE-7079 in 1.10.0

Reply via email to