Whilst I agree with "*finish* when we believe the quality of the release branch is sufficient", I disagree that, having cut a branch, we should continue to patch it with non-critical fixes. This issue, for example, has been around for a while and has no adverse side effects. An issue like GEODE-7081, which is new (introduced by a recent commit) AND has critical stability implications on the server, is one I can agree we should include in a potential release branch.

Otherwise we can ALWAYS argue that said release branch is not of "sufficient" quality, especially when there are numerous existing JIRAs for bugs already in the system.

To quote Juan's original email:

/"Note: *no events are lost (even without the fix)* but, if the region takes a while to recover, the logs for the member can grow pretty quickly due to the continuously thrown *NPEs.*"/

In addition, if a commit in a cut release branch requires us to continuously patch that branch in order to stabilize the feature/fix, maybe we should consider reverting that fix and releasing it at a later stage, once we believe it is more stable and has better, more comprehensive test coverage.

So far, GEODE-7081 has not convinced me that it is critical. OR maybe it falls under the latter of my options: a stabilization commit for a new feature, which raises the question of whether we should have accepted the original feature commit at all, given all manner of side effects that we are only now discovering.

--Udo

On 8/15/19 11:08 AM, Anthony Baker wrote:
While we can’t fix *all known bugs*, I think where we do have a fix for an 
important issue we should think hard about the cost of not including that in a 
release.

IMO, the fixed time approach to releases means that we *start* the release 
effort (including stabilization and bug fixing if needed) on a known date and 
we *finish* when we believe the quality of the release branch is sufficient.
Given the number of important fixes being requested, I’m not sure we are there 
yet.

I think the release branch concept has merit because it allows us to isolate 
ongoing work from the changes needed for a release.

+1 for including GEODE-7079.

Anthony


On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <ukohlme...@gmail.com> wrote:

Seems everyone is in favor of including a /*non-critical*/ fix in an already 
cut branch of a potential release...

Am I missing something?

Why cut a release at all... just have a perpetual cycle of fixes added to 
develop, and users can choose whichever nightly snapshot build they want to use.

I'm voting -1 on a non-critical issue that already exists and whose worst effect is to 
fill the logs with NPEs... (yes, not something we want).

I believed that we (as a Geode community) agreed that once a release branch has been 
cut, only fixes for critical issues would be included. If we just keep 
adding to the ALREADY CUT 1.10 release, where do we stop, and when 
do we release?

--Udo

On 8/15/19 10:19 AM, Nabarun Nag wrote:
+1

On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann <amurm...@apache.org> wrote:

+1

Agreed on fixing this. It's impossible for a user to discover they hit an
edge case that we fail to support until they are in prod and restart.

On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <jra...@pivotal.io> wrote:

Hello Udo,

Even if it is an existing issue, I'd still consider it critical for those
cases in which there are unprocessed events in the persistent queue after a
restart and the region takes a long time to recover... you can actually see
millions of *NPEs* flooding the member's logs.
Just my two cents, though; it's up to the community to make the final decision.
Cheers.


On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <u...@apache.com> wrote:

Juan,

From your explanation, it seems this is an existing issue and not
critical. Could we possibly hold this for 1.11?

--Udo

On 8/15/19 5:29 AM, Ju@N wrote:
Hello team,

I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in release
1.10.0.
Long story short: a *NullPointerException* can be continuously thrown
and flood the member's logs if a serial event processor (either
*async-event-queue* or *gateway-sender*) starts processing events from a
recovered persistent queue before the actual region to which it was
attached is fully operational.
Note: *no events are lost (even without the fix)* but, if the region takes
a while to recover, the logs for the member can grow pretty quickly due to
the continuously thrown *NPEs.*
Best regards.

[1]: https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b
[2]: https://issues.apache.org/jira/browse/GEODE-7079

--
Juan José Ramos Cassella
Senior Software Engineer
Email: jra...@pivotal.io
