Whilst I agree with "*finish* when we believe the quality of the release
branch is sufficient", I disagree that we should have cut a branch and
continue to patch that branch with non-critical fixes. i.e this issue
has been around for a while and has no averse side effects. Issues like
GEODE-7081, which is new due to a new commit, AND it has critical
stability implications on the server, that I can agree we should include
in a potential release branch.
Otherwise we can ALWAYS argue that said release branch is not of
"sufficient" quality, especially if there are numerous existing JIRA's
pertaining to bugs already in the system.
To quote Juan's original email:
/"Note: *no events are lost (even without the fix)* but, if the region
takes//
//a while to recover, the logs for the member can grow pretty quickly
due to//
//the continuously thrown *NPEs.*"/
In addition to this, if there is a commit in a cut release branch, which
is requiring us to continuously patching the release branch, in order to
stabilize that feature/fix, maybe we should consider reverting that fix
and release it at a later stage, when it is believed that this fix is
more stable and have better, more comprehensive test coverage.
So far, GEODE-7081, does not have me convinced that it is critical. OR
maybe it is the latter of my options, where it is a stabilization commit
to a new feature, which begs the question, should we have accepted the
original feature commit if there are all manner of side effects which we
are only discovering.
--Udo
On 8/15/19 11:08 AM, Anthony Baker wrote:
While we can’t fix *all known bugs*, I think where we do have a fix for an
important issue we should think hard about the cost of not including that in a
release.
IMO, the fixed time approach to releases means that we *start* the release
effort (including stabilization and bug fixing if needed) on a known date and
we *finish* when new believe the quality of the release branch is sufficient.
Given the number of important fixes being requested, I’m not sure we are there
yet.
I think the release branch concept has merit because it allows us to isolate
ongoing work from the changes needed for a release.
+1 for including GEODE-7079.
Anthony
On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <ukohlme...@gmail.com> wrote:
Seems everyone is in favor or including a /*non-critical*/ fix to an already
cut branch of the a potential release...
Am I missing something?
Why cut a release at all... just have a perpetual cycle of fixes added to
develop and users can chose what nightly snapshot build they would want to use..
I'm voting -1 on a non-critical issue, which is existing and worst effect is to
fill logs will NPE logs... (yes, not something we want).
I believed that we (as a Geode community) agreed that once a release has been
cut, only critical issue fixes will be included. If we continue just
continually adding to the ALREADY CUT 1.10 release, where do we stop and when
do we release...
--Udo
On 8/15/19 10:19 AM, Nabarun Nag wrote:
+1
On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann <amurm...@apache.org>
wrote:
+1
Agreed to fixing this. It's impossible for a user to discover they hit an
edge case that we fail to support till they are in prod and restart.
On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <jra...@pivotal.io>
wrote:
Hello Udo,
Even if it is an existing issue I'd still consider it critical for those
cases on which there are unprocessed events on the persistent queue
after a
restart and the region takes long to recover... you can actually see
millions of *NPEs* flooding the member's logs.
My two cents anyway, it's up to the community to make the final decision.
Cheers.
On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <u...@apache.com> wrote:
Juan,
From your explanation, it seems this issue is existing and not
critical. Could we possibly hold this for 1.11?
--Udo
On 8/15/19 5:29 AM, Ju@N wrote:
Hello team,
I'd like to propose including the *fix [1]* for *GEODE-7079 [2]* in
release
1.10.0.
Long story short: a *NullPointerException* can be continuously thrown
and flood the member's logs if a serial event processor (either
*async-event-queue* or *gateway-sender*) starts processing events
from
a
recovered persistent queue before the actual region to which it was
attached is fully operational.
Note: *no events are lost (even without the fix)* but, if the region
takes
a while to recover, the logs for the member can grow pretty quickly
due
to
the continuously thrown *NPEs.*
Best regards.
[1]:
https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b
[2]: https://issues.apache.org/jira/browse/GEODE-7079
--
Juan José Ramos Cassella
Senior Software Engineer
Email: jra...@pivotal.io