Re: Proposal to Include GEODE-7079 in 1.10.0

Owen Nichols Mon, 19 Aug 2019 07:45:15 -0700

There appears to be consensus that this is a critical fix.

The following commit has been brought into release/1.10.0 
<https://github.com/apache/geode/tree/release/1.10.0> as the critical fix for 
GEODE-7079 <https://issues.apache.org/jira/browse/GEODE-7079>:


git cherry-pick -x 6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b 
<https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b>

GEODE-7079 <https://issues.apache.org/jira/browse/GEODE-7079> has been changed 
to 'resolved in' 1.10.0.

-Owen

> On Aug 19, 2019, at 2:39 AM, Juan José Ramos <[email protected]> wrote:
> 
> Hello team,
> 
> Just following up on my proposal to include GEODE-7079 in release 1.10.0, I
> think there's enough consensus at this point (+1: 7, +0: 1), can we cherry
> pick commit *6f4bbbd [1]* into the release branch?.
> Best regards.
> 
> 
> [1]:
> https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b
> 
> On Thu, Aug 15, 2019 at 9:17 PM Udo Kohlmeyer <[email protected]> wrote:
> 
>> I'm changing my vote to +1 on this issue.
>> 
>> The ONLY reason I'm changing my vote is to add to the cleanliness of the
>> code of the release. I do 100% disagree with the continual scope creep
>> that we have been incurring on this release branch.
>> 
>> --Udo
>> 
>> On 8/15/19 12:34 PM, Dan Smith wrote:
>>> +1 to merging Juan's fix for GEODE-7079. I've seen systems taken down by
>>> rapidly filling up the logs in the past, this does seem to be a critical
>>> fix from the perspective of system stability.
>>> 
>>> Also, this is the change, which doesn't seem particularly risky to me.
>>> 
>>> -          ConflationKey key = new
>>> ConflationKey(gsEvent.getRegion().getFullPath(),
>>> +          ConflationKey key = new ConflationKey(gsEvent.getRegionPath(),
>>> 
>>> -Dan
>>> 
>>> On Thu, Aug 15, 2019 at 12:23 PM Udo Kohlmeyer <[email protected]> wrote:
>>> 
>>>> Whilst I agree with "*finish* when we believe the quality of the release
>>>> branch is sufficient", I disagree that we should have cut a branch and
>>>> continue to patch that branch with non-critical fixes. i.e this issue
>>>> has been around for a while and has no averse side effects. Issues like
>>>> GEODE-7081, which is new due to a new commit, AND it has critical
>>>> stability implications on the server, that I can agree we should include
>>>> in a potential release branch.
>>>> 
>>>> Otherwise we can ALWAYS argue that said release branch is not of
>>>> "sufficient" quality, especially if there are numerous existing JIRA's
>>>> pertaining to bugs already in the system.
>>>> 
>>>> To quote Juan's original email:
>>>> 
>>>> /"Note: *no events are lost (even without the fix)* but, if the region
>>>> takes//
>>>> //a while to recover, the logs  for the member can grow pretty quickly
>>>> due to//
>>>> //the continuously thrown *NPEs.*"/
>>>> 
>>>> In addition to this, if there is a commit in a cut release branch, which
>>>> is requiring us to continuously patching the release branch, in order to
>>>> stabilize that feature/fix, maybe we should consider reverting that fix
>>>> and release it at a later stage, when it is believed that this fix is
>>>> more stable and have better, more comprehensive test coverage.
>>>> 
>>>> So far, GEODE-7081, does not have me convinced that it is critical. OR
>>>> maybe it is the latter of my options, where it is a stabilization commit
>>>> to a new feature, which begs the question, should we have accepted the
>>>> original feature commit if there are all manner of side effects which we
>>>> are only discovering.
>>>> 
>>>> --Udo
>>>> 
>>>> On 8/15/19 11:08 AM, Anthony Baker wrote:
>>>>> While we can’t fix *all known bugs*, I think where we do have a fix for
>>>> an important issue we should think hard about the cost of not including
>>>> that in a release.
>>>>> IMO, the fixed time approach to releases means that we *start* the
>>>> release effort (including stabilization and bug fixing if needed) on a
>>>> known date and we *finish* when new believe the quality of the release
>>>> branch is sufficient.  Given the number of important fixes being
>> requested,
>>>> I’m not sure we are there yet.
>>>>> I think the release branch concept has merit because it allows us to
>>>> isolate ongoing work from the changes needed for a release.
>>>>> +1 for including GEODE-7079.
>>>>> 
>>>>> Anthony
>>>>> 
>>>>> 
>>>>>> On Aug 15, 2019, at 10:51 AM, Udo Kohlmeyer <[email protected]>
>>>> wrote:
>>>>>> Seems everyone is in favor or including a /*non-critical*/ fix to an
>>>> already cut branch of the a potential release...
>>>>>> Am I missing something?
>>>>>> 
>>>>>> Why cut a release at all... just have a perpetual cycle of fixes added
>>>> to develop and users can chose what nightly snapshot build they would
>> want
>>>> to use..
>>>>>> I'm voting -1 on a non-critical issue, which is existing and worst
>>>> effect is to fill logs will NPE logs... (yes, not something we want).
>>>>>> I believed that we (as a Geode community) agreed that once a release
>>>> has been cut, only critical issue fixes will be included. If we continue
>>>> just continually adding to the ALREADY CUT 1.10 release, where do we
>> stop
>>>> and when do we release...
>>>>>> --Udo
>>>>>> 
>>>>>> On 8/15/19 10:19 AM, Nabarun Nag wrote:
>>>>>>> +1
>>>>>>> 
>>>>>>> On Thu, Aug 15, 2019 at 10:15 AM Alexander Murmann <
>>>> [email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> Agreed to fixing this. It's impossible for a user to discover they
>>>> hit an
>>>>>>>> edge case that we fail to support till they are in prod and restart.
>>>>>>>> 
>>>>>>>> On Thu, Aug 15, 2019 at 10:09 AM Juan José Ramos <[email protected]
>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hello Udo,
>>>>>>>>> 
>>>>>>>>> Even if it is an existing issue I'd still consider it critical for
>>>> those
>>>>>>>>> cases on which there are unprocessed events on the persistent queue
>>>>>>>> after a
>>>>>>>>> restart and the region takes long to recover... you can actually
>> see
>>>>>>>>> millions of *NPEs* flooding the member's logs.
>>>>>>>>> My two cents anyway, it's up to the community to make the final
>>>> decision.
>>>>>>>>> Cheers.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Aug 15, 2019 at 5:58 PM Udo Kohlmeyer <[email protected]>
>>>> wrote:
>>>>>>>>>> Juan,
>>>>>>>>>> 
>>>>>>>>>>   From your explanation, it seems this issue is existing and not
>>>>>>>>>> critical. Could we possibly hold this for 1.11?
>>>>>>>>>> 
>>>>>>>>>> --Udo
>>>>>>>>>> 
>>>>>>>>>> On 8/15/19 5:29 AM, Ju@N wrote:
>>>>>>>>>>> Hello team,
>>>>>>>>>>> 
>>>>>>>>>>> I'd like to propose including the *fix [1]* for *GEODE-7079 [2]*
>> in
>>>>>>>>>> release
>>>>>>>>>>> 1.10.0.
>>>>>>>>>>> Long story short: a *NullPointerException* can be continuously
>>>> thrown
>>>>>>>>>>> and flood the member's logs if a serial event processor (either
>>>>>>>>>>> *async-event-queue* or *gateway-sender*) starts processing events
>>>>>>>> from
>>>>>>>>> a
>>>>>>>>>>> recovered persistent queue before the actual region to which it
>> was
>>>>>>>>>>> attached is fully operational.
>>>>>>>>>>> Note: *no events are lost (even without the fix)* but, if the
>>>> region
>>>>>>>>>> takes
>>>>>>>>>>> a while to recover, the logs  for the member can grow pretty
>>>> quickly
>>>>>>>>> due
>>>>>>>>>> to
>>>>>>>>>>> the continuously thrown *NPEs.*
>>>>>>>>>>> Best regards.
>>>>>>>>>>> 
>>>>>>>>>>> [1]:
>>>>>>>>>>> 
>>>> 
>> https://github.com/apache/geode/commit/6f4bbbd96bcecdb82cf7753ce1dae9fa6baebf9b
>>>>>>>>>>> [2]: https://issues.apache.org/jira/browse/GEODE-7079
>>>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Juan José Ramos Cassella
>>>>>>>>> Senior Software Engineer
>>>>>>>>> Email: [email protected]
>>>>>>>>> 
>> 
> 
> 
> -- 
> Juan José Ramos Cassella
> Senior Software Engineer
> Email: [email protected]

Re: Proposal to Include GEODE-7079 in 1.10.0

Reply via email to