Re: [DISCUSS] RFC: Shipping Geode patch releases
Thanks for proposing this, Anthony!

> I don’t think it’s necessary for this proposal to re-define or clarify
> what constitutes a critical fix; it sounds like the bar would be the same
> as the standard we already apply when back-porting to release branches
> (proposal w/ justification, and 3 votes in favor). The only difference
> seems to be that now proposals may list up to three target branches, not
> just one.

re: Owen
TL;DR: +1, using the same process we use for merging critical fixes during an ongoing release seems appropriate.

Generally, merging a fix to a dormant release branch is much cheaper and less problematic than merging a fix to an active release branch, where a merge resets all release work that has already happened. Ideally, in most cases we could just open a PR to merge fixes back. Unfortunately, I believe it's unreasonable to expect everyone to be aware at all times of what's actively being released and what's not, so let's pretend we are always shipping these branches.

On Fri, Feb 21, 2020 at 7:35 PM Owen Nichols wrote:

> Thank you Anthony for proposing this “N-2” support policy. It isn’t a big
> change, but it is helpful to know that the Geode PMC will now be standing
> behind (and ready to vote on) patch releases within a 9-month window.
>
> Overall, this sounds much like how 1.9.1 and 1.9.2 started as community
> proposals, found a release manager, and went on to be successfully released.
>
> I don’t think it’s necessary for this proposal to re-define or clarify
> what constitutes a critical fix; it sounds like the bar would be the same
> as the standard we already apply when back-porting to release branches
> (proposal w/ justification, and 3 votes in favor). The only difference
> seems to be that now proposals may list up to three target branches, not
> just one.
>
> I also don’t think it’s necessary to alter our current process to maintain
> a standing "support/x.y" branch. The proposal states that patch releases
> will be “ad-hoc (as needed)”. Our current process serves this quite well:
> we propose a patch release at the time it is needed, then get a release
> manager and create a release branch specifically for that release (e.g.
> release/1.9.2 was created from the rel/v1.9.1 tag), then clean up
> afterwards so no unattended pipelines or branches linger.
>
> The rotating release manager role has been a hallmark of the Geode
> community process, so I hope this proposal will not dissuade anyone
> interested in helping with a release. However, getting the automation
> improvements we need will require some continuity over several releases. I
> would love to volunteer for this!
>
> -Owen
>
> > On Feb 21, 2020, at 5:30 PM, Anthony Baker wrote:
> >
> > Hi everyone,
> >
> > I'd like to propose shipping patch releases of Geode as a way to
> > improve our engagement and support of our user community. Please
> > review the details in the linked RFC [1] and kindly offer your
> > thoughts in this thread.
> >
> > Thanks,
> > Anthony
> >
> > [1] https://cwiki.apache.org/confluence/display/GEODE/Shipping+patch+releases
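Owen's description of the current process (a release branch such as release/1.9.2 cut from the rel/v1.9.1 tag, with proposals listing up to three target lines under an N-2 policy) can be made concrete with a small sketch. The class `PatchBranchPlanner` and its `plan` method are hypothetical, not part of Geode's release tooling; the sketch assumes tags follow the `rel/vX.Y.Z` pattern and reads "N-2" as "the three newest minor lines". For each supported line it picks the newest patch tag and names the release branch for the next patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: plan N-2 patch release branches from release tags. */
public class PatchBranchPlanner {
  // Assumed tag convention, e.g. "rel/v1.9.1".
  private static final Pattern TAG = Pattern.compile("rel/v(\\d+)\\.(\\d+)\\.(\\d+)");

  /**
   * Returns an ordered map of new release branch -> tag to branch from,
   * covering the {@code supportedLines} newest major.minor lines.
   */
  public static Map<String, String> plan(List<String> tags, int supportedLines) {
    // Keep only the newest patch tag seen for each major.minor line.
    Map<String, int[]> newestPerLine = new LinkedHashMap<>();
    for (String tag : tags) {
      Matcher m = TAG.matcher(tag);
      if (!m.matches()) {
        continue; // skip tags that don't follow the rel/vX.Y.Z convention
      }
      int[] v = {
          Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))
      };
      newestPerLine.merge(v[0] + "." + v[1], v, (a, b) -> a[2] >= b[2] ? a : b);
    }

    // Sort lines newest first, comparing major and minor numerically.
    List<int[]> lines = new ArrayList<>(newestPerLine.values());
    lines.sort(Comparator.<int[]>comparingInt(v -> v[0]).thenComparingInt(v -> v[1]).reversed());

    // For each supported line: branch release/X.Y.(Z+1) from tag rel/vX.Y.Z.
    Map<String, String> plans = new LinkedHashMap<>();
    for (int[] v : lines.subList(0, Math.min(supportedLines, lines.size()))) {
      String baseTag = "rel/v" + v[0] + "." + v[1] + "." + v[2];
      String branch = "release/" + v[0] + "." + v[1] + "." + (v[2] + 1);
      plans.put(branch, baseTag);
    }
    return plans;
  }

  public static void main(String[] args) {
    // Sample input for illustration only.
    List<String> tags =
        List.of("rel/v1.9.0", "rel/v1.9.1", "rel/v1.9.2", "rel/v1.10.0", "rel/v1.11.0");
    plan(tags, 3).forEach((branch, base) -> System.out.println(branch + " <- " + base));
    // Prints:
    // release/1.11.1 <- rel/v1.11.0
    // release/1.10.1 <- rel/v1.10.0
    // release/1.9.3 <- rel/v1.9.2
  }
}
```

With only `rel/v1.9.1` as input, the sketch yields `release/1.9.2` from `rel/v1.9.1`, matching Owen's example. Comparing versions numerically matters: sorted as strings, "1.9" comes after "1.10", so the 1.9 line would wrongly look like the newest.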
Re: [DISCUSS] RFC: Shipping Geode patch releases
Hi Alexander,

Currently we don’t start a patch release until someone proposes a critical fix, which then drives the release (the community may propose “extra” fixes to tag along once a release branch is cut). This keeps the process simple, neat and tidy.

Another option I hadn’t thought of is to begin collecting “extra” fixes proactively on a “dormant” branch, so that when someone finally proposes releasing a patch, the branch will already be primed with a bunch of fixes. This adds complexity:
- Does a different standard apply to bring fixes to a dormant branch?
- Are release branches separate from support branches?
- How will committers keep track of what is dormant and what is not?

To implement an N-2 support policy, does it make more sense for Geode to make small, focused patch releases when needed, or to maintain what amounts to “3 develop branches at all times”?

> On Feb 24, 2020, at 11:00 AM, Alexander Murmann wrote:
>
> Thanks for proposing this, Anthony!
>
>> I don’t think it’s necessary for this proposal to re-define or clarify
>> what constitutes a critical fix; it sounds like the bar would be the same
>> as the standard we already apply when back-porting to release branches
>> (proposal w/ justification, and 3 votes in favor). The only difference
>> seems to be that now proposals may list up to three target branches, not
>> just one.
>>
> re: Owen
> TL;DR: +1 using the same process as we use for merging critical fixes
> during an ongoing release seems appropriate.
>
> Generally merging a fix to a dormant release branch seems less problematic
> than merging a fix to an active release branch where a merge will reset all
> release work that has already happened. The cost of merging to a
> dormant release branch is much lower than merging to one that's being
> actively released. Ideally we could just do a PR to merge fixes back in
> most cases.
> Unfortunately, I believe it's unreasonable to expect that
> everyone will be aware at all times what's actively being released and
> what's not => Let's pretend we are always shipping these branches.
Re: [DISCUSS] Prevent locator startup if startup/restart thread throws an uncaught exception
I think what you're proposing is very reasonable.

LoggingThread was added fairly recently to replace our use of java.lang.ThreadGroup.uncaughtException(Thread t, Throwable e), which was the old way to do things. I think logging the Throwable at the highest severity was more of a last line of defense than a purposeful design, so there might not be any ready answers to your questions.

Features added to Geode later have Executors or Threads that aren't part of the original group of Executors in ClusterDistributionManager. Some of these newer Executors and Threads might be a little loose in the error-handling department.

-Kirk

On Fri, Feb 21, 2020 at 10:42 AM Dale Emery wrote:

> I would like to consider preventing locator startup if a startup or
> restart thread throws an uncaught exception. Otherwise, the cluster can
> include a locator that lacks critical services. We have created
> https://issues.apache.org/jira/browse/GEODE-7775 to address this.
>
> We recently observed a serious problem in a user's Geode cluster. The
> problem was enabled by a restart thread's policy of catching uncaught
> exceptions, logging them as "fatal," then exiting the thread without
> further action.
>
> Here's how the problem happened:
>
> The cluster had 3 locators and 4 servers. An NPE occurred in the "Location
> services restart thread" while a locator was restarting. The thread logged
> the NPE and exited, having never started the configuration persistence
> service. This incomplete locator then joined the cluster.
>
> The user then issued numerous gfsh commands to create, destroy, and
> re-create regions, routing each gfsh command to a different locator in
> round-robin fashion.
>
> Approximately a third of the commands were executed via the incomplete
> locator. Though the commands properly created or destroyed the regions,
> these results were never recorded in the persisted configuration.
> As a result, the persisted configuration was missing definitions for a
> third of the regions, and had duplicate or even triplicate definitions for
> others.
>
> When the user tried to restart a server, the server detected that the
> persisted configuration was invalid and refused to start.
>
> We have fixed the NPE that initially triggered the problem.
>
> We still have a vulnerability: If in the future a startup/restart thread
> suffers some other exception before it finishes starting its services, the
> thread will log it and exit, allowing the incomplete locator to join the
> cluster.
>
> Some things I don't know:
> - What was the reason for instituting the LoggingThread's policy of
>   logging exceptions as "fatal" and otherwise ignoring them?
> - In which threads should uncaught exceptions prevent startup?
> - In which threads should uncaught exceptions be logged and ignored?
>
> Cheers,
> Dale
>
> —
> Dale Emery
> dem...@pivotal.io
> dem...@vmware.com
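Dale's proposal and Kirk's reply contrast two policies for an uncaught exception in a startup/restart thread: log it at fatal severity and let the thread exit (the LoggingThread behavior described in this thread), or treat it as fatal to startup. A minimal sketch of the second policy follows, using only the standard `Thread.setUncaughtExceptionHandler` API. `StartupGate` and `StartupFailedException` are hypothetical names, not Geode classes, and the sketch assumes the locator's startup code can wait for its startup/restart threads to finish before it joins the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical sketch: start each startup/restart task on its own thread,
 * record any uncaught exception instead of only logging it, and fail
 * startup if any task died.
 */
public class StartupGate {

  /** Hypothetical exception signalling that startup must not complete. */
  public static class StartupFailedException extends RuntimeException {
    public StartupFailedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  private final List<Thread> threads = new ArrayList<>();
  private final Queue<Throwable> failures = new ConcurrentLinkedQueue<>();

  /** Starts {@code task} on a thread named {@code threadName}. */
  public void start(String threadName, Runnable task) {
    Thread thread = new Thread(task, threadName);
    // Runs on the dying thread. A real implementation would also log here.
    thread.setUncaughtExceptionHandler((t, e) -> failures.add(e));
    threads.add(thread);
    thread.start();
  }

  /** Waits for every task; throws if any task ended with an uncaught exception. */
  public void awaitAll() {
    try {
      for (Thread thread : threads) {
        thread.join(); // join() also makes the handler's writes visible here
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new StartupFailedException("interrupted while waiting for startup tasks", e);
    }
    Throwable first = failures.peek();
    if (first != null) {
      StartupFailedException ex = new StartupFailedException(
          failures.size() + " startup task(s) failed; refusing to join the cluster", first);
      failures.stream().skip(1).forEach(ex::addSuppressed); // keep any other failures
      throw ex;
    }
  }

  public static void main(String[] args) {
    StartupGate gate = new StartupGate();
    gate.start("Location services restart thread", () -> {
      throw new NullPointerException("simulated NPE before services start");
    });
    gate.start("another startup task", () -> { /* completes normally */ });
    try {
      gate.awaitAll();
      System.out.println("startup complete");
    } catch (StartupFailedException e) {
      // Prints: startup aborted: java.lang.NullPointerException: simulated NPE before services start
      System.out.println("startup aborted: " + e.getCause());
    }
  }
}
```

Recording the failure rather than only logging it lets the caller of `awaitAll()` refuse to join the cluster. The sketch does not settle Dale's open question of which threads deserve this treatment; a thread whose failure is tolerable could keep a log-and-ignore handler.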