Some additional info/context from the PR that is blocked by this issue:

Although we have gfsh stop to stop a sender everywhere, a sender can still
be stopped on an individual node. We only publish a caution, and it looks
like we still allow it because some users rely on it:

CAUTION: Use caution with the stop gateway-sender command (or equivalent
GatewaySender.stop() API) on parallel gateway senders. Instead of stopping
an individual parallel gateway sender on a member, we recommend shutting
down the entire member to ensure proper failover of partitioned region
events to other gateway sender members. Using this command on an individual
parallel gateway sender can result in event loss. See Stopping Gateway
Senders for more details.

There were some issues with the PR (https://github.com/apache/geode/pull/4387)
when close is implemented: it doesn't allow a single sender to be shut
down on a node. I know of some users who rely on this behavior. Whether or
not they should be able to, they have used it in the past, which is why we
added the test shuttingOneSenderInAVMShouldNotAffectOthersBatchRemovalThread.

Close in combination with stopping gateway senders can cause odd
issues, like PartitionOfflineExceptions, RegionDestroyedExceptions, or
behavior like this test is exhibiting. Some of our internal applications
are running into these types of issues with this diff as well.



On Mon, Jan 27, 2020 at 10:09 AM Dan Smith <dsm...@pivotal.io> wrote:

> Hi Mario,
>
> That bug number is from an old version of GemFire, before it was open
> sourced as Geode.
>
> Looking at some of the old bug info, it looks like the bug was that
> calling stop on the region caused unexpected RegionDestroyedExceptions
> to be thrown when the queue was stopped *on one member*. Now that we
> have "gfsh stop" to stop the queue everywhere, it's not clear to me that
> closing the region would be a problem. It seems like the right thing to
> do if that will make the behavior more consistent with serial senders.
>
> -Dan
>
> On Fri, Jan 24, 2020 at 2:39 AM Mario Ivanac <mario.iva...@est.tech>
> wrote:
>
> > Hi geode dev,
> >
> > Do you have more info regarding bug 49060? I think it is the cause of
> > issue https://issues.apache.org/jira/browse/GEODE-7441.
> >
> > When closing of the region is restored (on stopping a parallel GW sender),
> > the persistent parallel GW sender queue is restored after restart.
> >
> > BR,
> > Mario
> > ________________________________
> > From: Mario Ivanac
> > Sent: November 11, 2019, 13:29
> > To: dev@geode.apache.org <dev@geode.apache.org>
> > Subject: ParallelGatewaySenderQueue implementation
> >
> > Hi geode dev,
> >
> > I am investigating the SerialGatewaySenderQueue and
> > ParallelGatewaySenderQueue implementations,
> >
> > and I found that in the ParallelGatewaySenderQueue.close() function
> > the code was deleted and only a comment was left:
> >
> > // Because of bug 49060 do not close the regions of a parallel queue
> >
> > My question is: where can I find more info regarding this bug?
> >
> > BR,
> > Mario
> >
>
