Reviewers for GEODE-7971 Gateway sender to deliver transaction events atomically to gateway receivers

2020-04-14 Thread Alberto Gomez
Hi,

Could I ask for a review on https://github.com/apache/geode/pull/4928?

This PR is about "Gateway sender to deliver transaction events atomically to 
gateway receivers" (https://issues.apache.org/jira/browse/GEODE-7971).


Thanks,

/Alberto G.



[Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread Mark Hanson
Hi All,

I know that we have discussed this once before, but I think it bears repeating. 
We have test code that assumes cache.close is synchronous. It is not. Not even 
close. I would like discuss some possible changes.

Option 1. Call it what it is.  Deprecate Cache.close and create a new method 
called asyncClose to replace it. Simple and descriptive. 
Option 2. Fix cache close so it is synchronous. Some might say that we are 
going to break behavior, but I doubt any user relies on the fact that it is 
asynchronous. That would be dangerous in and of itself. Obviously, we don’t 
want to change behavior, but there have been a number of distributed tests that 
have failed for this. If internal to the code devs don’t get it right, where 
does that leave users.
Option 3. Status quo.

What do you think? Are there other options I am missing?

Thanks,
Mark



Re: [Discuss] Cache.close is not synchronous, but code still expects it to be....

2020-04-14 Thread Mark Hanson
Corrected subject line...

> On Apr 14, 2020, at 2:11 PM, Mark Hanson  wrote:
> 
> Hi All,
> 
> I know that we have discussed this once before, but I think it bears 
> repeating. We have test code that assumes cache.close is synchronous. It is 
> not. Not even close. I would like discuss some possible changes.
> 
> Option 1. Call it what it is.  Deprecate Cache.close and create a new method 
> called asyncClose to replace it. Simple and descriptive. 
> Option 2. Fix cache close so it is synchronous. Some might say that we are 
> going to break behavior, but I doubt any user relies on the fact that it is 
> asynchronous. That would be dangerous in and of itself. Obviously, we don’t 
> want to change behavior, but there have been a number of distributed tests 
> that have failed for this. If internal to the code devs don’t get it right, 
> where does that leave users.
> Option 3. Status quo.
> 
> What do you think? Are there other options I am missing?
> 
> Thanks,
> Mark
> 



Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread Jacob Barrett
Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes and waits 
until it is really closed.
Option 5: Cache.close(Runnable closedCalleback) - Runs callback after cache is 
really close.
Option 6: cache.close(); while (!cache.isClosed());


> On Apr 14, 2020, at 2:11 PM, Mark Hanson  wrote:
> 
> Hi All,
> 
> I know that we have discussed this once before, but I think it bears 
> repeating. We have test code that assumes cache.close is synchronous. It is 
> not. Not even close. I would like discuss some possible changes.
> 
> Option 1. Call it what it is.  Deprecate Cache.close and create a new method 
> called asyncClose to replace it. Simple and descriptive. 
> Option 2. Fix cache close so it is synchronous. Some might say that we are 
> going to break behavior, but I doubt any user relies on the fact that it is 
> asynchronous. That would be dangerous in and of itself. Obviously, we don’t 
> want to change behavior, but there have been a number of distributed tests 
> that have failed for this. If internal to the code devs don’t get it right, 
> where does that leave users.
> Option 3. Status quo.
> 
> What do you think? Are there other options I am missing?
> 
> Thanks,
> Mark
> 



Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread Mark Hanson
Hi Jake,

For Option 6: We could fix isClosed as well. That is a great suggestion. 
Currently, it returns almost immediately.
I like your options though

Any other thoughts?

Any preferences? It think any of the current options seem better than the 
current situation as long as we fix isClosed.

Thanks,
Mark

From: Jacob Barrett 
Sent: Tuesday, April 14, 2020 2:30 PM
To: dev@geode.apache.org 
Subject: Re: [Discuss] Cache.close synchronous is not synchronous, but code 
still expects it to be

Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes and waits 
until it is really closed.
Option 5: Cache.close(Runnable closedCalleback) - Runs callback after cache is 
really close.
Option 6: cache.close(); while (!cache.isClosed());


> On Apr 14, 2020, at 2:11 PM, Mark Hanson  wrote:
>
> Hi All,
>
> I know that we have discussed this once before, but I think it bears 
> repeating. We have test code that assumes cache.close is synchronous. It is 
> not. Not even close. I would like discuss some possible changes.
>
> Option 1. Call it what it is.  Deprecate Cache.close and create a new method 
> called asyncClose to replace it. Simple and descriptive.
> Option 2. Fix cache close so it is synchronous. Some might say that we are 
> going to break behavior, but I doubt any user relies on the fact that it is 
> asynchronous. That would be dangerous in and of itself. Obviously, we don’t 
> want to change behavior, but there have been a number of distributed tests 
> that have failed for this. If internal to the code devs don’t get it right, 
> where does that leave users.
> Option 3. Status quo.
>
> What do you think? Are there other options I am missing?
>
> Thanks,
> Mark
>



Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread Dan Smith
Can you spell out what parts of Cache.close are asynchronous? As far as I
can tell it shuts down threadpools, etc. synchronously.

-Dan

On Tue, Apr 14, 2020 at 3:09 PM Mark Hanson  wrote:

> Hi Jake,
>
> For Option 6: We could fix isClosed as well. That is a great suggestion.
> Currently, it returns almost immediately.
> I like your options though
>
> Any other thoughts?
>
> Any preferences? It think any of the current options seem better than the
> current situation as long as we fix isClosed.
>
> Thanks,
> Mark
> 
> From: Jacob Barrett 
> Sent: Tuesday, April 14, 2020 2:30 PM
> To: dev@geode.apache.org 
> Subject: Re: [Discuss] Cache.close synchronous is not synchronous, but
> code still expects it to be
>
> Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes and
> waits until it is really closed.
> Option 5: Cache.close(Runnable closedCalleback) - Runs callback after
> cache is really close.
> Option 6: cache.close(); while (!cache.isClosed());
>
>
> > On Apr 14, 2020, at 2:11 PM, Mark Hanson  wrote:
> >
> > Hi All,
> >
> > I know that we have discussed this once before, but I think it bears
> repeating. We have test code that assumes cache.close is synchronous. It is
> not. Not even close. I would like discuss some possible changes.
> >
> > Option 1. Call it what it is.  Deprecate Cache.close and create a new
> method called asyncClose to replace it. Simple and descriptive.
> > Option 2. Fix cache close so it is synchronous. Some might say that we
> are going to break behavior, but I doubt any user relies on the fact that
> it is asynchronous. That would be dangerous in and of itself. Obviously, we
> don’t want to change behavior, but there have been a number of distributed
> tests that have failed for this. If internal to the code devs don’t get it
> right, where does that leave users.
> > Option 3. Status quo.
> >
> > What do you think? Are there other options I am missing?
> >
> > Thanks,
> > Mark
> >
>
>


Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread Jason Huynh
The isClosed flag and method might be used currently more as an isClosing.
The GemFireCacheImpl.isClosed() method is actually returning isClosing.
Whatever change to isClosed that will be made, will have to properly handle
cases where it's meant to be treated as isClosing().

On Tue, Apr 14, 2020 at 3:09 PM Mark Hanson  wrote:

> Hi Jake,
>
> For Option 6: We could fix isClosed as well. That is a great suggestion.
> Currently, it returns almost immediately.
> I like your options though
>
> Any other thoughts?
>
> Any preferences? It think any of the current options seem better than the
> current situation as long as we fix isClosed.
>
> Thanks,
> Mark
> 
> From: Jacob Barrett 
> Sent: Tuesday, April 14, 2020 2:30 PM
> To: dev@geode.apache.org 
> Subject: Re: [Discuss] Cache.close synchronous is not synchronous, but
> code still expects it to be
>
> Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes and
> waits until it is really closed.
> Option 5: Cache.close(Runnable closedCalleback) - Runs callback after
> cache is really close.
> Option 6: cache.close(); while (!cache.isClosed());
>
>
> > On Apr 14, 2020, at 2:11 PM, Mark Hanson  wrote:
> >
> > Hi All,
> >
> > I know that we have discussed this once before, but I think it bears
> repeating. We have test code that assumes cache.close is synchronous. It is
> not. Not even close. I would like discuss some possible changes.
> >
> > Option 1. Call it what it is.  Deprecate Cache.close and create a new
> method called asyncClose to replace it. Simple and descriptive.
> > Option 2. Fix cache close so it is synchronous. Some might say that we
> are going to break behavior, but I doubt any user relies on the fact that
> it is asynchronous. That would be dangerous in and of itself. Obviously, we
> don’t want to change behavior, but there have been a number of distributed
> tests that have failed for this. If internal to the code devs don’t get it
> right, where does that leave users.
> > Option 3. Status quo.
> >
> > What do you think? Are there other options I am missing?
> >
> > Thanks,
> > Mark
> >
>
>


Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread John Blum
My first thought is cache close (i.e. RegionService.close()  should be
synchronous (by default), perhaps, with non-blocking options or options to
wait for a set timeout as Jake suggested.

This is a problem for *Integration Tests* (that start a peer cache
instance, in-process or standalone) in general and not simply just
"distributed" tests!  This is the reason I built support for this in *Spring
Test for Apache Geode* (STDG) since subsequent tests requiring a peer cache
instance (or CacheServer) may conflict with each other, especially given 1)
the cache instance is a *Singleton* and 2) it is ideal to not have to
restart the JVM between, even for *Integration Tests*, however, turns out
to be a really common practice. *#ugh*  However, without proper
coordination this can be a real problem, hence STDG's support.  Even when
forking JVMs, you still have to be careful to wait in certain cases, as you
could run into other conflicts (e.g. BindExceptions if not varying port
numbers and such).  For all these reasons and more, it is important that
the cache has fully shutdown and released all its resources.

Also, while we are on this topic, I think it would be useful to have a
dedicated interface for the cache instance lifecycle.  It's unfortunate the
CacheListener interface is already taken for Region events. *#sigh*

The interface might be something like...

interface CacheLifecycleListener {

  default void isStarting(CacheEvent event) {}

  default void onStart(CacheEvent event) {}

  default void isClosing(CacheEvent event) {}

  default void onClose(CacheEvent event) {}

  ...

}

-j

On Tue, Apr 14, 2020 at 3:21 PM Jason Huynh  wrote:

> The isClosed flag and method might be used currently more as an isClosing.
> The GemFireCacheImpl.isClosed() method is actually returning isClosing.
> Whatever change to isClosed that will be made, will have to properly handle
> cases where it's meant to be treated as isClosing().
>
> On Tue, Apr 14, 2020 at 3:09 PM Mark Hanson  wrote:
>
> > Hi Jake,
> >
> > For Option 6: We could fix isClosed as well. That is a great suggestion.
> > Currently, it returns almost immediately.
> > I like your options though
> >
> > Any other thoughts?
> >
> > Any preferences? It think any of the current options seem better than the
> > current situation as long as we fix isClosed.
> >
> > Thanks,
> > Mark
> > 
> > From: Jacob Barrett 
> > Sent: Tuesday, April 14, 2020 2:30 PM
> > To: dev@geode.apache.org 
> > Subject: Re: [Discuss] Cache.close synchronous is not synchronous, but
> > code still expects it to be
> >
> > Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes and
> > waits until it is really closed.
> > Option 5: Cache.close(Runnable closedCalleback) - Runs callback after
> > cache is really close.
> > Option 6: cache.close(); while (!cache.isClosed());
> >
> >
> > > On Apr 14, 2020, at 2:11 PM, Mark Hanson  wrote:
> > >
> > > Hi All,
> > >
> > > I know that we have discussed this once before, but I think it bears
> > repeating. We have test code that assumes cache.close is synchronous. It
> is
> > not. Not even close. I would like discuss some possible changes.
> > >
> > > Option 1. Call it what it is.  Deprecate Cache.close and create a new
> > method called asyncClose to replace it. Simple and descriptive.
> > > Option 2. Fix cache close so it is synchronous. Some might say that we
> > are going to break behavior, but I doubt any user relies on the fact that
> > it is asynchronous. That would be dangerous in and of itself. Obviously,
> we
> > don’t want to change behavior, but there have been a number of
> distributed
> > tests that have failed for this. If internal to the code devs don’t get
> it
> > right, where does that leave users.
> > > Option 3. Status quo.
> > >
> > > What do you think? Are there other options I am missing?
> > >
> > > Thanks,
> > > Mark
> > >
> >
> >
>


-- 
-John
Spring Data Team


Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread Dan Smith
IMO if it's not currently synchronous, that's just a bug that needs to be
fixed. If folks can provide details on what actually was asynchronous or
the particular test failures they saw, that would be helpful.

Previously, when this came up it looks like Kirk found that close would not
wait for a different call to close() issued by a different thread [1]. Is
that still the bug we are running into? One that thread, it seems like we
also had a consensus we should just fix bugs with Cache.close:

-Dan

1.
https://lists.apache.org/x/thread.html/f385a6dd51209e2706c68c9821412a6f22ebef3e809636060ac0bf55@%3Cdev.geode.apache.org%3E

On Tue, Apr 14, 2020 at 3:23 PM John Blum  wrote:

> My first thought is cache close (i.e. RegionService.close()  should be
> synchronous (by default), perhaps, with non-blocking options or options to
> wait for a set timeout as Jake suggested.
>
> This is a problem for *Integration Tests* (that start a peer cache
> instance, in-process or standalone) in general and not simply just
> "distributed" tests!  This is the reason I built support for this in
> *Spring
> Test for Apache Geode* (STDG) since subsequent tests requiring a peer cache
> instance (or CacheServer) may conflict with each other, especially given 1)
> the cache instance is a *Singleton* and 2) it is ideal to not have to
> restart the JVM between, even for *Integration Tests*, however, turns out
> to be a really common practice. *#ugh*  However, without proper
> coordination this can be a real problem, hence STDG's support.  Even when
> forking JVMs, you still have to be careful to wait in certain cases, as you
> could run into other conflicts (e.g. BindExceptions if not varying port
> numbers and such).  For all these reasons and more, it is important that
> the cache has fully shutdown and released all its resources.
>
> Also, while we are on this topic, I think it would be useful to have a
> dedicated interface for the cache instance lifecycle.  It's unfortunate the
> CacheListener interface is already taken for Region events. *#sigh*
>
> The interface might be something like...
>
> interface CacheLifecycleListener {
>
>   default void isStarting(CacheEvent event) {}
>
>   default void onStart(CacheEvent event) {}
>
>   default void isClosing(CacheEvent event) {}
>
>   default void onClose(CacheEvent event) {}
>
>   ...
>
> }
>
> -j
>
> On Tue, Apr 14, 2020 at 3:21 PM Jason Huynh  wrote:
>
> > The isClosed flag and method might be used currently more as an
> isClosing.
> > The GemFireCacheImpl.isClosed() method is actually returning isClosing.
> > Whatever change to isClosed that will be made, will have to properly
> handle
> > cases where it's meant to be treated as isClosing().
> >
> > On Tue, Apr 14, 2020 at 3:09 PM Mark Hanson  wrote:
> >
> > > Hi Jake,
> > >
> > > For Option 6: We could fix isClosed as well. That is a great
> suggestion.
> > > Currently, it returns almost immediately.
> > > I like your options though
> > >
> > > Any other thoughts?
> > >
> > > Any preferences? It think any of the current options seem better than
> the
> > > current situation as long as we fix isClosed.
> > >
> > > Thanks,
> > > Mark
> > > 
> > > From: Jacob Barrett 
> > > Sent: Tuesday, April 14, 2020 2:30 PM
> > > To: dev@geode.apache.org 
> > > Subject: Re: [Discuss] Cache.close synchronous is not synchronous, but
> > > code still expects it to be
> > >
> > > Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes and
> > > waits until it is really closed.
> > > Option 5: Cache.close(Runnable closedCalleback) - Runs callback after
> > > cache is really close.
> > > Option 6: cache.close(); while (!cache.isClosed());
> > >
> > >
> > > > On Apr 14, 2020, at 2:11 PM, Mark Hanson  wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I know that we have discussed this once before, but I think it bears
> > > repeating. We have test code that assumes cache.close is synchronous.
> It
> > is
> > > not. Not even close. I would like discuss some possible changes.
> > > >
> > > > Option 1. Call it what it is.  Deprecate Cache.close and create a new
> > > method called asyncClose to replace it. Simple and descriptive.
> > > > Option 2. Fix cache close so it is synchronous. Some might say that
> we
> > > are going to break behavior, but I doubt any user relies on the fact
> that
> > > it is asynchronous. That would be dangerous in and of itself.
> Obviously,
> > we
> > > don’t want to change behavior, but there have been a number of
> > distributed
> > > tests that have failed for this. If internal to the code devs don’t get
> > it
> > > right, where does that leave users.
> > > > Option 3. Status quo.
> > > >
> > > > What do you think? Are there other options I am missing?
> > > >
> > > > Thanks,
> > > > Mark
> > > >
> > >
> > >
> >
>
>
> --
> -John
> Spring Data Team
>


Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread Mark Hanson
I believe it is because of disk persistence among other things. Kirk would know 
for sure. He fixed the issue and his PR got shutdown.
I totally support just fixing the bug.

Let’s give Kirk a chance to chime in.

Thanks,
Mark

> On Apr 14, 2020, at 3:39 PM, Dan Smith  wrote:
> 
> IMO if it's not currently synchronous, that's just a bug that needs to be
> fixed. If folks can provide details on what actually was asynchronous or
> the particular test failures they saw, that would be helpful.
> 
> Previously, when this came up it looks like Kirk found that close would not
> wait for a different call to close() issued by a different thread [1]. Is
> that still the bug we are running into? One that thread, it seems like we
> also had a consensus we should just fix bugs with Cache.close:
> 
> -Dan
> 
> 1.
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fx%2Fthread.html%2Ff385a6dd51209e2706c68c9821412a6f22ebef3e809636060ac0bf55%40%253Cdev.geode.apache.org%253E&data=02%7C01%7Chansonm%40vmware.com%7C7a43463ab53c416234d908d7e0c4cc6b%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637225008165230328&sdata=GD77kAubDDWfP93zjYsNP61VMN4%2FKBAHcq95GwjeMBc%3D&reserved=0
> 
> On Tue, Apr 14, 2020 at 3:23 PM John Blum  wrote:
> 
>> My first thought is cache close (i.e. RegionService.close()  should be
>> synchronous (by default), perhaps, with non-blocking options or options to
>> wait for a set timeout as Jake suggested.
>> 
>> This is a problem for *Integration Tests* (that start a peer cache
>> instance, in-process or standalone) in general and not simply just
>> "distributed" tests!  This is the reason I built support for this in
>> *Spring
>> Test for Apache Geode* (STDG) since subsequent tests requiring a peer cache
>> instance (or CacheServer) may conflict with each other, especially given 1)
>> the cache instance is a *Singleton* and 2) it is ideal to not have to
>> restart the JVM between, even for *Integration Tests*, however, turns out
>> to be a really common practice. *#ugh*  However, without proper
>> coordination this can be a real problem, hence STDG's support.  Even when
>> forking JVMs, you still have to be careful to wait in certain cases, as you
>> could run into other conflicts (e.g. BindExceptions if not varying port
>> numbers and such).  For all these reasons and more, it is important that
>> the cache has fully shutdown and released all its resources.
>> 
>> Also, while we are on this topic, I think it would be useful to have a
>> dedicated interface for the cache instance lifecycle.  It's unfortunate the
>> CacheListener interface is already taken for Region events. *#sigh*
>> 
>> The interface might be something like...
>> 
>> interface CacheLifecycleListener {
>> 
>>  default void isStarting(CacheEvent event) {}
>> 
>>  default void onStart(CacheEvent event) {}
>> 
>>  default void isClosing(CacheEvent event) {}
>> 
>>  default void onClose(CacheEvent event) {}
>> 
>>  ...
>> 
>> }
>> 
>> -j
>> 
>> On Tue, Apr 14, 2020 at 3:21 PM Jason Huynh  wrote:
>> 
>>> The isClosed flag and method might be used currently more as an
>> isClosing.
>>> The GemFireCacheImpl.isClosed() method is actually returning isClosing.
>>> Whatever change to isClosed that will be made, will have to properly
>> handle
>>> cases where it's meant to be treated as isClosing().
>>> 
>>> On Tue, Apr 14, 2020 at 3:09 PM Mark Hanson  wrote:
>>> 
 Hi Jake,
 
 For Option 6: We could fix isClosed as well. That is a great
>> suggestion.
 Currently, it returns almost immediately.
 I like your options though
 
 Any other thoughts?
 
 Any preferences? It think any of the current options seem better than
>> the
 current situation as long as we fix isClosed.
 
 Thanks,
 Mark
 
 From: Jacob Barrett 
 Sent: Tuesday, April 14, 2020 2:30 PM
 To: dev@geode.apache.org 
 Subject: Re: [Discuss] Cache.close synchronous is not synchronous, but
 code still expects it to be
 
 Option 4: Cache.closeAndWait(long timeout, TimeUnit unit) - Closes and
 waits until it is really closed.
 Option 5: Cache.close(Runnable closedCalleback) - Runs callback after
 cache is really close.
 Option 6: cache.close(); while (!cache.isClosed());
 
 
> On Apr 14, 2020, at 2:11 PM, Mark Hanson  wrote:
> 
> Hi All,
> 
> I know that we have discussed this once before, but I think it bears
 repeating. We have test code that assumes cache.close is synchronous.
>> It
>>> is
 not. Not even close. I would like discuss some possible changes.
> 
> Option 1. Call it what it is.  Deprecate Cache.close and create a new
 method called asyncClose to replace it. Simple and descriptive.
> Option 2. Fix cache close so it is synchronous. Some might say that
>> we
 are going to break behavior, but I doubt any user relies on the fact
>> that
 it is asynchronous. Th

Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread John Blum
Among other problems I encountered, 1 problem that seemed to affect
*Integration
Tests* as I described was that the *Singleton* cache reference was not
cleaned up in a timely manner. Therefore, starting a fresh cache instance
(without coordination) in another *Integration Tests* would occasionally
throw a CacheExistsException (IIRC).

It would be roughly equivalent to ...

Cache cache = new CacheFactory().create();
// create more cache objects (Regions, Indexes, etc)
cache.close();
cache = new CacheFactory().create();  // EXCEPTION!!!

There is a lot of stuff (even asynchronous things) going on inside cache
close that can take time.  Even deallocation of system resources does not
always happen in a timely manner.

Kirk will undoubtedly remember other things he encountered as well.

-j


On Tue, Apr 14, 2020 at 3:45 PM Mark Hanson  wrote:

> I believe it is because of disk persistence among other things. Kirk would
> know for sure. He fixed the issue and his PR got shutdown.
> I totally support just fixing the bug.
>
> Let’s give Kirk a chance to chime in.
>
> Thanks,
> Mark
>
> > On Apr 14, 2020, at 3:39 PM, Dan Smith  wrote:
> >
> > IMO if it's not currently synchronous, that's just a bug that needs to be
> > fixed. If folks can provide details on what actually was asynchronous or
> > the particular test failures they saw, that would be helpful.
> >
> > Previously, when this came up it looks like Kirk found that close would
> not
> > wait for a different call to close() issued by a different thread [1]. Is
> > that still the bug we are running into? One that thread, it seems like we
> > also had a consensus we should just fix bugs with Cache.close:
> >
> > -Dan
> >
> > 1.
> >
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fx%2Fthread.html%2Ff385a6dd51209e2706c68c9821412a6f22ebef3e809636060ac0bf55%40%253Cdev.geode.apache.org%253E&data=02%7C01%7Chansonm%40vmware.com%7C7a43463ab53c416234d908d7e0c4cc6b%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637225008165230328&sdata=GD77kAubDDWfP93zjYsNP61VMN4%2FKBAHcq95GwjeMBc%3D&reserved=0
> >
> > On Tue, Apr 14, 2020 at 3:23 PM John Blum  wrote:
> >
> >> My first thought is cache close (i.e. RegionService.close()  should be
> >> synchronous (by default), perhaps, with non-blocking options or options
> to
> >> wait for a set timeout as Jake suggested.
> >>
> >> This is a problem for *Integration Tests* (that start a peer cache
> >> instance, in-process or standalone) in general and not simply just
> >> "distributed" tests!  This is the reason I built support for this in
> >> *Spring
> >> Test for Apache Geode* (STDG) since subsequent tests requiring a peer
> cache
> >> instance (or CacheServer) may conflict with each other, especially
> given 1)
> >> the cache instance is a *Singleton* and 2) it is ideal to not have to
> >> restart the JVM between, even for *Integration Tests*, however, turns
> out
> >> to be a really common practice. *#ugh*  However, without proper
> >> coordination this can be a real problem, hence STDG's support.  Even
> when
> >> forking JVMs, you still have to be careful to wait in certain cases, as
> you
> >> could run into other conflicts (e.g. BindExceptions if not varying port
> >> numbers and such).  For all these reasons and more, it is important that
> >> the cache has fully shutdown and released all its resources.
> >>
> >> Also, while we are on this topic, I think it would be useful to have a
> >> dedicated interface for the cache instance lifecycle.  It's unfortunate
> the
> >> CacheListener interface is already taken for Region events. *#sigh*
> >>
> >> The interface might be something like...
> >>
> >> interface CacheLifecycleListener {
> >>
> >>  default void isStarting(CacheEvent event) {}
> >>
> >>  default void onStart(CacheEvent event) {}
> >>
> >>  default void isClosing(CacheEvent event) {}
> >>
> >>  default void onClose(CacheEvent event) {}
> >>
> >>  ...
> >>
> >> }
> >>
> >> -j
> >>
> >> On Tue, Apr 14, 2020 at 3:21 PM Jason Huynh  wrote:
> >>
> >>> The isClosed flag and method might be used currently more as an
> >> isClosing.
> >>> The GemFireCacheImpl.isClosed() method is actually returning isClosing.
> >>> Whatever change to isClosed that will be made, will have to properly
> >> handle
> >>> cases where it's meant to be treated as isClosing().
> >>>
> >>> On Tue, Apr 14, 2020 at 3:09 PM Mark Hanson 
> wrote:
> >>>
>  Hi Jake,
> 
>  For Option 6: We could fix isClosed as well. That is a great
> >> suggestion.
>  Currently, it returns almost immediately.
>  I like your options though
> 
>  Any other thoughts?
> 
>  Any preferences? It think any of the current options seem better than
> >> the
>  current situation as long as we fix isClosed.
> 
>  Thanks,
>  Mark
>  
>  From: Jacob Barrett 
>  Sent: Tuesday, April 14, 2020 2:30 PM
>  To: dev@geode.apache.org 
>  Subject: Re: [Dis

Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread Jacob Barrett


> On Apr 14, 2020, at 3:09 PM, Mark Hanson  wrote:
> 
> Any other thoughts?

Don’t encourage me!

Option 7: remove all singletons!

> Any preferences? It think any of the current options seem better than the 
> current situation as long as we fix isClosed.

Option 7, 2 or 4 in that order.



Re: [Discuss] Cache.close synchronous is not synchronous, but code still expects it to be....

2020-04-14 Thread Kirk Lund
When I previously submitted a PR to change concurrent calls to
Cache.close() to not return until everything is completely stopped
(including the DistributedSystem), the change was down-voted by Anil. His
reasoning was that the current behavior is expected by users and is de
facto correct. Changing it would change the behavior and result in
long-time users being surprised by the new behavior.

Below are some stack traces from my notes about what causes the thread
calling Cache.close() to return prematurely before the DistributedSystem is
disconnected. Thread #3 (junit main thread) invokes Cache.close() as seen
in stack #3 below. DiskStoreImpl reacts to the situation by once again
invoking Cache.close() once or twice (stack #1 and stack #2) and one of
those invocations wins the race against the thread in stack #3. The thread
in stack #3 then returns prematurely before the DistributedSystem is
disconnected. If thread #3 then attempts to do anything like create a new
cache (which quite a few tests do), it can fail and throw
DistributedSystemDisconnectedException from cache create.

There are two early-outs in GemFireCacheImpl.close() which allows a calling
thread to return before any actual work has been completed after closing
nothing but the SecurityService.

if (isClosed()) {
  return;
}

And "isClosed()" returns true when isClosing flag is true (which is set
true when closing starts):

  public boolean isClosed() {
return isClosing;
  }

Failed creation of a persistent region is one way DiskStoreImpl can cause
multiple threads trying to close the cache to trip all over each other.

DiskStoreImpl is problematic at best in the way it's implemented and it's
not currently unit tested (or unit testable without lots of refactoring),
and I don't plan to revisit this change. I would however be happy to review
proposals and PRs related to this. My change was focused on Cache.close()
and adding a CountDownLatch to close() -- perhaps the next attempt to "fix"
this should focus on DiskStoreImpl -- one could easily argue that closing
the cache is NOT a valid responsibility for DiskStoreImpl. But changing the
behavior of the persistent "layer" of Geode might require more research
(and a LOT more refactoring) than I had time for since my reason for
working on this was to fix some flaky dunit tests caused by this race
condition.

This bug appears to be caused when creation of a persistent region fails
and DiskStoreImpl.lambda$handleDiskAccessException forks a new Thread to
close the Cache which succeeds in closing the Cache before the main test
thread closes it. The main test thread then early outs because the
DiskStore thread is already closing the Cache. The main test thread then
tries to create a Cache which collides with the DiskStore thread which is
still closing the Cache and DistributedSystem.

java.lang.Throwable: KIRK GemFireCacheImpl closed 632230948
at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2365)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1917)
at 
org.apache.geode.internal.cache.DiskStoreImpl.lambda$handleDiskAccessException$2(DiskStoreImpl.java:3380)
at java.lang.Thread.run(Thread.java:748)

java.lang.Throwable: KIRK InternalDistributedSystem closed 306674056
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1637)
at 
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1225)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2351)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1917)
at 
org.apache.geode.internal.cache.DiskStoreImpl.lambda$handleDiskAccessException$2(DiskStoreImpl.java:3380)
at java.lang.Thread.run(Thread.java:748)

java.lang.Throwable: KIRK DiskStoreImpl closing cache 1555793073
at 
org.apache.geode.internal.cache.DiskStoreImpl.handleDiskAccessException(DiskStoreImpl.java:3376)
at 
org.apache.geode.internal.cache.PartitionedRegion.createAndValidatePersistentConfig(PartitionedRegion.java:956)
at 
org.apache.geode.internal.cache.PartitionedRegion.initPRInternals(PartitionedRegion.java:999)
at 
org.apache.geode.internal.cache.PartitionedRegion.initialize(PartitionedRegion.java:1179)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2931)
at 
org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2918)
at org.apache.geode.cache.RegionFactory.create(RegionFactory.java:755)
at 
org.apache.geode.cache.CacheFactoryRecreateRegressionTest.createCacheAndColocatedPRs(CacheFactoryRecreateRegressionTest.java:109)
at 
o

Unable to get behavior described in documentation when using durable native client

2020-04-14 Thread Jakov Varenina

Hi all,

Could you please help me understand behavior of the native client when 
configured as durable?


I have been working on a bug GEODE-7944 
 which results with 
exception "Unable to deserialize membership id java.io.EOFException" on 
locator only when debug is enabled. This happens because native client, 
only when subscription is enabled, sends towards locator 
QueueConnectionRequest that doesn't encapsulate ClientProxyMembershipID 
(not properly serialized) and therefore exception occurs when locator 
tries to deserialize membership id to log it at debug level.


I was trying to figure out why would locator need 
ClientProxyMembershipID from native client and found following paragraph 
in the documentation (copied from 
https://geode.apache.org/docs/geode-native/cpp/112/connection-pools/subscription-properties.html): 



   /For durable subscriptions, the server locator must be able to
   locate the servers that host the queues for the durable client. When
   a durable client sends a request, the server locator queries all the
   available servers to see if they are hosting the subscription region
   queue for the durable client. If the server is located, the client
   is connected to the server hosting the subscription region queue./

Locator behaves as described in above paragraph only when it receives 
///QueueConnectionRequest with ///findDurable flag set to "true" //and 
with valid membership i//d. //I noticed that unlike java client, the 
native client always sets //findDurable// to //"false" //and therefore 
locator will never behave as described in above paragraph when native 
client is used.


Does anybody know why native client always sets //findDurable=false//?

BRs,

Jakov