Clean C++ client metadata in timeouts

2020-09-17 Thread Alberto Bustamante Reyes
Hi geode-dev,

I have a question about the C++ client.

Some months ago we merged GEODE-8231 to solve a problem we observed in which 
the native client kept trying to connect to a stopped server.
The GEODE-8231 solution removes the client metadata when an "IO error in 
handshake" exception is received. This fix solved most of our problems, but we 
have observed that sometimes, when a server is stopped, the client receives a 
different error and the "IO error in handshake" takes up to a minute to 
appear. During that time, the client keeps trying to connect to the offline 
server.

As the error received during that time is "timeout in handshake", we have 
tested modifying the GEODE-8231 solution so that the client also removes the 
metadata once a timeout error is received (here is a draft with the code: 
https://github.com/apache/geode-native/pull/651). With this change in place, 
the behavior is OK.


But I would like your opinion on this change, because it means that a single 
timeout will cause the removal of the client metadata, which may not be the 
best solution. I thought about different alternatives:

- Wait until a given number of timeouts in a row have been received from the 
same server to remove the metadata
- Make this "remove-metadata-after-timeout" something optional that could be 
configured if needed

As this will misalign the behavior of the Java and C++ clients, making this an 
optional configuration would be more appropriate, so that the default C++ 
client behavior stays the same as the Java client's.
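
To make the two alternatives concrete, here is a minimal sketch of how they 
could be combined. It is not the code in the draft PR, and 
HandshakeTimeoutTracker, onTimeout and onSuccess are hypothetical names that 
do not exist in geode-native. It assumes a threshold of 0 means "feature 
disabled" (the current default, aligned with the Java client) and that the 
caller serializes access, since the sketch is not thread-safe.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of geode-native): decides when the client
// metadata should be removed after "timeout in handshake" errors.
class HandshakeTimeoutTracker {
 public:
  // threshold == 0 disables the feature: timeouts never trigger removal.
  explicit HandshakeTimeoutTracker(std::size_t threshold)
      : threshold_(threshold) {}

  // Record one handshake timeout for `server`.
  // Returns true when the caller should remove the metadata.
  bool onTimeout(const std::string& server) {
    if (threshold_ == 0) return false;
    std::size_t& count = consecutive_[server];
    ++count;
    if (count < threshold_) return false;
    consecutive_.erase(server);  // start counting afresh after removal
    return true;
  }

  // A successful handshake breaks the run of consecutive timeouts.
  void onSuccess(const std::string& server) { consecutive_.erase(server); }

 private:
  std::size_t threshold_;
  std::unordered_map<std::string, std::size_t> consecutive_;
};
```

With a threshold of 1 this behaves like the draft (a single timeout removes 
the metadata); with a threshold of 3, two timeouts followed by a successful 
handshake leave the metadata untouched.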

BR/

Alberto B.


Re: [Discussion] - ClassLoaderService RFC proposal

2020-09-17 Thread Jens Deppe
Thanks for this work Udo!

Does this ClassLoaderService effectively replace the current ClassPathLoader 
mechanism then?

How will one access the service; is there some static reference? Some code 
examples would be helpful to properly understand how one might work with this.

--Jens

On 9/14/20, 3:42 AM, "Udo Kohlmeyer"  wrote:

Hi there Apache Geode Devs, (try 2)

Please find attached a proposal for a ClassLoaderService. Please review and 
ponder on it.


https://cwiki.apache.org/confluence/display/GEODE/Introduction+of+ClassLoaderService+into+Geode

Please make all comments in this mail thread.

—Udo



Re: Clean C++ client metadata in timeouts

2020-09-17 Thread Dave Barnes
Alberto,
Are there cases in which one or two timeouts are followed by a successful
retry? Or does one timeout *always* end with more timeouts and, ultimately,
an IO error?
If timeouts can sometimes be followed by successful retries, and re-trying
is the current default behavior, then I agree that introducing a setting
that effectively eliminates re-tries should be the developer's choice.
In that case, I suggest that the option should not be a low-level choice of
"handle the metadata in a way that eliminates retries" but should be higher
level, like "when attempting to connect, try only once, instead of
re-trying (the default behavior)."
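
A rough sketch of what such a higher-level option might look like, purely as 
an illustration: ConnectRetryPolicy, ConnectResult and connectWithPolicy are 
hypothetical names, not existing geode-native API, and the connection attempt 
is modeled as a callable that returns true on success.

```cpp
#include <functional>

// Hypothetical setting: how a connection attempt treats failures.
enum class ConnectRetryPolicy { Retry, TryOnce };

// Outcome of connectWithPolicy: whether it connected and how many tries it took.
struct ConnectResult {
  bool connected;
  int attempts;
};

// Calls tryConnect until it succeeds or the attempt limit is reached.
// TryOnce caps the limit at one attempt; Retry allows up to maxAttempts.
ConnectResult connectWithPolicy(ConnectRetryPolicy policy, int maxAttempts,
                                const std::function<bool()>& tryConnect) {
  const int limit = (policy == ConnectRetryPolicy::TryOnce) ? 1 : maxAttempts;
  int attempts = 0;
  while (attempts < limit) {
    ++attempts;
    if (tryConnect()) return {true, attempts};
  }
  return {false, attempts};
}
```

The point of the sketch is that the user-facing choice is expressed in terms 
of connection attempts, while how the metadata is handled stays an internal 
detail.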
-Dave



Re: Clean C++ client metadata in timeouts

2020-09-17 Thread Blake Bender
Given that attempts to retrieve metadata after the C++ cache is closed are a 
constant headache for Geode Native development, I am generally in favor of 
anything that potentially reduces the number of times/places this happens.  If 
we've failed the handshake, it's very unlikely things will correct themselves 
without outside intervention, so this fix is probably goodness.  I'd go ahead 
and submit a PR when you think it's solid.

Thanks,

Blake





Re: Clean C++ client metadata in timeouts

2020-09-17 Thread Dave Barnes
If a straight-up change solves a constant headache, as you suggest,
Alberto, and as Blake concurs, that sounds like the way to go.
Why introduce a new option or property if the user will always prefer one
behavior over the other? (And from a docs perspective, who needs another
optional property, anyway?)



Re: Clean C++ client metadata in timeouts

2020-09-17 Thread Ernie Burghardt
Let's please consider how this would be controlled and look for ways other 
than YetAnotherProperty.

Thanks,
EB




Re: [Discussion] - ClassLoaderService RFC proposal

2020-09-17 Thread Udo Kohlmeyer
Jens, yes and no.

The ClassPathLoader will still be relevant for the ClassLoader mechanisms we 
require for the non-ClassLoader-Isolated starting of the server.

But all the functionality that ClassPathLoader exposes will eventually be 
replaced with the ClassLoader Isolated capability.

I’ll add some code snippets to the RFC to further explain the proposal. Thank 
you for raising this.

—Udo



Re: [Discussion] - ClassLoaderService RFC proposal

2020-09-17 Thread Udo Kohlmeyer
Sorry...

Also, the new implementation will not be statically available and for now the 
instance is passed into the locations that require it.

—Udo