Given that attempts to retrieve metadata after the C++ cache is closed are a 
constant headache for Geode Native development, I am generally in favor of 
anything that potentially reduces the number of times/places this happens.  If 
we've failed the handshake, it's very unlikely things will correct themselves 
without outside intervention, so this fix is probably goodness.  I'd go ahead 
and submit a PR when you think it's solid.

Thanks,

Blake


On 9/17/20, 9:36 AM, "Dave Barnes" <dbar...@apache.org> wrote:

    Alberto,
    Are there cases in which one or two timeouts are followed by a successful
    retry? Or does one timeout *always* end with more timeouts and, ultimately,
    an IO error?
    If timeouts can sometimes be followed by successful retries, and retrying
    is the current default behavior, then I agree that introducing a setting
    that effectively eliminates retries should be the developer's choice.
    In that case, I suggest that the option should not be a low-level choice of
    "handle the metadata in a way that eliminates retries" but should be higher
    level, like "when attempting to connect, try only once, instead of
    retrying (the default behavior)."
    -Dave

    On Thu, Sep 17, 2020 at 7:42 AM Alberto Bustamante Reyes
    <alberto.bustamante.re...@est.tech> wrote:

    > Hi geode-dev,
    >
    > I have a question about the c++ client.
    >
    > Some months ago we merged GEODE-8231 to solve a problem we observed in
    > which the native client kept trying to connect to a stopped server. The
    > GEODE-8231 solution consists of removing the client metadata when an "IO
    > error in handshake" exception is received. This fix solved most of our
    > problems, but we have observed that sometimes, when a server is stopped,
    > the errors received in the client are different and the "IO error in
    > handshake" takes up to a minute to appear. During that time, the client
    > keeps trying to connect to the offline server.
    >
    > As the error received during that time is "timeout in handshake", we have
    > tested modifying the solution of GEODE-8231 to make the client remove the
    > metadata once a timeout error is received (here is a draft with the code:
    > https://github.com/apache/geode-native/pull/651). With this change in
    > place, the behavior is ok.
    >
    >
    > But I would like to ask your opinion about this change, because with it a
    > single timeout will cause the removal of the client metadata, which may
    > not be the best solution. I thought about different alternatives:
    >
    > - Wait until a given number of timeouts in a row have been received from
    > the same server to remove the metadata
    > - Make this "remove-metadata-after-timeout" something optional that could
    > be configured if needed
    >
    > As this will misalign the behavior of the Java and C++ clients, making
    > this an optional configuration would be more appropriate, to keep the
    > default C++ client behavior the same as the Java client's.
    >
    > BR/
    >
    > Alberto B.
    >
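
For illustration only, the two alternatives listed in Alberto's message
(waiting for several timeouts in a row from the same server, and making the
behavior optional) could be combined roughly as in the sketch below. The class
name HandshakeTimeoutTracker, its methods, and the "threshold of 0 means
disabled" convention are all hypothetical and are not part of geode-native;
the caller is assumed to perform the actual metadata removal when onTimeout
returns true. The sketch is not thread-safe.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of geode-native): counts consecutive
// handshake timeouts per server endpoint.
class HandshakeTimeoutTracker {
 public:
  // Assumption: threshold == 0 disables the feature, so timeouts never
  // trigger metadata removal and the current default behavior is kept.
  explicit HandshakeTimeoutTracker(std::size_t threshold)
      : threshold_(threshold) {}

  // Record a handshake timeout for `endpoint`. Returns true when the caller
  // should remove the client metadata for that server.
  bool onTimeout(const std::string& endpoint) {
    if (threshold_ == 0) return false;
    std::size_t& count = consecutive_[endpoint];
    ++count;
    if (count < threshold_) return false;
    consecutive_.erase(endpoint);  // start counting afresh after removal
    return true;
  }

  // A successful handshake breaks the run of timeouts for `endpoint`.
  void onSuccess(const std::string& endpoint) { consecutive_.erase(endpoint); }

 private:
  std::size_t threshold_;
  std::unordered_map<std::string, std::size_t> consecutive_;
};
```

With a threshold of 1 this behaves like the draft described above (a single
timeout triggers removal), while a threshold of 0 leaves the existing
behavior untouched.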
