Alberto, give us a mention on the draft PR and we can think about how it might 
be tested too... might get more ideas from the group...

Thanks,
EB

On 9/18/20, 3:51 AM, "Alberto Bustamante Reyes" 
<alberto.bustamante.re...@est.tech> wrote:

    Hi,

    Thanks for you messages, here you are some answers:

    Dave:
    Are there cases in which one or two timeouts are followed by a successful
    retry? Or does one timeout *always* end with more timeouts and, ultimately,
    an IO error?
    Not in our use case, which is kiling a server. In this case, timeouts will 
end up on an IO error.

    If a straight-up change solves a constant headache, as you suggest, 
Alberto, and as Blake concurs, that sounds like the way to go. Why introduce a 
new option or property if the user will always prefer one behavior over the 
other?

    The fix works fine for our use case, I suggested the alternatives to make 
it something optional in case there were concerns about it. In other projects I 
have been involved in the past, we had to deal with temporary network problems. 
So most of the times, if a timeout had a consequence (so to say), that was not 
applied after just one timeout.

    But its true that in this use case, a timeout always ends up on an IO 
error, as I said. So if you dont see any problem with cleaning the metadata 
just after one timeout, then we dont need any control mechanism for it.



    Blake:
    Given that attempts to retrieve metadata after the C++ cache is closed are 
a constant headache for Geode Native development, I am generally in favor of 
anything that potentially reduces the number of times/places this happens.  If 
we've failed the handshake, it's very unlikely things will correct themselves 
without outside intervention, so this fix is probably goodness. I'd go ahead 
and submit a PR when you think it's solid.

    Good to hear that. The code changes in the draft PR are ready, I just need 
to figure out the testing part. Im not sure how I will add a test because it 
would be the same test as the one added for GEODE-8231...


    BR/

    Alberto B.


    ________________________________
    De: Ernie Burghardt <burghar...@vmware.com>
    Enviado: jueves, 17 de septiembre de 2020 22:08
    Para: dev@geode.apache.org <dev@geode.apache.org>
    Asunto: Re: Clean C++ client metadata in timeouts

    Let's please consider how this would controlled and look for ways other 
than YetAnotherProperty

    Thanks,
    EB

    On 9/17/20, 12:59 PM, "Dave Barnes" <dbar...@apache.org> wrote:

        If a straight-up change solves a constant headache, as you suggest,
        Alberto, and as Blake concurs, that sounds like the way to go.
        Why introduce a new option or property if the user will always prefer 
one
        behavior over the other? (And from a docs perspective, who needs another
        optional property, anyway?)

        On Thu, Sep 17, 2020 at 10:32 AM Blake Bender <bbl...@vmware.com> wrote:

        > Given that attempts to retrieve metadata after the C++ cache is 
closed are
        > a constant headache for Geode Native development, I am generally in 
favor
        > of anything that potentially reduces the number of times/places this
        > happens.  If we've failed the handshake, it's very unlikely things 
will
        > correct themselves without outside intervention, so this fix is 
probably
        > goodness.  I'd go ahead and submit a PR when you think it's solid.
        >
        > Thanks,
        >
        > Blake
        >
        >
        > On 9/17/20, 9:36 AM, "Dave Barnes" <dbar...@apache.org> wrote:
        >
        >     Alberto,
        >     Are there cases in which one or two timeouts are followed by a
        > successful
        >     retry? Or does one timeout *always* end with more timeouts and,
        > ultimately,
        >     an IO error?
        >     If timeouts can sometimes be followed by successful retries, and
        > re-trying
        >     is the current default behavior, then I agree that introducing a
        > setting
        >     that effectively eliminates re-tries should be the developer's 
choice.
        >     In that case, I suggest that the option should not be a low-level
        > choice of
        >     "handle the metadata in a way that eliminates retries" but should 
be
        > higher
        >     level, like "when attempting to connect, try only once, instead of
        >     re-trying (the default behavior)."
        >     -Dave
        >
        >     On Thu, Sep 17, 2020 at 7:42 AM Alberto Bustamante Reyes
        >     <alberto.bustamante.re...@est.tech> wrote:
        >
        >     > Hi geode-dev,
        >     >
        >     > I have a question about the c++ client.
        >     >
        >     > Some months ago we merged GEODE-8231 to solve a problem we 
observed
        >     > regarding the native client was trying to connect to stopped 
server.
        >     > GEODE-8231 solution consists on remove the client metadata when 
an
        > "IO
        >     > error in handshake" exception is received. This fix solved most 
of
        > our
        >     > problems, but it has been observed that sometimes when a server 
is
        > stopped
        >     > the errors received in the client are not the same and this "IO
        > error in
        >     > handshake" takes up to a minute to appear. So during that time, 
the
        > client
        >     > is still trying to connect to the offline server.
        >     >
        >     > As the error received during that time is "timeout in 
handshake", we
        > have
        >     > tested modyfing the solution of GEODE-8213 to make the client to
        > remove the
        >     > metadata once a timeout error is received (here is a draft with 
the
        > code:
        >     >
        > 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode-native%2Fpull%2F651&amp;data=02%7C01%7Cburghardte%40vmware.com%7C81149957da084848bb4308d85bc0bd0f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637360230711375298&amp;sdata=3%2FYLIk8yPJdhYTJ2mrw65JHxQYxNyVy4sdQkAAvMlF8%3D&amp;reserved=0).
        > With this change in
        >     > place, the behavior is ok.
        >     >
        >     >
        >     > But I would like to check your opinion about this check, because
        > this will
        >     > cause that a single timeout will cause the removal of the client
        > metadata,
        >     > which maybe its not the best solution. I thought about different
        >     > alternatives:
        >     >
        >     > - Wait until a given number of timeouts in a row have been 
received
        > from
        >     > the same server to remove the metadata
        >     > - Make this "remove-metadata-after-timeout" something optional 
that
        > could
        >     > be configured if needed
        >     >
        >     > As this will misalign the behavior of Java and C++ clients, 
making
        > this an
        >     > optional configuration will be more appropriate, to keep the 
default
        > c++
        >     > client behavior as the Java client.
        >     >
        >     > BR/
        >     >
        >     > Alberto B.
        >     >
        >
        >


Reply via email to