Alberto, give us a mention on the draft PR and we can think about how it might be tested too... might get more ideas from the group...
Thanks, EB On 9/18/20, 3:51 AM, "Alberto Bustamante Reyes" <alberto.bustamante.re...@est.tech> wrote: Hi, Thanks for you messages, here you are some answers: Dave: Are there cases in which one or two timeouts are followed by a successful retry? Or does one timeout *always* end with more timeouts and, ultimately, an IO error? Not in our use case, which is kiling a server. In this case, timeouts will end up on an IO error. If a straight-up change solves a constant headache, as you suggest, Alberto, and as Blake concurs, that sounds like the way to go. Why introduce a new option or property if the user will always prefer one behavior over the other? The fix works fine for our use case, I suggested the alternatives to make it something optional in case there were concerns about it. In other projects I have been involved in the past, we had to deal with temporary network problems. So most of the times, if a timeout had a consequence (so to say), that was not applied after just one timeout. But its true that in this use case, a timeout always ends up on an IO error, as I said. So if you dont see any problem with cleaning the metadata just after one timeout, then we dont need any control mechanism for it. Blake: Given that attempts to retrieve metadata after the C++ cache is closed are a constant headache for Geode Native development, I am generally in favor of anything that potentially reduces the number of times/places this happens. If we've failed the handshake, it's very unlikely things will correct themselves without outside intervention, so this fix is probably goodness. I'd go ahead and submit a PR when you think it's solid. Good to hear that. The code changes in the draft PR are ready, I just need to figure out the testing part. Im not sure how I will add a test because it would be the same test as the one added for GEODE-8231... BR/ Alberto B. ________________________________ De: Ernie Burghardt <burghar...@vmware.com> Enviado: jueves, 17 de septiembre de 2020 22:08 Para: dev@geode.apache.org <dev@geode.apache.org> Asunto: Re: Clean C++ client metadata in timeouts Let's please consider how this would controlled and look for ways other than YetAnotherProperty Thanks, EB On 9/17/20, 12:59 PM, "Dave Barnes" <dbar...@apache.org> wrote: If a straight-up change solves a constant headache, as you suggest, Alberto, and as Blake concurs, that sounds like the way to go. Why introduce a new option or property if the user will always prefer one behavior over the other? (And from a docs perspective, who needs another optional property, anyway?) On Thu, Sep 17, 2020 at 10:32 AM Blake Bender <bbl...@vmware.com> wrote: > Given that attempts to retrieve metadata after the C++ cache is closed are > a constant headache for Geode Native development, I am generally in favor > of anything that potentially reduces the number of times/places this > happens. If we've failed the handshake, it's very unlikely things will > correct themselves without outside intervention, so this fix is probably > goodness. I'd go ahead and submit a PR when you think it's solid. > > Thanks, > > Blake > > > On 9/17/20, 9:36 AM, "Dave Barnes" <dbar...@apache.org> wrote: > > Alberto, > Are there cases in which one or two timeouts are followed by a > successful > retry? Or does one timeout *always* end with more timeouts and, > ultimately, > an IO error? > If timeouts can sometimes be followed by successful retries, and > re-trying > is the current default behavior, then I agree that introducing a > setting > that effectively eliminates re-tries should be the developer's choice. > In that case, I suggest that the option should not be a low-level > choice of > "handle the metadata in a way that eliminates retries" but should be > higher > level, like "when attempting to connect, try only once, instead of > re-trying (the default behavior)." > -Dave > > On Thu, Sep 17, 2020 at 7:42 AM Alberto Bustamante Reyes > <alberto.bustamante.re...@est.tech> wrote: > > > Hi geode-dev, > > > > I have a question about the c++ client. > > > > Some months ago we merged GEODE-8231 to solve a problem we observed > > regarding the native client was trying to connect to stopped server. > > GEODE-8231 solution consists on remove the client metadata when an > "IO > > error in handshake" exception is received. This fix solved most of > our > > problems, but it has been observed that sometimes when a server is > stopped > > the errors received in the client are not the same and this "IO > error in > > handshake" takes up to a minute to appear. So during that time, the > client > > is still trying to connect to the offline server. > > > > As the error received during that time is "timeout in handshake", we > have > > tested modyfing the solution of GEODE-8213 to make the client to > remove the > > metadata once a timeout error is received (here is a draft with the > code: > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode-native%2Fpull%2F651&data=02%7C01%7Cburghardte%40vmware.com%7C81149957da084848bb4308d85bc0bd0f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637360230711375298&sdata=3%2FYLIk8yPJdhYTJ2mrw65JHxQYxNyVy4sdQkAAvMlF8%3D&reserved=0). > With this change in > > place, the behavior is ok. > > > > > > But I would like to check your opinion about this check, because > this will > > cause that a single timeout will cause the removal of the client > metadata, > > which maybe its not the best solution. I thought about different > > alternatives: > > > > - Wait until a given number of timeouts in a row have been received > from > > the same server to remove the metadata > > - Make this "remove-metadata-after-timeout" something optional that > could > > be configured if needed > > > > As this will misalign the behavior of Java and C++ clients, making > this an > > optional configuration will be more appropriate, to keep the default > c++ > > client behavior as the Java client. > > > > BR/ > > > > Alberto B. > > > >