Hi everyone,

Regarding what @Jacob Barrett<mailto:jabarr...@vmware.com> mentioned about the geode-native timeout handling: yes, I am aware of it, and we are working on identifying any remaining problems so we can open PRs. My feeling, though, is that this PR https://github.com/apache/geode-native/pull/695 will dramatically improve things there 🙂
Regarding parametrization, we've been testing several parametrizations and things look really promising. There are some minor things to tweak, but we barely notice the impact.

Regarding allowing one to configure Geode as a non-primary/secondary (a.k.a. multi-master) distributed system: the thing is, I've been reading about this just out of curiosity, and it turns out it is feasible, e.g. Google Spanner<https://static.googleusercontent.com/media/research.google.com/es//pubs/archive/45855.pdf>. Whether it could easily be implemented in Geode, or whether it's even something we'd want, is a different matter. Still, I think that's a conversation for another forum.

So really, thanks to everyone who helped 🙂

BR,
Mario.

________________________________
From: Anthony Baker <bak...@vmware.com>
Sent: Monday, November 23, 2020 6:25 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Cc: miguel.g.gar...@ericsson.com <miguel.g.gar...@ericsson.com>
Subject: Re: Requests taking too long if one member of the cluster fails

Yes, lowering the member timeout is one approach I’ve seen taken for applications that demand ultra-low latency. These workloads need to provide not just low “average” or even p99 latency, but put a hard limit on the max value.

When you do this you need to ensure coherency across all aspects of timeouts (e.g. client read timeouts and retries). You need to ensure that GC pauses don’t cause instability in the cluster. For example, if a GC pause is greater than the member timeout, you should go back and re-tune your heap settings to drive down GC. If you are running in a container or VM, you need to ensure sufficient resources so that the GemFire process is never paused. All this presupposes a stable and performant network infrastructure.
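To illustrate Anthony's coherency point, here is a minimal sketch of keeping the server-side member timeout and the client-side pool timeouts aligned, so that a client doesn't give up before the cluster has had a chance to detect the failure and fail over. The specific values are illustrative only, not recommendations, and the exact attribute names should be checked against the Geode docs for your version:

```properties
# gemfire.properties on the servers: how long membership waits before
# suspecting an unresponsive member (milliseconds; the default is 5000)
member-timeout=1000
```

```xml
<!-- Client-side pool in cache.xml (sketch): read-timeout should
     comfortably exceed member-timeout plus expected failover time,
     and retry-attempts lets the client reach the new primary. -->
<pool name="serverPool" read-timeout="5000" retry-attempts="2">
  <locator host="locator1" port="10334"/>
</pool>
```

The general idea is that if the client read timeout is shorter than the time the cluster needs to detect a dead member and promote the secondary, the client will see spurious timeouts even though the cluster recovers correctly.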
Anthony

On Nov 21, 2020, at 1:40 PM, Mario Salazar de Torres <mario.salazar.de.tor...@est.tech<mailto:mario.salazar.de.tor...@est.tech>> wrote:

So, what I've tried here is to set a really low member-timeout, which results in the server holding the secondary copy becoming the primary owner in around <600 ms. That's quite a huge improvement, but I wanted to ask you whether setting this member-timeout too low might carry unforeseen consequences.
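For reference, the member-timeout Mario experimented with can be set per member in gemfire.properties or passed at startup through gfsh; a sketch (the 1000 ms value is only an example of a "really low" setting, not the value actually used in the experiment):

```shell
# Start a server with a lowered member timeout via gfsh;
# --J passes a JVM system property through to the server process.
gfsh> start server --name=server1 --J=-Dgemfire.member-timeout=1000
```

As Anthony notes above, a value this low only makes sense if GC pauses and VM/container scheduling stalls are reliably shorter than the timeout, otherwise healthy members will be suspected and kicked out of the cluster.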