Alberto,

Can you please file a JIRA ticket for this? This could come up often as more and more deployments move to K8s.
-Anil.

On Fri, Dec 6, 2019 at 8:33 AM Sai Boorlagadda <sai.boorlaga...@gmail.com> wrote:

> > if one gw receiver stops, the locator will publish to any remote locator
> > that there are no receivers up.
>
> I am not sure if locators proactively update remote locators about changes
> in the receivers list; rather, I think the senders figure this out on
> connection issues.
> But I see the problem that local-site locators have only one member in the
> list of receivers that they maintain, as all receivers register with a
> single <hostname:port> address.
>
> One idea I had earlier is to statically set the list of receivers on the
> locators (just like the remote-locators property), to be exchanged with
> gw-senders. This way we can introduce a boolean flag to turn off WAN
> discovery and use the statically configured addresses. This can also be
> useful for remote-locators if they are behind a service.
>
> Sai
>
> On Thu, Dec 5, 2019 at 2:33 AM Alberto Bustamante Reyes
> <alberto.bustamante.re...@est.tech> wrote:
>
> > Thanks Charlie, but the issue is not about connectivity. Summarizing the
> > issue, the problem is that if you have two or more gw receivers that are
> > started with the same values of the "hostname-for-senders", "start-port"
> > and "end-port" parameters ("start-port" and "end-port" being equal), then
> > if one gw receiver stops, the locator will publish to any remote locator
> > that there are no receivers up.
> >
> > And this use case is likely to happen in cloud-native environments, as
> > described.
> >
> > BR/
> >
> > Alberto B.
> > ________________________________
> > From: Charlie Black <cbl...@pivotal.io>
> > Sent: Wednesday, December 4, 2019 18:11
> > To: dev@geode.apache.org <dev@geode.apache.org>
> > Subject: Re: WAN replication issue in cloud native environments
> >
> > Alberto,
> >
> > Something else to think about: SNI-based routing. I believe Mario might
> > be working on adding SNI to Geode - he at least had a proposal that he
> > e-mailed out.
> >
> > The basics are that the destination host is in the SNI field and the
> > proxy can inspect and route the request to the right service instance.
> > Plus we have the option to not terminate the SSL at the proxy.
> >
> > Full disclosure - I haven't tried out SNI-based routing myself and it is
> > something that I thought could work as I was reading about it. From the
> > whiteboard I have done I think this will do ingress and egress just fine.
> > Potentially easier than port mapping and `hostname for clients` playing
> > around.
> >
> > Just something to think about.
> >
> > Charlie
> >
> >
> > On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes
> > <alberto.bustamante.re...@est.tech> wrote:
> >
> > > Hi Jacob,
> > >
> > > Yes, we are using the LoadBalancer service type. But note the problem is
> > > not in the transport layer but in Geode, as GW senders are complaining
> > > “sender-2-parallel : Could not connect due to: There are no active
> > > servers.” when one of the servers in the receiving cluster is killed.
> > >
> > > So, there is still one server alive in the receiving cluster but the GW
> > > sender does not know it and the locator is not able to inform about its
> > > existence. Looking at the code, it seems the internal data structures
> > > (maps) holding the profiles use objects whose equality check relies only
> > > on hostname and port. This makes it impossible to differentiate servers
> > > when the same “hostname-for-senders” and port are used. When the killed
> > > server comes back up, the locator profiles are updated (internal map
> > > back to size()=1 although 2+ servers are there) and GW senders happily
> > > reconnect.
> > >
> > > The solution with Geode as-is would be to expose each GW receiver on a
> > > different port outside of the k8s cluster; this includes creating N
> > > Kubernetes services for N GW receivers in addition to updating the
> > > service mesh configuration (if it is used, firewalls etc…).
> > > The declarative nature of Kubernetes means we must know the ports in
> > > advance, hence start-port and end-port must be equal when creating each
> > > GW receiver, and we should have some well-known algorithm when creating
> > > GW receivers across servers. For example: server-0 port 5000, server-1
> > > port 5001, server-2 port 5002 etc…. So, all GW receivers must be wired
> > > individually and we must turn off Geode’s random port allocation.
> > >
> > > But we are exploring the possibility for Geode to handle this
> > > cloud-native configuration a bit better. Locators should be capable of
> > > holding GW receiver information even though the receivers are hidden
> > > behind the same hostname and port.
> > > This is a code change in Geode and we would like to have the
> > > community's opinion on it.
> > >
> > > One obvious impact on the legacy behavior: when the locator picks a
> > > server on behalf of the client (the GW sender in this case), it does so
> > > based on the server load. When the sender connects, and considering all
> > > servers are using the same VIP:PORT, it is the load balancer that will
> > > decide where the connection ends up, and likely not on the server
> > > selected by the locator. So here we ignore the locator instructions.
> > > Since GW senders normally do not create a huge number of connections,
> > > this probably will not unbalance the cluster too much. But this is an
> > > impact worth considering. Custom load metrics would also be ignored by
> > > GW senders. Opinions?
> > >
> > > An additional impact that comes to mind is the GW sender load-balance
> > > command and how its execution would be affected.
> > >
> > > Thanks!
> > >
> > > Alberto B.
> > >
> > > ________________________________
> > > From: Jacob Barrett <jbarr...@pivotal.io>
> > > Sent: Friday, November 29, 2019 13:06
> > > To: dev@geode.apache.org <dev@geode.apache.org>
> > > Subject: Re: WAN replication issue in cloud native environments
> > >
> > >
> > >
> > > > On Nov 29, 2019, at 3:14 AM, Alberto Bustamante Reyes
> > > > <alberto.bustamante.re...@est.tech> wrote:
> > > >
> > > > The reason for such a setup is deploying a Geode cluster on a
> > > > Kubernetes cluster where all GW receivers are reachable from the
> > > > outside world on the same VIP and port.
> > >
> > > Are you using LoadBalancer Service type?
> > >
> > > > Other kinds of configuration (different hostname and/or different
> > > > port for each GW receiver) are not cheap from an OAM and resources
> > > > perspective in cloud native environments and also limit some
> > > > important use-cases (like scaling).
> > >
> > > If you could somehow configure host and port for the sender (code
> > > modification required), would exposing each port through the
> > > LoadBalancer be too expensive too?
> > >
> > > > The problem experienced is that shutting down one server stops
> > > > replication to this cluster until the server is up again. We suspect
> > > > this is because Geode incorrectly assumes there are no more alive
> > > > servers when just one of them is down (since they share
> > > > hostname-for-senders and port).
> > >
> > > Seems like in the worst case, when it tries to reconnect, the LB should
> > > give it a live server and it thinks the single server is back up.
> > >
> > > -Jake
> > >
> > >
> >
> > --
> > Charlie Black | cbl...@pivotal.io
> >
>
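
For readers of this thread, the equality problem Alberto describes (profiles held in maps whose keys compare only hostname and port) can be illustrated with a small standalone Java sketch. Everything named here is hypothetical and chosen for illustration only: HostPortKey, MemberAwareKey, the member ids and gw.example.com are not Geode classes or values, and the member-aware key is just one possible direction, not the change that was proposed or implemented.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class ReceiverProfileSketch {

    // Hypothetical key mirroring the behavior described in the thread:
    // equality depends only on hostname and port.
    static final class HostPortKey {
        final String host;
        final int port;

        HostPortKey(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof HostPortKey)) return false;
            HostPortKey other = (HostPortKey) o;
            return port == other.port && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    // Hypothetical alternative (an assumption, not Geode's design): the key
    // also carries a per-member id, so receivers behind one VIP stay distinct.
    static final class MemberAwareKey {
        final String memberId;
        final String host;
        final int port;

        MemberAwareKey(String memberId, String host, int port) {
            this.memberId = memberId;
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof MemberAwareKey)) return false;
            MemberAwareKey other = (MemberAwareKey) o;
            return port == other.port
                && host.equals(other.host)
                && memberId.equals(other.memberId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(memberId, host, port);
        }
    }

    // Two receivers register behind the same VIP:port, then one stops.
    // Returns how many receivers the set still knows about.
    static int remainingWithHostPortKey() {
        Set<HostPortKey> receivers = new HashSet<>();
        receivers.add(new HostPortKey("gw.example.com", 5000));    // server-0
        receivers.add(new HostPortKey("gw.example.com", 5000));    // server-1 collapses into the same entry
        receivers.remove(new HostPortKey("gw.example.com", 5000)); // server-0 stops
        return receivers.size();
    }

    // Same scenario, but each receiver is tracked under its own member id.
    static int remainingWithMemberAwareKey() {
        Set<MemberAwareKey> receivers = new HashSet<>();
        receivers.add(new MemberAwareKey("server-0", "gw.example.com", 5000));
        receivers.add(new MemberAwareKey("server-1", "gw.example.com", 5000));
        receivers.remove(new MemberAwareKey("server-0", "gw.example.com", 5000));
        return receivers.size();
    }

    public static void main(String[] args) {
        System.out.println("host:port key    -> " + remainingWithHostPortKey());    // prints 0
        System.out.println("member-aware key -> " + remainingWithMemberAwareKey()); // prints 1
    }
}
```

With the host:port key, the second registration is absorbed into the first entry, so removing one receiver leaves an empty set even though server-1 is still running. That matches the "There are no active servers" symptom reported above. Note that a member-aware key only fixes the bookkeeping; as discussed in the thread, the load balancer would still decide which receiver a sender actually reaches.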