Alberto,

Can you please file a JIRA ticket for this? This could come up often as
more and more deployments move to K8s.

-Anil.


On Fri, Dec 6, 2019 at 8:33 AM Sai Boorlagadda <sai.boorlaga...@gmail.com>
wrote:

> > if one gw receiver stops, the locator will publish to any remote locator
> that there are no receivers up.
>
> I am not sure that locators proactively update remote locators about
> changes to the receivers list; rather, I think the senders figure this out
> on connection issues.
> But I see the problem that local-site locators have only one member in the
> list of receivers that they maintain as all receivers register with a
> single <hostname:port> address.
>
> One idea I had earlier is to statically configure the receivers list on the
> locators (just like the remote-locators property), which is then exchanged
> with gw-senders. This way we can introduce a boolean flag to turn off WAN
> discovery and use the statically configured addresses. This can also be
> useful for remote-locators if they are behind a service.
>
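The static-list idea above could look roughly like the following. This is only a sketch: the flag, the class and the "host:port" list are hypothetical names for the proposal, not existing Geode properties.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the proposal; these names are not Geode APIs.
final class ReceiverListSketch {
    private final boolean wanDiscoveryEnabled;   // hypothetical boolean flag
    private final List<String> staticReceivers;  // hypothetical "host:port" entries

    ReceiverListSketch(boolean wanDiscoveryEnabled, List<String> staticReceivers) {
        this.wanDiscoveryEnabled = wanDiscoveryEnabled;
        this.staticReceivers = staticReceivers;
    }

    // Returns the addresses a locator would hand out to gw-senders.
    List<String> receiversForSenders(List<String> discovered) {
        return wanDiscoveryEnabled ? discovered : staticReceivers;
    }

    public static void main(String[] args) {
        ReceiverListSketch locator =
            new ReceiverListSketch(false, Arrays.asList("site-b.example.com:5000"));
        // Discovery reports no receivers, yet the static address is still returned.
        List<String> discovered = Collections.emptyList();
        System.out.println(locator.receiversForSenders(discovered));
    }
}
```

With discovery turned off, a receiver going down behind the service would not change what senders are told, since the list no longer depends on registrations.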
> Sai
>
> On Thu, Dec 5, 2019 at 2:33 AM Alberto Bustamante Reyes
> <alberto.bustamante.re...@est.tech> wrote:
>
> > Thanks Charlie, but the issue is not about connectivity. To summarize,
> > the problem is that if you have two or more gw receivers started with
> > the same values for "hostname-for-senders", "start-port" and "end-port"
> > (with "start-port" and "end-port" being equal), then if one gw receiver
> > stops, the locator will publish to any remote locator that there are no
> > receivers up.
> >
> > And this use case is likely to happen on cloud-native environments, as
> > described.
> >
> > BR/
> >
> > Alberto B.
> > ________________________________
> > From: Charlie Black <cbl...@pivotal.io>
> > Sent: Wednesday, December 4, 2019 18:11
> > To: dev@geode.apache.org <dev@geode.apache.org>
> > Subject: Re: WAN replication issue in cloud native environments
> >
> > Alberto,
> >
> > Something else to think about: SNI-based routing. I believe Mario might
> > be working on adding SNI to Geode; he at least had a proposal that he
> > e-mailed out.
> >
> > The basics are that the destination host is in the SNI field, and the
> > proxy can inspect it and route the request to the right service instance.
> > Plus we have the option to not terminate the SSL at the proxy.
> >
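For illustration, this is how a plain Java TLS client can put the destination host in the SNI field using the standard javax.net.ssl classes. It is a minimal sketch and not Geode's implementation; the class name, helper name and host name are made up.

```java
import java.util.Collections;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;

// Hypothetical helper; shows the JDK API only, not how Geode wires SNI.
final class SniSketch {
    // Builds TLS parameters whose ClientHello would carry targetHost as SNI.
    static SSLParameters withSni(String targetHost) {
        SSLParameters params = new SSLParameters();
        SNIServerName name = new SNIHostName(targetHost);
        params.setServerNames(Collections.singletonList(name));
        return params;
    }

    public static void main(String[] args) {
        // Made-up host name identifying one receiver behind the proxy.
        SSLParameters params = withSni("receiver-0.site-b.example.com");
        System.out.println(params.getServerNames());
    }
}
```

The resulting parameters would be applied with SSLSocket.setSSLParameters or SSLEngine.setSSLParameters before the handshake, so a proxy can route on the name without terminating SSL.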
> > Full disclosure: I haven't tried out SNI-based routing myself; it is
> > something that I thought could work as I was reading about it. From the
> > whiteboarding I have done, I think this will handle ingress and egress
> > just fine. It is potentially easier than playing around with port mapping
> > and `hostname for clients`.
> >
> > Just something to think about.
> >
> > Charlie
> >
> >
> > On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes
> > <alberto.bustamante.re...@est.tech> wrote:
> >
> > > Hi Jacob,
> > >
> > > Yes, we are using the LoadBalancer service type. But note that the
> > > problem is not in the transport layer but in Geode, as GW senders are
> > > complaining “sender-2-parallel : Could not connect due to: There are no
> > > active servers.” when one of the servers in the receiving cluster is
> > > killed.
> > >
> > > So, there is still one server alive in the receiving cluster, but the GW
> > > sender does not know it and the locator is not able to inform it of the
> > > server's existence. Looking at the code, it seems the internal data
> > > structures (maps) holding the profiles use an object whose equality check
> > > relies only on hostname and port. This makes it impossible to
> > > differentiate servers when the same “hostname-for-senders” and port are
> > > used. When the killed server comes back up, the locator profiles are
> > > updated (internal map back to size()=1 although 2+ servers are there) and
> > > GW senders happily reconnect.
> > >
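The collision described above can be reproduced with a small stand-alone sketch. The classes below are hypothetical stand-ins, not Geode's actual profile classes; they only assume a map whose key compares hostname and port.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type: equality relies only on host and port.
final class ReceiverLocation {
    final String host;
    final int port;

    ReceiverLocation(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ReceiverLocation)) return false;
        ReceiverLocation other = (ReceiverLocation) o;
        return port == other.port && host.equals(other.host);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port);
    }
}

// Hypothetical stand-in for the locator's receiver bookkeeping.
final class LocatorProfileSketch {
    // Advertised location -> id of the member that registered it last.
    private final Map<ReceiverLocation, String> receivers = new HashMap<>();

    void register(String memberId, String host, int port) {
        receivers.put(new ReceiverLocation(host, port), memberId);
    }

    void unregister(String host, int port) {
        receivers.remove(new ReceiverLocation(host, port));
    }

    int knownReceivers() {
        return receivers.size();
    }

    public static void main(String[] args) {
        LocatorProfileSketch locator = new LocatorProfileSketch();
        locator.register("server-0", "geode.example.com", 5000);
        locator.register("server-1", "geode.example.com", 5000);
        System.out.println(locator.knownReceivers()); // 1: second entry overwrote the first
        locator.unregister("geode.example.com", 5000); // server-0 stops
        System.out.println(locator.knownReceivers()); // 0: server-1 is still up but unknown
    }
}
```

Keying the map on something unique per member (for example a member id alongside host and port) would keep both entries; whether that fits Geode's locator code is the open question in this thread.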
> > > The solution with Geode as-is would be to expose each GW receiver on a
> > > different port outside of the k8s cluster. This includes creating N
> > > Kubernetes services for N GW receivers, in addition to updating the
> > > service mesh configuration (if one is used), firewalls, etc. The
> > > declarative nature of Kubernetes means we must know the ports in
> > > advance, hence start-port and end-port must be equal when creating each
> > > GW receiver, and we should have some well-known algorithm for assigning
> > > ports when creating GW receivers across servers. For example: server-0
> > > port 5000, server-1 port 5001, server-2 port 5002, etc. So all GW
> > > receivers must be wired individually and we must turn off Geode’s random
> > > port allocation.
> > >
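One possible "well-known algorithm" for the port assignment above, as a minimal sketch. It assumes server names end in "-<ordinal>" as in the example; the class name and base port constant are hypothetical.

```java
// Hypothetical helper: derives a fixed GW receiver port from the server name.
final class ReceiverPortPlan {
    // Base port taken from the example numbers in the message above.
    static final int BASE_PORT = 5000;

    // "server-2" -> 5002. The same value would be used for both
    // start-port and end-port so that no random port is picked.
    static int portFor(String serverName) {
        int dash = serverName.lastIndexOf('-');
        if (dash < 0 || dash == serverName.length() - 1) {
            throw new IllegalArgumentException("expected <name>-<ordinal>: " + serverName);
        }
        int ordinal = Integer.parseInt(serverName.substring(dash + 1));
        return BASE_PORT + ordinal;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"server-0", "server-1", "server-2"}) {
            System.out.println(name + " -> " + portFor(name));
        }
    }
}
```

Each computed port would still need its own Kubernetes service and firewall rule, which is the operational cost this message is pointing at.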
> > > But we are exploring the possibility for Geode to handle this
> > > cloud-native configuration a bit better. Locators should be capable of
> > > holding GW receiver information even though the receivers are hidden
> > > behind the same hostname and port. This is a code change in Geode and we
> > > would like to have the community's opinion on it.
> > >
> > > One obvious impact on the legacy behavior: when the locator picks a
> > > server on behalf of the client (the GW sender in this case), it does so
> > > based on the server load. When the sender connects, considering that all
> > > servers are using the same VIP:PORT, it is the load balancer that will
> > > decide where the connection ends up, and likely not on the server
> > > selected by the locator. So here we ignore the locator's instructions.
> > > Since GW senders normally do not create a huge number of connections,
> > > this probably will not unbalance the cluster too much. But it is an
> > > impact worth considering. Custom load metrics would also be ignored by
> > > GW senders. Opinions?
> > >
> > > An additional impact that comes to mind is the GW sender load-balance
> > > command and how its execution would be affected.
> > >
> > > Thanks!
> > >
> > > Alberto B.
> > >
> > > ________________________________
> > > From: Jacob Barrett <jbarr...@pivotal.io>
> > > Sent: Friday, November 29, 2019 13:06
> > > To: dev@geode.apache.org <dev@geode.apache.org>
> > > Subject: Re: WAN replication issue in cloud native environments
> > >
> > >
> > >
> > > > On Nov 29, 2019, at 3:14 AM, Alberto Bustamante Reyes
> > > <alberto.bustamante.re...@est.tech> wrote:
> > > >
> > > > The reason for such a setup is deploying Geode cluster on a Kubernetes
> > > > cluster where all GW receivers are reachable from the outside world on
> > > > the same VIP and port.
> > >
> > > Are you using LoadBalancer Service type?
> > >
> > > > Other kinds of configuration (different hostname and/or different port
> > > > for each GW receiver) are not cheap from OAM and resources perspective
> > > > in cloud native environments and also limit some important use-cases
> > > > (like scaling).
> > >
> > > If you could somehow configure host and port for the sender (code
> > > modification required), would exposing each port through the
> > > LoadBalancer still be too expensive?
> > >
> > > > The problem experienced is that shutting down one server is stopping
> > > > replication to this cluster until the server is up again. We suspect
> > > > this is because Geode incorrectly assumes there are no more alive
> > > > servers when just one of them is down (since they share
> > > > hostname-for-senders and port).
> > >
> > > Seems like, in the worst case, when it tries to reconnect the LB should
> > > give it a live server and it thinks the single server is back up.
> > >
> > > -Jake
> > >
> > >
> >
> > --
> > Charlie Black | cbl...@pivotal.io
> >
>
