Ok, I have moved the RFC then. Thanks again for your time & help!
________________________________
From: Dan Smith <dsm...@pivotal.io>
Sent: Thursday, March 26, 2020 18:54
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: WAN replication issue in cloud native environments

+1

After talking through this with Bruce a bit, I think the changes you are
proposing to LocatorLoadSnapshot and EndPointManager make sense.
For the ping issue, I like the proposed solution to forward the ping to the
correct server. Sounds good!

-Dan

On Thu, Mar 26, 2020 at 10:47 AM Bruce Schuchardt <bschucha...@pivotal.io>
wrote:

> +1
>
> I think this could move to the "In Development" state
>
>
>
> From: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
> Date: Wednesday, March 25, 2020 at 4:13 PM
> To: Bruce Schuchardt <bschucha...@pivotal.io>, Dan Smith <
> dsm...@pivotal.io>, "dev@geode.apache.org" <dev@geode.apache.org>
> Cc: Jacob Barrett <jbarr...@pivotal.io>, Anilkumar Gingade <
> aging...@pivotal.io>, Charlie Black <cbl...@pivotal.io>
> Subject: RE: WAN replication issue in cloud native environments
>
>
>
> Hi,
>
>
>
> I have modified the RFC to include the alternative suggested by Bruce. I'm
> also extending the deadline for sending comments to next Friday, 27th March,
> EOB.
>
>
>
> Thanks!
>
>
>
> BR/
>
>
>
> Alberto B.
>
> From: Bruce Schuchardt <bschucha...@pivotal.io>
> Sent: Monday, March 23, 2020 22:38
> To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>; Dan
> Smith <dsm...@pivotal.io>; dev@geode.apache.org <dev@geode.apache.org>
> Cc: Jacob Barrett <jbarr...@pivotal.io>; Anilkumar Gingade <
> aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
> Subject: Re: WAN replication issue in cloud native environments
>
>
>
> I think what Dan did was pass in a socket factory that would connect to
> his gateway instead of the requested server.  Doing it like that would
> require a lot less code change than what you’re currently doing and would
> get past the unit test problem.
>
>
>
> I can point you to where you’d need to make changes for the Ping
> operation: PingOpImpl would need to send the ServerLocation it’s trying to
> reach.  PingOp.execute() gets that as a parameter and
> PingOpImpl.sendMessage() writes it to the server.  The Ping command class’s
> cmdExecute would need to read that data if
> serverConnection.getClientVersion() is Version.GEODE_1_13_0 or later.  Then
> it would have to compare the server location it read to that server’s
> coordinates and, if not equal, find the server with those coordinates and
> send a new DistributionMessage to it with the client’s identity.  There are
> plenty of DistributionMessage classes around to look at as precedents.  You
> send the message with
> serverConnection.getCache().getDistributionManager().putOutgoing(message).
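>
> The compare-and-forward step could be sketched roughly as follows. This is a
> hypothetical, self-contained model of the logic only, not Geode's actual
> classes: ServerLocation here is a simplified stand-in, and the "forwarded"
> list stands in for building a DistributionMessage and calling putOutgoing():

```java
import java.util.*;

// Simplified stand-in for Geode's ServerLocation (hypothetical minimal
// version; equality is host + port only).
class ServerLocation {
  final String host;
  final int port;
  ServerLocation(String host, int port) { this.host = host; this.port = port; }
  @Override public boolean equals(Object o) {
    if (!(o instanceof ServerLocation)) return false;
    ServerLocation s = (ServerLocation) o;
    return s.host.equals(host) && s.port == port;
  }
  @Override public int hashCode() { return Objects.hash(host, port); }
}

// Sketch of the server-side ping handling described above: if the target
// location the client wrote is not this server's coordinates, find the
// member that owns those coordinates and forward the ping to it.
class PingHandler {
  final ServerLocation myLocation;
  final Map<String, ServerLocation> members = new HashMap<>(); // memberId -> coordinates
  final List<String> forwarded = new ArrayList<>(); // records simulated putOutgoing calls

  PingHandler(ServerLocation myLocation) { this.myLocation = myLocation; }

  /** Returns true if the ping was handled locally, false otherwise. */
  boolean handlePing(ServerLocation target, String clientId) {
    if (myLocation.equals(target)) {
      return true; // the ping was meant for this server
    }
    for (Map.Entry<String, ServerLocation> e : members.entrySet()) {
      if (e.getValue().equals(target)) {
        // stands in for sending a DistributionMessage carrying the
        // client's identity via distributionManager.putOutgoing(message)
        forwarded.add(e.getKey() + "<-" + clientId);
        return false;
      }
    }
    return false; // unknown target; nothing to forward to
  }
}
```

> A real implementation would also need the client-version check in cmdExecute
> and the member lookup against the distribution manager's view, as described
> above.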
>
>
>
> You can PM me any time.  Dan could answer questions about his gateway work.
>
>
>
>
>
> From: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
> Date: Monday, March 23, 2020 at 2:18 PM
> To: Bruce Schuchardt <bschucha...@pivotal.io>, Dan Smith <
> dsm...@pivotal.io>, "dev@geode.apache.org" <dev@geode.apache.org>
> Cc: Jacob Barrett <jbarr...@pivotal.io>, Anilkumar Gingade <
> aging...@pivotal.io>, Charlie Black <cbl...@pivotal.io>
> Subject: RE: WAN replication issue in cloud native environments
>
>
>
> Thanks for your answer and your comment in the wiki, Bruce. I will take a
> closer look at what you mentioned, as it is not yet clear to me how to
> implement it.
>
>
>
> BTW, I forgot to set a deadline for the wiki review; I hope that Thursday,
> 26th March gives enough time to receive comments.
>
> From: Bruce Schuchardt <bschucha...@pivotal.io>
> Sent: Thursday, March 19, 2020 16:30
> To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>; Dan
> Smith <dsm...@pivotal.io>; dev@geode.apache.org <dev@geode.apache.org>
> Cc: Jacob Barrett <jbarr...@pivotal.io>; Anilkumar Gingade <
> aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
> Subject: Re: WAN replication issue in cloud native environments
>
>
>
> I wonder if an approach similar to the SNI hostname PoolFactory changes
> would work for this non-TLS gateway.  The client needs to differentiate
> between the different servers so that it doesn’t declare all of them dead
> should one of them fail.  If the pool knew about the gateway it could
> direct all traffic there and the servers wouldn’t need to set a
> hostname-for-clients.
>
>
>
> It’s not an ideal solution since the gateway wouldn’t know which server
> the client wanted to contact and there are sure to be other problems like
> creating a backup queue for subscriptions.  But that’s the case with the
> hostname-for-clients approach, too.
>
>
>
>
>
> From: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
> Date: Wednesday, March 18, 2020 at 8:35 AM
> To: Dan Smith <dsm...@pivotal.io>, "dev@geode.apache.org" <
> dev@geode.apache.org>
> Cc: Bruce Schuchardt <bschucha...@pivotal.io>, Jacob Barrett <
> jbarr...@pivotal.io>, Anilkumar Gingade <aging...@pivotal.io>, Charlie
> Black <cbl...@pivotal.io>
> Subject: RE: WAN replication issue in cloud native environments
>
>
>
> Hi all,
>
>
>
> As Bruce suggested, I have created a wiki page describing the problem
> we are trying to solve:
> https://cwiki.apache.org/confluence/display/GEODE/Allow+same+host+and+port+for+all+gateway+receivers
>
>
>
> Please let me know if further clarifications are needed.
>
>
>
> Also, I have closed the PR I have been using until now, and created a new
> one with the current status of the solution, with one commit per issue
> described in the wiki: https://github.com/apache/geode/pull/4824
>
>
>
> Thanks in advance!
>
> From: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
> Sent: Monday, March 9, 2020 11:24
> To: Dan Smith <dsm...@pivotal.io>
> Cc: dev@geode.apache.org <dev@geode.apache.org>; Bruce Schuchardt <
> bschucha...@pivotal.io>; Jacob Barrett <jbarr...@pivotal.io>; Anilkumar
> Gingade <aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
> Subject: RE: WAN replication issue in cloud native environments
>
>
>
> Thanks for pointing that out, Dan. Sorry for the misunderstanding; as I only
> found that "affinity" (the setServerAffinityLocation method) in the client
> code, I thought you were talking about it.
> Anyway, I did some more tests and it does not solve our problem...
>
> I tried configuring the service affinity on k8s, but it breaks the first
> part of the solution (the changes implemented on LocatorLoadSnapshot that
> solves the problem of the replication) and senders do not connect to other
> receivers when the one they were connected to is down.
>
> The only alternative we have in mind to try to solve the ping problem is
> to keep investigating whether changing the ping task creation could be a
> solution (the changes implemented are clearly breaking something, so the
> solution is not complete yet).
>
>
>
>
>
>
> ________________________________
> From: Dan Smith <dsm...@pivotal.io>
> Sent: Thursday, March 5, 2020 21:03
> To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
> Cc: dev@geode.apache.org <dev@geode.apache.org>; Bruce Schuchardt <
> bschucha...@pivotal.io>; Jacob Barrett <jbarr...@pivotal.io>; Anilkumar
> Gingade <aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
> Subject: Re: WAN replication issue in cloud native environments
>
> I think there is some confusion here.
>
> The client side class ExecutablePool has a method called
> setServerAffinityLocation. It looks like that is used for some internal
> transaction code to make sure transactions go to the same server. I don't
> think it makes any sense for the gateway to be messing with this setting.
>
> What I was talking about was session affinity in your proxy server. For
> example, if you are using k8s, session affinity as defined in this page -
> https://kubernetes.io/docs/concepts/services-networking/service/
>
> "If you want to make sure that connections from a particular client are
> passed to the same Pod each time, you can select the session affinity based
> on the client’s IP addresses by setting service.spec.sessionAffinity to
> “ClientIP” (the default is “None”)"
>
> I think setting session affinity might help your use case, because it
> sounds like you are having issues with the proxy directing pings to a
> different server than the data.
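>
> For reference, that sessionAffinity setting goes on the k8s Service
> manifest. A minimal sketch, with placeholder names, labels, and ports (none
> of these values come from this thread):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: gw-receiver          # placeholder service name
spec:
  selector:
    app: geode-server        # placeholder pod label
  ports:
    - port: 2324             # placeholder gateway-receiver port
      targetPort: 2324
  sessionAffinity: ClientIP  # route all connections from one client IP to the same Pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # k8s default affinity timeout (3 hours)
```

> With ClientIP affinity, pings and data sent from the same sender IP should
> land on the same backend Pod, which is the behavior Dan suggests might help.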
>
> -Dan
>
> On Thu, Mar 5, 2020 at 4:20 AM Alberto Bustamante Reyes
> <alberto.bustamante.re...@est.tech> wrote:
> I think that was what I did when I tried, but I realized I had a failure
> in the code. Now that I have tried again, reverting the change of executing
> ping by endpoint, and applying the server affinity, the connections are
> much more stable! Looks promising 🙂
>
> I suppose that if I want to introduce this change, setting the server
> affinity in the gateway sender should be introduced as a new option in the
> sender configuration, right?
> ________________________________
> From: Dan Smith <dsm...@pivotal.io>
> Sent: Thursday, March 5, 2020 4:41
> To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
> Cc: dev@geode.apache.org <dev@geode.apache.org>; Bruce Schuchardt <
> bschucha...@pivotal.io>; Jacob Barrett <jbarr...@pivotal.io>; Anilkumar
> Gingade <aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
> Subject: Re: WAN replication issue in cloud native environments
>
> Oh, sorry, I meant server affinity with the proxy itself. So that it will
> always route traffic from the same gateway sender to the same gateway
> receiver. Hopefully that would ensure that pings go to the same receiver
> data is sent to.
>
> -Dan
>
> On Wed, Mar 4, 2020, 1:31 AM Alberto Bustamante Reyes
> <alberto.bustamante.re...@est.tech> wrote:
> I have tried setting the server affinity on the gateway sender's pool in
> the AbstractGatewaySender class, when the server location is set, but I
> don't see any difference in the behavior of the connections.
>
> I did not mention that the connections are reset every 5 seconds due to
> "java.io.EOFException: The connection has been reset while reading the
> header". But I don't know yet what is causing it.
>
> ________________________________
> From: Dan Smith <dsm...@pivotal.io>
> Sent: Tuesday, March 3, 2020 18:07
> To: dev@geode.apache.org <dev@geode.apache.org>
> Cc: Bruce Schuchardt <bschucha...@pivotal.io>; Jacob Barrett <
> jbarr...@pivotal.io>; Anilkumar Gingade <aging...@pivotal.io>; Charlie
> Black <cbl...@pivotal.io>
> Subject: Re: WAN replication issue in cloud native environments
>
> > We are currently working on another issue related to this change: gw
> senders' pings are not reaching the gw receivers, so ClientHealthMonitor
> closes the connections. I saw that the ping tasks are created by
> ServerLocation, so I have tried to solve the issue by changing it to be
> done by Endpoint. This change is not finished yet, as in its current status
> it causes the closing of connections from gw servers to gw receivers every
> 5 seconds.
>
> Are you using session affinity? I think you probably will need to since
> pings can go over different connections than the data connection.
>
> -Dan
>
> On Tue, Mar 3, 2020 at 3:44 AM Alberto Bustamante Reyes
> <alberto.bustamante.re...@est.tech> wrote:
>
> > Hi Bruce,
> >
> > Thanks for your comments, but we are not planning to use TLS, so I'm
> > afraid the PR you are working on will not solve this problem.
> >
> > The origin of this issue is that we would like to be able to configure
> > all gw receivers with the same "hostname-for-senders" value. The reason
> > is that we will run a multisite Geode cluster, having each site on a
> > different cloud environment, so using just one hostname makes
> > configuration much easier.
> >
> > When we tried to configure the cluster in this way, we experienced an
> > issue with the replication. Using the same hostname-for-senders parameter
> > causes different servers to have equal ServerLocation objects, so if one
> > receiver is down, the others are considered down too. With the change
> > suggested by Jacob this problem is solved, and replication works fine.
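> >
> > That replication problem comes down to object equality. As a rough
> > illustration (a hypothetical minimal model, not Geode's actual class),
> > equality by host and port alone collapses N receivers behind one
> > hostname-for-senders into a single logical server:

```java
import java.util.*;

// Hypothetical minimal model of ServerLocation: equality is host + port
// only, so receivers advertised behind the same hostname and port are
// indistinguishable, and marking one of them dead marks them all dead.
class ServerLocation {
  final String host;
  final int port;
  ServerLocation(String host, int port) { this.host = host; this.port = port; }
  @Override public boolean equals(Object o) {
    if (!(o instanceof ServerLocation)) return false;
    ServerLocation s = (ServerLocation) o;
    return s.host.equals(host) && s.port == port;
  }
  @Override public int hashCode() { return Objects.hash(host, port); }
}
```

> > With that equality, a Set<ServerLocation> built from several receivers
> > that all advertise the same host and port holds a single element, which
> > is why one failed receiver made the others look failed too before the
> > change.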
> >
> > We are currently working on another issue related to this change: gw
> > senders' pings are not reaching the gw receivers, so ClientHealthMonitor
> > closes the connections. I saw that the ping tasks are created by
> > ServerLocation, so I have tried to solve the issue by changing it to be
> > done by Endpoint. This change is not finished yet, as in its current
> > status it causes the closing of connections from gw servers to gw
> > receivers every 5 seconds.
> >
> > Why don't you like the idea of using the InternalDistributedMember to
> > distinguish server locations? Are you thinking about another alternative?
> > In this use case, two different gw receivers will have the same
> > ServerLocation, so we need to distinguish them.
> >
> > BR/
> >
> > Alberto B.
> >
> > ________________________________
> > From: Bruce Schuchardt <bschucha...@pivotal.io>
> > Sent: Monday, March 2, 2020 20:20
> > To: dev@geode.apache.org <dev@geode.apache.org>; Jacob Barrett <
> > jbarr...@pivotal.io>
> > Cc: Anilkumar Gingade <aging...@pivotal.io>; Charlie Black <
> > cbl...@pivotal.io>
> > Subject: Re: WAN replication issue in cloud native environments
> >
> > I'm coming to this conversation late and probably am missing a lot of
> > context.  Is the point of this to direct senders to some common
> > gateway that all of the gateway receivers are configured to advertise?
> > I've been working on a PR to support redirection of connections for
> > client/server and gateway communications to a common address and put the
> > destination host name in the SNIHostName TLS parameter.  Then you won't
> > have to tell servers about the common host name - just tell clients what
> > the gateway is and they'll connect to it & tell it what the target host
> > name is via the SNIHostName.  However, that only works if SSL is enabled.
> >
> > PR 4743 is a step toward this approach and changes TcpClient and
> > SocketCreator to take an unresolved host address.  After this is merged
> > another change will allow folks to set a gateway host/port that will be
> > used to form connections and insert the destination hostname into the
> > SNIHostName SSLParameter.
> >
> > I would really like us to avoid including InternalDistributedMembers in
> > equality checks for server-locations.  To-date we've only held these
> > identifiers in Endpoints and other places for debugging purposes and have
> > used ServerLocation to identify servers.
> >
> > On 1/27/20, 8:56 AM, "Alberto Bustamante Reyes"
> > <alberto.bustamante.re...@est.tech> wrote:
> >
> >     Hi again,
> >
> >     Status update: the simplification of the maps suggested by Jacob made
> > the newly proposed class containing the ServerLocation and the member id
> > unnecessary. With this refactoring, replication is working in the scenario
> > we have been discussing in this conversation. That's great, and I think
> > the code can be merged into develop if there are no extra comments in the
> > PR.
> >
> >     But this does not mean we can say that Geode is able to work properly
> > when using gw receivers with the same ip + port. We have seen that when
> > working with this configuration, there is a problem with the pings sent
> > from gw senders (which act as clients) to the gw receivers (servers). The
> > pings are reaching just one of the receivers, so the sender-receiver
> > connection is finally closed by the ClientHealthMonitor.
> >
> >     Do you have any suggestion about how to handle this issue? My first
> > idea was to identify where the connection is created, to check if the
> > sender could somehow be aware that there is more than one server to which
> > the ping should be sent, but I'm not sure if that is possible. Or the
> > alternative could be to change the ClientHealthMonitor to be "clever"
> > enough to not close connections in this case. Any comment is welcome 🙂
> >
> >     Thanks,
> >
> >     Alberto B.
> >
> >     ________________________________
> >     From: Jacob Barrett <jbarr...@pivotal.io>
> >     Sent: Wednesday, January 22, 2020 19:01
> >     To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
> >     Cc: dev@geode.apache.org <dev@geode.apache.org>; Anilkumar Gingade <
> > aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
> >     Subject: Re: WAN replication issue in cloud native environments
> >
> >
> >
> >     On Jan 22, 2020, at 9:51 AM, Alberto Bustamante Reyes
> > <alberto.bustamante.re...@est.tech> wrote:
> >
> >     Thanks Naba & Jacob for your comments!
> >
> >
> >
> >     @Naba: I have been implementing a solution as you suggested, and I
> > think it would be convenient if the client knew the memberId of the
> > server it is connected to.
> >
> >     (current code is here: https://github.com/apache/geode/pull/4616 )
> >
> >     For example, in:
> >
> >     LocatorLoadSnapshot::getReplacementServerForConnection(ServerLocation
> > currentServer, String group, Set<ServerLocation> excludedServers)
> >
> >     In this method, the client has sent the ServerLocation, but if that
> > object does not contain the memberId, I don't see how to guarantee that
> > the replacement that will be returned is not the same server the client
> > is currently connected to.
> >     Inside that method, this other method is called:
> >
> >
> >     Given that your setup is masquerading multiple members behind the
> > same host and port (ServerLocation) it doesn't matter. When the pool
> > opens a new socket to the replacement server it will be to the shared
> > hostname and port, and the Kubernetes service at that host and port will
> > just pick a backend host. In the solution we suggested we preserved that
> > behavior since the k8s service can't determine which backend member to
> > route the connection to based on the member id.
> >
> >
> >     LocatorLoadSnapshot::isCurrentServerMostLoaded(currentServer,
> > groupServers)
> >
> >     where groupServers is a "Map<ServerLocationAndMemberId, LoadHolder>"
> > object. If the keys of that map have the same host and port, they differ
> > only in the memberId. But as you don't know it (you just have
> > currentServer, which contains host and port), you cannot get the correct
> > LoadHolder value, so you cannot know if your server is the most loaded.
> >
> >     Again, given your use case, the behavior of this method is lost when
> > a new connection is established by the pool through the shared hostname
> > anyway.
> >
> >     @Jacob: I think the solution ultimately implies that the client has
> > to know the memberId; I think we could simplify the maps.
> >
> >     The client isn't keeping these load maps, the locator is, and the
> > locator knows all the member ids. The client end only needs to know the
> > host/port combination. In your example, the WAN replication (a client to
> > the remote cluster) connects to the shared host/port service and gets
> > randomly routed to one of the backend servers in that service.
> >
> >     All of this locator balancing code is unnecessary in this model,
> > where something else is choosing the final destination. The goal of our
> > proposed changes was to recognize that all we need is to make sure the
> > locator keeps the shared ServerLocation alive in its responses to clients
> > by tracking the associated members and reducing that set to the set of
> > unique ServerLocations. In your case that will always reduce to 1
> > ServerLocation for N members, as long as 1 member is still up.
> >
> >     -Jake
> >
> >
> >
> >
> >
> >
>
>
