[ https://issues.apache.org/jira/browse/GEODE-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakov Varenina updated GEODE-10056:
-----------------------------------
    Summary: Gateway-reciver connection load mantained only on one locator  
(was: Gateway-reciver load mantained only on one locator)

> Gateway-reciver connection load mantained only on one locator
> -------------------------------------------------------------
>
>                 Key: GEODE-10056
>                 URL: https://issues.apache.org/jira/browse/GEODE-10056
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: needsTriage
>
> When a GW sender wants to create a connection to a receiver, it asks the
> remote locator which server to connect to, using a CLIENT_CONNECTION_REQUEST
> message. The locator should check the load (actually just the connection
> count of each GW receiver) and respond with the least loaded server.
> But servers do not track the load for their GW receiver acceptor! It is
> always 0. What happens then?
> It looks like each locator maintains its own map of the load, based on the
> connections it has handed out. So there are no balancing problems until
> either the locator restarts or clients get their connections from another
> locator in the cluster. In my opinion both are quite valid scenarios, and
> the net result is unbalanced replication connections.
> How to test?
> Start 2 clusters. Let's call Site1 the sending site and Site2 the receiving
> site. The receiving site should have at least 2 locators. Both sites have 2
> servers. No regions are needed.
> Cluster-1 gfsh>list members
> Member Count : 3
>
> Name      | Id
> --------- | -------------------------------------------------------------
> locator10 | 10.0.2.15(locator10:7332:locator)<ec><v0>:41000 [Coordinator]
> server11  | 10.0.2.15(server11:8358)<v1>:41003
> server12  | 10.0.2.15(server12:8717)<v2>:41005
>  
> Cluster-2 gfsh>list members
> Member Count : 4
>
> Name      | Id
> --------- | -------------------------------------------------------------
> locator10 | 10.0.2.15(locator10:7562:locator)<ec><v0>:41001 [Coordinator]
> locator11 | 10.0.2.15(locator11:8103:locator)<ec><v1>:41002
> server11  | 10.0.2.15(server11:8547)<v2>:41004
> server12  | 10.0.2.15(server12:8908)<v3>:41006
>  
> Create GW receiver in Site2 on both servers.
> Cluster-2 gfsh>list gateways
> GatewayReceiver Section
>
>               Member               | Port | Sender Count | Senders Connected
> ---------------------------------- | ---- | ------------ | -----------------
> 10.0.2.15(server11:8547)<v2>:41004 | 5175 | 0            |
> 10.0.2.15(server12:8908)<v3>:41006 | 5457 | 0            |
> Create a GW sender in Site1 on both servers. Use 10 dispatcher threads for
> easier observation.
> Cluster-1 gfsh>list gateways
> GatewaySender Section
>
> GatewaySender Id |               Member               | Remote Cluster Id |   Type   |        Status         | Queued Events | Receiver Location
> ---------------- | ---------------------------------- | ----------------- | -------- | --------------------- | ------------- | -----------------
> senderTo2        | 10.0.2.15(server11:8358)<v1>:41003 | 2                 | Parallel | Running and Connected | 0             | 10.0.2.15:5457
> senderTo2        | 10.0.2.15(server12:8717)<v2>:41005 | 2                 | Parallel | Running and Connected | 0             | 10.0.2.15:5457
>  
> Observe the balance of GW receiver connections in Site2. It will be perfect.
>  
> Cluster-2 gfsh>list gateways
> GatewayReceiver Section
>
>               Member               | Port | Sender Count | Senders Connected
> ---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
> 10.0.2.15(server11:8547)<v2>:41004 | 5175 | 12           | 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
> 10.0.2.15(server12:8908)<v3>:41006 | 5457 | 12           | 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:..
>  
> 12 connections each: 10 payload + 2 ping connections.
> Now stop the GW receiver on one server of Site2. In Site1, run the stop
> gateway-sender and start gateway-sender commands; all connections will go to
> the only remaining receiver in Site2 (as expected). Check it:
>  
> Cluster-2 gfsh>list gateways
> GatewayReceiver Section
>
>               Member               | Port | Sender Count | Senders Connected
> ---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
> 10.0.2.15(server11:8547)<v2>:41004 | 5175 | 22           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
> 10.0.2.15(server12:8908)<v3>:41006 | 5457 | 0            |
>  
> Now there are 22 on a single receiver: 20 payload + 1 ping from each sender.
> Stop the GW sender on one server in Site1. The connection count on the GW
> receiver drops to half (also expected).
>  
> Cluster-2 gfsh>list gateways
> GatewayReceiver Section
>
>               Member               | Port | Sender Count | Senders Connected
> ---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
> 10.0.2.15(server11:8547)<v2>:41004 | 5175 | 11           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
> 10.0.2.15(server12:8908)<v3>:41006 | 5457 | 0            |
> Now 11, as one sender in Site1 is stopped.
> Start the GW receiver on the Site2 server where it was stopped before. It
> will not receive new connections just yet.
> Start the GW sender on the Site1 server where it was stopped before. All of
> its connections will land on the receiver that was just started, so the
> balance is restored.
> Cluster-2 gfsh>list gateways
> GatewayReceiver Section
>
>               Member               | Port | Sender Count | Senders Connected
> ---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
> 10.0.2.15(server11:8547)<v2>:41004 | 5175 | 11           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
> 10.0.2.15(server12:8908)<v3>:41006 | 5182 | 11           | 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:..
> 11 connections on each, because we have a perfect mapping: Site1 server11 to
> Site2 server11 and Site1 server12 to Site2 server12 (i.e. there is just 1 ping
> connection on each receiver). As expected, balance was achieved. Stop the GW
> sender on the same server in Site1 again. Again, there are no connections on
> the Site2 receiver we just started (expected).
> Cluster-2 gfsh>list gateways
> GatewayReceiver Section
>
>               Member               | Port | Sender Count | Senders Connected
> ---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
> 10.0.2.15(server11:8547)<v2>:41004 | 5175 | 11           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
> 10.0.2.15(server12:8908)<v3>:41006 | 5182 | 0            |
> Now stop the Site2 locator that was serving the GW senders (locator10 in my
> case). Start the GW sender on that Site1 server again. Check the balance of
> the Site2 GW receivers:
> Cluster-2 gfsh>list gateways
> GatewayReceiver Section
>
>               Member               | Port | Sender Count | Senders Connected
> ---------------------------------- | ---- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------
> 10.0.2.15(server11:8547)<v2>:41004 | 5175 | 17           | 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:8358)<v1>:41003, 10.0.2.15(server11:..
> 10.0.2.15(server12:8908)<v3>:41006 | 5182 | 6            | 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:8717)<v2>:41005, 10.0.2.15(server12:..
> As the printout above shows, connections are not balanced correctly when the
> connection requests are sent to a different locator: the 12 new connections
> were split 6/6, leaving 17 on one receiver and 6 on the other.
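The locator behaviour described in this report can be modelled in a few lines. The sketch below is a simplified, hypothetical Java model. The names `LocatorLoadSketch`, `Locator`, `pickReceiver` and `distribute` are illustrative and are not Geode's actual classes or API. In the model, each locator keeps its own estimated connection count per receiver, answers a request with the least loaded receiver, and increments only its local estimate.

The model is fed the last step of the test: 11 connections already sit on server11, and 12 new requests go to a locator whose map starts at 0 because receivers report load 0. Under those assumptions it reproduces the 17/6 split. Seeding the same model with the real counts gives 12/11 instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of locator-side receiver selection.
// Not Geode code: it only illustrates "each locator keeps its own load map".
class LocatorLoadSketch {

    /** One locator's private view of receiver load; insertion order breaks ties. */
    static final class Locator {
        private final Map<String, Integer> estimatedLoad = new LinkedHashMap<>();

        Locator(Map<String, Integer> initialLoad) {
            estimatedLoad.putAll(initialLoad); // copy, so the caller's map is not mutated
        }

        /** Answer one connection request: pick the least loaded receiver, bump the local estimate. */
        String pickReceiver() {
            String best = null;
            for (Map.Entry<String, Integer> e : estimatedLoad.entrySet()) {
                if (best == null || e.getValue() < estimatedLoad.get(best)) {
                    best = e.getKey();
                }
            }
            estimatedLoad.merge(best, 1, Integer::sum);
            return best;
        }
    }

    /** Send `requests` connection requests to `locator` and add them to the real per-receiver counts. */
    static Map<String, Integer> distribute(Locator locator, Map<String, Integer> actual, int requests) {
        Map<String, Integer> result = new LinkedHashMap<>(actual);
        for (int i = 0; i < requests; i++) {
            result.merge(locator.pickReceiver(), 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Real state before the restarted sender reconnects: 11 on server11, 0 on server12.
        Map<String, Integer> actual = new LinkedHashMap<>();
        actual.put("server11", 11);
        actual.put("server12", 0);

        // Locator that never handed out those 11 connections; receivers report 0, so it starts at zero.
        Map<String, Integer> zero = new LinkedHashMap<>();
        zero.put("server11", 0);
        zero.put("server12", 0);
        System.out.println("estimate only: " + distribute(new Locator(zero), actual, 12));

        // Assumption for comparison: locator seeded with the real receiver connection counts.
        System.out.println("reported load: " + distribute(new Locator(actual), actual, 12));
    }
}
```

Running `main` prints `estimate only: {server11=17, server12=6}` and `reported load: {server11=12, server12=11}`. In this model, the imbalance comes entirely from the fresh locator's zero-initialised map, which matches the reporter's point that receivers always report a load of 0.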



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
