[ https://issues.apache.org/jira/browse/GEODE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424732#comment-17424732 ]
Barrett Oglesby commented on GEODE-9664: ---------------------------------------- I checked the behavior of the client if there are no servers. I was thinking that these durable client scenarios should behave similar to the no servers scenario. When the Pool is created, the QueueManagerImpl.initializeConnections attempts to create the connections. If there are no servers, the ConnectionList's primaryDiscoveryException is initialized like: {noformat} List<ServerLocation> servers = findQueueServers(excludedServers, queuesNeeded, true, false, null); if (servers == null || servers.isEmpty()) { scheduleRedundancySatisfierIfNeeded(redundancyRetryInterval); synchronized (lock) { queueConnections = queueConnections.setPrimaryDiscoveryFailed(null); lock.notifyAll(); } return; } {noformat} And the empty ConnectionList is created here: {noformat} java.lang.Exception: Stack trace at java.lang.Thread.dumpStack(Thread.java:1333) at org.apache.geode.cache.client.internal.QueueManagerImpl$ConnectionList.<init>(QueueManagerImpl.java:1318) at org.apache.geode.cache.client.internal.QueueManagerImpl$ConnectionList.setPrimaryDiscoveryFailed(QueueManagerImpl.java:1337) at org.apache.geode.cache.client.internal.QueueManagerImpl.initializeConnections(QueueManagerImpl.java:439) at org.apache.geode.cache.client.internal.QueueManagerImpl.start(QueueManagerImpl.java:293) at org.apache.geode.cache.client.internal.PoolImpl.start(PoolImpl.java:359) at org.apache.geode.cache.client.internal.PoolImpl.finishCreate(PoolImpl.java:183) at org.apache.geode.cache.client.internal.PoolImpl.create(PoolImpl.java:169) at org.apache.geode.internal.cache.PoolFactoryImpl.create(PoolFactoryImpl.java:378) {noformat} Then when Region.registerInterestForAllKeys is called, it invokes ServerRegionProxy.registerInterest which: - adds the key to the RegisterInterestTracker - executes the RegisterInterestOp - removed from key from the RegisterInterestTracker if the RegisterInterestOp fails Here is the code in Region.registerInterestForAllKeys that does the above steps: {noformat} try { rit.addSingleInterest(region, key, interestType, policy, isDurable, receiveUpdatesAsInvalidates); result = RegisterInterestOp.execute(pool, regionName, key, interestType, policy, isDurable, receiveUpdatesAsInvalidates, regionDataPolicy); finished = true; return result; } finally { if (!finished) { rit.removeSingleInterest(region, key, interestType, isDurable, receiveUpdatesAsInvalidates); } } {noformat} The Connections are retrieved in QueueManagerImpl.getAllConnections. If there are none, a NoSubscriptionServersAvailableException wrapping the primaryDiscoveryException is thrown: {noformat} Exception in thread "main" org.apache.geode.cache.NoSubscriptionServersAvailableException: org.apache.geode.cache.NoSubscriptionServersAvailableException: Primary discovery failed. at org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:191) at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:428) at org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:875) at org.apache.geode.cache.client.internal.RegisterInterestOp.execute(RegisterInterestOp.java:58) at org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterest(ServerRegionProxy.java:364) at org.apache.geode.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3815) at org.apache.geode.internal.cache.LocalRegion.registerInterestRegex(LocalRegion.java:3911) at org.apache.geode.internal.cache.LocalRegion.registerInterestRegex(LocalRegion.java:3890) at org.apache.geode.internal.cache.LocalRegion.registerInterestRegex(LocalRegion.java:3885) at org.apache.geode.cache.Region.registerInterestForAllKeys(Region.java:1709) {noformat} Here is logging that shows all this behavior: {noformat} [warn 2021/10/04 10:58:22.184 PDT client-a-1 <main> tid=0x1] XXX ConnectionList.<init> primaryDiscoveryException=org.apache.geode.cache.NoSubscriptionServersAvailableException: Primary discovery failed. [warn 2021/10/04 10:58:22.238 PDT client-a-1 <main> tid=0x1] XXX RegisterInterestTracker.addSingleInterest key=.*; rieInterests={.*=KEYS_VALUES} [warn 2021/10/04 10:58:22.238 PDT client-a-1 <main> tid=0x1] XXX ServerRegionProxy.registerInterest about to execute RegisterInterestOp [warn 2021/10/04 10:58:22.244 PDT client-a-1 <main> tid=0x1] XXX QueueManagerImpl.getAllConnections about to throw exception=org.apache.geode.cache.NoSubscriptionServersAvailableException: org.apache.geode.cache.NoSubscriptionServersAvailableException: Primary discovery failed. [warn 2021/10/04 10:58:22.244 PDT client-a-1 <main> tid=0x1] XXX RegisterInterestTracker.removeSingleInterest key=.*; rieInterests={.*=KEYS_VALUES} {noformat} If the NoSubscriptionServersAvailableException is not caught, the client exits. If the NoSubscriptionServersAvailableException is caught and the client waits, the connection is established by the RedundancySatisfierTask when the server starts: {noformat} java.lang.Exception at org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.<init>(CacheClientUpdater.java:373) at org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.<init>(CacheClientUpdater.java:281) at org.apache.geode.cache.client.internal.ConnectionConnector.connectServerToClient(ConnectionConnector.java:105) at org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createServerToClientConnection(ConnectionFactoryImpl.java:251) at org.apache.geode.cache.client.internal.QueueManagerImpl.initializeQueueConnection(QueueManagerImpl.java:960) at org.apache.geode.cache.client.internal.QueueManagerImpl.createNewPrimary(QueueManagerImpl.java:803) at org.apache.geode.cache.client.internal.QueueManagerImpl.recoverPrimary(QueueManagerImpl.java:911) at org.apache.geode.cache.client.internal.QueueManagerImpl.access$700(QueueManagerImpl.java:77) at org.apache.geode.cache.client.internal.QueueManagerImpl$RedundancySatisfierTask.run2(QueueManagerImpl.java:1449) {noformat} But since the key was removed from the RegisterInterestTracker, no registerInterest is created. If I change my test to loop forever attempting to registerInterest, I see the behavior I want when the server starts. {noformat} private void registerInterest(String regionName, InterestResultPolicy policy) throws Exception { Region region = this.cache.getRegion(regionName); while (true) { try { region.registerInterestForAllKeys(); System.out.println("Registered interest in all keys"); break; } catch (Exception e) { System.out.println("Caught the following exception attempting to registerInterestForAllKeys: " + e); Thread.sleep(2000); } } // Invoke readyForEvents ((ClientCache) this.cache).readyForEvents(); } {noformat} The client attempts to registerInterest until the server starts. Once the server does start, the client registers interest and sends readyForEvents. At that time, it starts receiving events. > Two different clients with the same durable id will both connect to the > servers and receive messages > ---------------------------------------------------------------------------------------------------- > > Key: GEODE-9664 > URL: https://issues.apache.org/jira/browse/GEODE-9664 > Project: Geode > Issue Type: Bug > Components: client queues > Reporter: Barrett Oglesby > Priority: Major > Labels: needsTriage > > There are two cases: > # The number of queues is the same as the number of servers (e.g. client > with subscription-redundancy=1 and 2 servers) > # The number of queues is less than the number of servers (e.g. client with > subscription-redundancy=0 and 2 servers) > h2. Case 1 > In this case, the client first attempts to connect to the primary and fails. > {noformat} > [warn 2021/10/01 14:37:56.209 PDT server-1 <Client Queue Initialization > Thread 1> tid=0x4b] XXX CacheClientNotifier.registerClientInternal about to > register > clientProxyMembershipID=identity(127.0.0.1(client-a-2:89832:loner):61596:fad3ca3d:client-a-2,connection=2,durableAttributes=DurableClientAttributes[id=client-a; > timeout=300]) > [warn 2021/10/01 14:37:56.209 PDT server-1 <Client Queue Initialization > Thread 1> tid=0x4b] XXX CacheClientNotifier.registerClientInternal existing > proxy=CacheClientProxy[identity(127.0.0.1(client-a-1:89806:loner):61573:10a9ca3d:client-a-1,connection=2,durableAttributes=DurableClientAttributes[id=client-a; > timeout=300]); port=61581; primary=true; version=GEODE 1.15.0] > [warn 2021/10/01 14:37:56.210 PDT server-1 <Client Queue Initialization > Thread 1> tid=0x4b] XXX CacheClientNotifier.registerClientInternal existing > proxy isPaused=false > [warn 2021/10/01 14:37:56.210 PDT server-1 <Client Queue Initialization > Thread 1> tid=0x4b] The requested durable client has the same identifier ( > client-a ) as an existing durable client ( > CacheClientProxy[identity(127.0.0.1(client-a-1:89806:loner):61573:10a9ca3d:client-a-1,connection=2,durableAttributes=DurableClientAttributes[id=client-a; > timeout=300]); port=61581; primary=true; version=GEODE 1.15.0] ). Duplicate > durable clients are not allowed. > [warn 2021/10/01 14:37:56.210 PDT server-1 <Client Queue Initialization > Thread 1> tid=0x4b] CacheClientNotifier: Unsuccessfully registered client > with identifier > identity(127.0.0.1(client-a-2:89832:loner):61596:fad3ca3d:client-a-2,connection=2,durableAttributes=DurableClientAttributes[id=client-a; > timeout=300]) and response code 64 > {noformat} > It then attempts to connect to the secondary and succeeds. > {noformat} > [warn 2021/10/01 14:37:56.215 PDT server-2 <Client Queue Initialization > Thread 1> tid=0x47] XXX CacheClientNotifier.registerClientInternal about to > register > clientProxyMembershipID=identity(127.0.0.1(client-a-2:89832:loner):61596:fad3ca3d:client-a-2,connection=2,durableAttributes=DurableClientAttributes[id=client-a; > timeout=300]) > [warn 2021/10/01 14:37:56.215 PDT server-2 <Client Queue Initialization > Thread 1> tid=0x47] XXX CacheClientNotifier.registerClientInternal existing > proxy=CacheClientProxy[identity(127.0.0.1(client-a-1:89806:loner):61573:10a9ca3d:client-a-1,connection=2,durableAttributes=DurableClientAttributes[id=client-a; > timeout=300]); port=61578; primary=false; version=GEODE 1.15.0] > [warn 2021/10/01 14:37:56.216 PDT server-2 <Client Queue Initialization > Thread 1> tid=0x47] XXX CacheClientNotifier.registerClientInternal existing > proxy isPaused=true > [warn 2021/10/01 14:37:56.217 PDT server-2 <Client Queue Initialization > Thread 1> tid=0x47] XXX CacheClientNotifier.registerClientInternal > reinitialized existing > proxy=CacheClientProxy[identity(127.0.0.1(client-a-1:89806:loner):61573:10a9ca3d:client-a-1,connection=2,durableAttributes=DurableClientAttributes[id=client-a; > timeout=300]); port=61578; primary=true; version=GEODE 1.15.0] > {noformat} > The previous secondary is reinitialized and made into a primary. Both queues > will dispatch events. > The CacheClientNotifier.registerClientInternal method invoked when a client > connects does: > {noformat} > if (cacheClientProxy.isPaused()) { > ... > cacheClientProxy.reinitialize(...); > } else { > unsuccessfulMsg = String.format("The requested durable client has the same > identifier ( %s ) as an existing durable client...); > logger.warn(unsuccessfulMsg); > } > {noformat} > The CacheClientProxy is paused when the durable client it represents has > disconnected. Unfortunately, a secondary CacheClientProxy is also paused. So, > this check is not good enough to prevent a duplicate durable client from > connecting. > There are a few things that can also be checked. One of them is: > {noformat} > cacheClientProxy.getCommBuffer() == null > {noformat} > With that check added, when the client attempts to connect to the secondary, > it fails just like the it does with the primary. > The client then exits with this exception: > {noformat} > geode.cache.NoSubscriptionServersAvailableException: > org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not > initialize a primary queue on startup. No queue servers available. > at > org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:191) > at > org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:428) > at > org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:875) > at > org.apache.geode.cache.client.internal.RegisterInterestOp.execute(RegisterInterestOp.java:58) > at > org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterest(ServerRegionProxy.java:364) > at > org.apache.geode.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3815) > at > org.apache.geode.internal.cache.LocalRegion.registerInterestRegex(LocalRegion.java:3911) > at > org.apache.geode.internal.cache.LocalRegion.registerInterestRegex(LocalRegion.java:3890) > at > org.apache.geode.internal.cache.LocalRegion.registerInterestRegex(LocalRegion.java:3885) > at > org.apache.geode.cache.Region.registerInterestForAllKeys(Region.java:1709) > Caused by: org.apache.geode.cache.NoSubscriptionServersAvailableException: > Could not initialize a primary queue on startup. No queue servers available. > at > org.apache.geode.cache.client.internal.QueueManagerImpl.initializeConnections(QueueManagerImpl.java:575) > at > org.apache.geode.cache.client.internal.QueueManagerImpl.start(QueueManagerImpl.java:293) > at > org.apache.geode.cache.client.internal.PoolImpl.start(PoolImpl.java:359) > at > org.apache.geode.cache.client.internal.PoolImpl.finishCreate(PoolImpl.java:183) > at > org.apache.geode.cache.client.internal.PoolImpl.create(PoolImpl.java:169) > at > org.apache.geode.internal.cache.PoolFactoryImpl.create(PoolFactoryImpl.java:378) > {noformat} > h2. Case 2 > In this case, the client first attempts to connect to the primary and fails > just like case 1. > It will then attempt to connect a server with no existing queue and succeed. > {noformat} > [warn 2021/10/01 15:02:50.798 PDT server-1 <Client Queue Initialization > Thread 1> tid=0x54] XXX CacheClientNotifier.registerClientInternal about to > register > clientProxyMembershipID=identity(127.0.0.1(client-a-2:91683:loner):62089:24a2e13d:client-a-2,connection=2,durableAttributes=DurableClientAttributes[id=client-a; > timeout=300]) > [warn 2021/10/01 15:02:50.799 PDT server-1 <Client Queue Initialization > Thread 1> tid=0x54] XXX CacheClientNotifier.registerClientInternal existing > proxy=null > [warn 2021/10/01 15:02:50.810 PDT server-1 <Client Queue Initialization > Thread 1> tid=0x54] XXX CacheClientNotifier.registerClientInternal created > proxy=CacheClientProxy[identity(127.0.0.1(client-a-2:91683:loner):62089:24a2e13d:client-a-2,connection=2,durableAttributes=DurableClientAttributes[id=client-a; > timeout=300]); port=62094; primary=true; version=GEODE 1.15.0] > {noformat} > One way to address this case would be to prevent the durable client from > retrying to another server if it can't connect to the primary. > That would have be addressed in QueueManagerImpl.initializeConnections. That > method would have to know that the server refused the connection > (ServerRefusedConnectionException) and then return out of that method. > Thats a bit more work since that method currently doesn't get any exceptions > from initializeQueueConnection which does: > {noformat} > } catch (Exception e) { > if (logger.isDebugEnabled()) { > logger.debug("error creating subscription connection to server {}", > connection.getEndpoint(), e); > } > } > {noformat} > An exception handler like this would need to be added: > {noformat} > } catch (ServerRefusedConnectionException e) { > throw e; > } > {noformat} > QueueManagerImpl.initializeConnections would have to handle that exception in > a few places. I'm not sure exactly what should be done in that method. -- This message was sent by Atlassian Jira (v8.3.4#803005)