Looks like the problem is in QueueManagerImpl initialization so my suggestions won't fix the problem.
Caused by: org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not initialize a primary queue on startup. No queue servers available.
    at org.apache.geode.cache.client.internal.QueueManagerImpl.initializeConnections(QueueManagerImpl.java:585)
    at org.apache.geode.cache.client.internal.QueueManagerImpl.start(QueueManagerImpl.java:296)
    at org.apache.geode.cache.client.internal.PoolImpl.start(PoolImpl.java:347)
    at org.apache.geode.cache.client.internal.PoolImpl.finishCreate(PoolImpl.java:172)
    at org.apache.geode.cache.client.internal.PoolImpl.create(PoolImpl.java:158)
    at org.apache.geode.internal.cache.PoolFactoryImpl.create(PoolFactoryImpl.java:346)
    at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.createClientCache(InitializedDiskRegionWithIoExceptionRegressionTest.java:164)
    ... 1 more

That's pretty clearly in QueueManagerImpl. My test InitializedDiskRegionWithIoExceptionRegressionTest does not actually require subscription queues so I can easily make this problem go away by not using a subscription queue.

However, there are plenty of other tests that are intermittently failing with *NoSubscriptionServersAvailableException*:
https://issues.apache.org/jira/issues/?jql=text%20~%20%22NoSubscriptionServersAvailableException%22

Randomly looking at one that was closed as fixed shows that the test is not actually fixed and can still fail with NoSubscriptionServersAvailableException. Look at the last comment of https://issues.apache.org/jira/browse/GEODE-3749 and you'll see that the test still fails intermittently.

I'm not going to work on this further, but I'm happy to help or provide ideas to anyone that wants to solve the issue (this isn't related to what I'm currently working on so I'm done with it).

On Thu, May 3, 2018 at 11:26 AM, Kirk Lund <kl...@apache.org> wrote:

> The method CacheServer.isRunning() is actually invoked from within the
> AcceptorImpl.accept():
>
>     while (isRunning()) {
>       ...
>       try {
>         socket = serverSock.accept();
>
> So we can't modify isRunning() to check serverSock.isBound(). We would
> have to introduce a new User API on CacheServer called isBound() or
> something like that:
>
> org.apache.geode.cache.server.CacheServer
>
>     /**
>      * Returns true if this cache server is bound to its port and accepting
>      * connections.
>      */
>     boolean isBound();
>
> Any objections or suggestions?
>
> On Thu, May 3, 2018 at 11:03 AM, Kirk Lund <kl...@apache.org> wrote:
>
>> Anil and I were hoping that adding line 4 below would remove the race,
>> but it doesn't. CacheServer.isRunning() returns true as soon as
>> AcceptorImpl is non-null which is still before ServerSocket.accept() has
>> been invoked. The race still exists with this...
>>
>> 1: CacheServer cacheServer = getCache().addCacheServer();
>> 2: cacheServer.setPort(0);
>> 3: cacheServer.start();
>> *4: await().atMost(1, MINUTES).until(() -> cacheServer.isRunning());*
>> 5: return cacheServer.getPort();
>>
>> I think we would have to change the implementation of
>> CacheServer.isRunning() to check ServerSocket.isBound():
>>
>> CacheServer.isRunning:
>>     public boolean isRunning() {
>>       return this.acceptor != null && this.acceptor.isRunning();
>>     }
>>
>> AcceptorImpl.isRunning:
>>     public boolean isRunning() {
>>       return !this.shutdownStarted;
>>     }
>>
>> Would need to change to something like this:
>>     public boolean isRunning() {
>>       return !this.shutdownStarted *&& serverSock.isBound();*
>>     }
>>
>> Any opinions or alternatives? If I add "*&& serverSock.isBound();*" am I
>> going to break isRunning?
>>
>> On Thu, May 3, 2018 at 9:33 AM, Kirk Lund <kl...@apache.org> wrote:
>>
>>> I have a test which starts a server and then starts a client. But the
>>> client intermittently fails with NoSubscriptionServersAvailableException
>>> (see full stack below).
>>>
>>> Seems like there must be something asynchronous in the startup of a
>>> CacheServer that I need to wait for.
>>> Any ideas what I need to test for to
>>> avoid NoSubscriptionServersAvailableException?
>>>
>>> org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest > cacheServerPersistWithIOExceptionShouldShutdown FAILED
>>>     org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest$$Lambda$23/1222369873.run in VM 1 running on Host 0b1780a0efc9 with 4 VMs
>>>         at org.apache.geode.test.dunit.VM.invoke(VM.java:436)
>>>         at org.apache.geode.test.dunit.VM.invoke(VM.java:405)
>>>         at org.apache.geode.test.dunit.VM.invoke(VM.java:348)
>>>         at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.cacheServerPersistWithIOExceptionShouldShutdown(InitializedDiskRegionWithIoExceptionRegressionTest.java:113)
>>>
>>>     Caused by:
>>>     org.apache.geode.cache.NoSubscriptionServersAvailableException: org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not initialize a primary queue on startup. No queue servers available.
>>>         at org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:187)
>>>         at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:539)
>>>         at org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:850)
>>>         at org.apache.geode.cache.client.internal.RegisterInterestOp.execute(RegisterInterestOp.java:58)
>>>         at org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterest(ServerRegionProxy.java:356)
>>>         at org.apache.geode.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3749)
>>>         at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3840)
>>>         at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3638)
>>>         at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3633)
>>>         at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3628)
>>>         at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.createClientCache(InitializedDiskRegionWithIoExceptionRegressionTest.java:172)
>>>         at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.lambda$cacheServerPersistWithIOExceptionShouldShutdown$2c6907a2$1(InitializedDiskRegionWithIoExceptionRegressionTest.java:113)
>>>
>>>     Caused by:
>>>     org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not initialize a primary queue on startup. No queue servers available.
>>>         at org.apache.geode.cache.client.internal.QueueManagerImpl.initializeConnections(QueueManagerImpl.java:585)
>>>         at org.apache.geode.cache.client.internal.QueueManagerImpl.start(QueueManagerImpl.java:296)
>>>         at org.apache.geode.cache.client.internal.PoolImpl.start(PoolImpl.java:347)
>>>         at org.apache.geode.cache.client.internal.PoolImpl.finishCreate(PoolImpl.java:172)
>>>         at org.apache.geode.cache.client.internal.PoolImpl.create(PoolImpl.java:158)
>>>         at org.apache.geode.internal.cache.PoolFactoryImpl.create(PoolFactoryImpl.java:346)
>>>         at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.createClientCache(InitializedDiskRegionWithIoExceptionRegressionTest.java:164)
>>>         ... 1 more
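
For anyone who picks this up: the difference between "running" and "bound" discussed in the quoted thread can be illustrated with a plain java.net.ServerSocket. This is only a rough sketch. SketchAcceptor and BoundCheckSketch are hypothetical names made up for illustration, not Geode's AcceptorImpl, and the isBound() method here is the proposed idea, not an existing CacheServer API. It also says nothing about whether waiting on a bound socket would actually avoid NoSubscriptionServersAvailableException.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BoundCheckSketch {

  /** Hypothetical stand-in for an acceptor; NOT Geode's AcceptorImpl. */
  static class SketchAcceptor implements AutoCloseable {
    private final ServerSocket serverSock;
    private volatile boolean shutdownStarted;

    SketchAcceptor() throws IOException {
      // The no-arg constructor creates an unbound server socket.
      this.serverSock = new ServerSocket();
    }

    /** Binds to an ephemeral port (port 0 lets the OS choose). */
    void bind() throws IOException {
      serverSock.bind(new InetSocketAddress(0));
    }

    /** Mirrors the quoted check: true as long as shutdown has not started. */
    boolean isRunning() {
      return !shutdownStarted;
    }

    /** The proposed stricter check: not shutting down AND socket is bound. */
    boolean isBound() {
      return !shutdownStarted && serverSock.isBound();
    }

    int getPort() {
      return serverSock.getLocalPort();
    }

    @Override
    public void close() throws IOException {
      shutdownStarted = true;
      serverSock.close();
    }
  }

  public static void main(String[] args) throws IOException {
    try (SketchAcceptor acceptor = new SketchAcceptor()) {
      // Before bind(): "running" is already true, "bound" is not.
      System.out.println("running=" + acceptor.isRunning()
          + " bound=" + acceptor.isBound());
      acceptor.bind();
      // After bind(): both checks pass and a real port is assigned.
      System.out.println("running=" + acceptor.isRunning()
          + " bound=" + acceptor.isBound()
          + " portAssigned=" + (acceptor.getPort() > 0));
    }
  }
}
```

In a test, the await() on line 4 of the quoted snippet would then poll the stricter check instead of isRunning(), assuming such a method were added.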