It looks like the problem is in QueueManagerImpl initialization, so my
earlier suggestions won't fix it.

Caused by:
org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not
initialize a primary queue on startup. No queue servers available.
    at org.apache.geode.cache.client.internal.QueueManagerImpl.initializeConnections(QueueManagerImpl.java:585)
    at org.apache.geode.cache.client.internal.QueueManagerImpl.start(QueueManagerImpl.java:296)
    at org.apache.geode.cache.client.internal.PoolImpl.start(PoolImpl.java:347)
    at org.apache.geode.cache.client.internal.PoolImpl.finishCreate(PoolImpl.java:172)
    at org.apache.geode.cache.client.internal.PoolImpl.create(PoolImpl.java:158)
    at org.apache.geode.internal.cache.PoolFactoryImpl.create(PoolFactoryImpl.java:346)
    at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.createClientCache(InitializedDiskRegionWithIoExceptionRegressionTest.java:164)
    ... 1 more

That's pretty clearly in QueueManagerImpl. My test
InitializedDiskRegionWithIoExceptionRegressionTest does not actually
require subscription queues so I can easily make this problem go away by
not using a subscription queue.

However, there are plenty of other tests that are intermittently failing
with *NoSubscriptionServersAvailableException*:

https://issues.apache.org/jira/issues/?jql=text%20~%20%22NoSubscriptionServersAvailableException%22

Picking one at random that was closed as fixed shows that the test is not
actually fixed and can still fail with NoSubscriptionServersAvailableException.
See the last comment on https://issues.apache.org/jira/browse/GEODE-3749:
the test still fails intermittently.

I'm not going to work on this further, but I'm happy to help or provide
ideas to anyone who wants to solve the issue (it isn't related to what
I'm currently working on, so I'm done with it).
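
For anyone who picks this up, here is a minimal sketch of the
"running but not yet bound" gap discussed in the quoted messages below,
using only java.net.ServerSocket. Everything in it (BoundCheckSketch,
SketchServer, awaitCondition, the bind delay) is hypothetical and made up
for illustration. It is not Geode code and it does not show how CacheServer
or AcceptorImpl are actually implemented. It only shows that a "started"
flag can be true before the socket is bound, and that polling a separate
bound check closes that particular gap.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.function.BooleanSupplier;

public class BoundCheckSketch {

  /** Hypothetical stand-in for a server; NOT Geode's CacheServer. */
  static final class SketchServer implements AutoCloseable {
    private final ServerSocket serverSock;
    private final long bindDelayMillis;
    private volatile boolean started;
    private volatile boolean bound;

    SketchServer(long bindDelayMillis) throws IOException {
      this.serverSock = new ServerSocket(); // created unbound
      this.bindDelayMillis = bindDelayMillis;
    }

    void start() {
      started = true; // "running" is reported from this point on
      Thread binder = new Thread(() -> {
        try {
          Thread.sleep(bindDelayMillis); // simulate slow asynchronous startup
          serverSock.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
          bound = true;
        } catch (IOException | InterruptedException e) {
          // Leave bound == false; callers observe the failure as a timeout.
        }
      });
      binder.setDaemon(true);
      binder.start();
    }

    boolean isRunning() {
      return started;
    }

    boolean isBound() {
      return bound && serverSock.isBound();
    }

    int getPort() {
      return serverSock.getLocalPort(); // -1 while the socket is unbound
    }

    @Override
    public void close() throws IOException {
      serverSock.close();
    }
  }

  /** Polls the condition until it holds or the timeout elapses. */
  static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(10);
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    try (SketchServer server = new SketchServer(200)) {
      server.start();
      System.out.println("isRunning right after start(): " + server.isRunning());
      System.out.println("isBound right after start():   " + server.isBound());
      boolean ready = awaitCondition(server::isBound, 5_000);
      System.out.println("isBound after waiting: " + ready + ", port " + server.getPort());
    }
  }
}
```

Whether a bound check like this would be enough to avoid
NoSubscriptionServersAvailableException is a separate question, since the
failure above is raised in QueueManagerImpl on the client side.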


On Thu, May 3, 2018 at 11:26 AM, Kirk Lund <kl...@apache.org> wrote:

> The method CacheServer.isRunning() is actually invoked from within
> AcceptorImpl.accept():
>
>     while (isRunning()) {
>       ...
>       try {
>         socket = serverSock.accept();
>
> So we can't modify isRunning() to check serverSock.isBound(). We would
> have to introduce a new User API on CacheServer called isBound() or
> something like that:
>
> org.apache.geode.cache.server.CacheServer
>
> /**
>  * Returns true if this cache server is bound to its port and accepting
>  * connections.
>  */
> boolean isBound();
>
> Any objections or suggestions?
>
> On Thu, May 3, 2018 at 11:03 AM, Kirk Lund <kl...@apache.org> wrote:
>
>> Anil and I were hoping that adding line 4 below would remove the race,
>> but it doesn't. CacheServer.isRunning() returns true as soon as
>> AcceptorImpl is non-null which is still before ServerSocket.accept() has
>> been invoked. The race still exists with this...
>>
>> 1: CacheServer cacheServer = getCache().addCacheServer();
>> 2: cacheServer.setPort(0);
>> 3: cacheServer.start();
>> *4: await().atMost(1, MINUTES).until(() -> cacheServer.isRunning());*
>> 5: return cacheServer.getPort();
>>
>> I think we would have to change the implementation of
>> CacheServer.isRunning() to check ServerSocket.isBound():
>>
>> CacheServer.isRunning:
>>   public boolean isRunning() {
>>     return this.acceptor != null && this.acceptor.isRunning();
>>   }
>>
>> AcceptorImpl.isRunning:
>>   public boolean isRunning() {
>>     return !this.shutdownStarted;
>>   }
>>
>> Would need to change to something like this:
>>   public boolean isRunning() {
>>     return !this.shutdownStarted *&& serverSock.isBound();*
>>   }
>>
>> Any opinions or alternatives? If I add "*&& serverSock.isBound();*" am I
>> going to break isRunning?
>>
>> On Thu, May 3, 2018 at 9:33 AM, Kirk Lund <kl...@apache.org> wrote:
>>
>>> I have a test which starts a server and then starts a client. But the
>>> client intermittently fails with NoSubscriptionServersAvailableException
>>> (see full stack below).
>>>
>>> Seems like there must be something asynchronous in the startup of a
>>> CacheServer that I need to wait for. Any ideas what I need to test for to
>>> avoid NoSubscriptionServersAvailableException?
>>>
>>> org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest
>>> > cacheServerPersistWithIOExceptionShouldShutdown FAILED
>>>     org.apache.geode.test.dunit.RMIException: While invoking
>>> org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest$$Lambda$23/1222369873.run
>>> in VM 1 running on Host 0b1780a0efc9 with 4 VMs
>>>         at org.apache.geode.test.dunit.VM.invoke(VM.java:436)
>>>         at org.apache.geode.test.dunit.VM.invoke(VM.java:405)
>>>         at org.apache.geode.test.dunit.VM.invoke(VM.java:348)
>>>         at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.cacheServerPersistWithIOExceptionShouldShutdown(InitializedDiskRegionWithIoExceptionRegressionTest.java:113)
>>>
>>> Caused by:
>>> org.apache.geode.cache.NoSubscriptionServersAvailableException:
>>> org.apache.geode.cache.NoSubscriptionServersAvailableException: Could
>>> not initialize a primary queue on startup. No queue servers available.
>>>     at org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:187)
>>>     at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:539)
>>>     at org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:850)
>>>     at org.apache.geode.cache.client.internal.RegisterInterestOp.execute(RegisterInterestOp.java:58)
>>>     at org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterest(ServerRegionProxy.java:356)
>>>     at org.apache.geode.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3749)
>>>     at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3840)
>>>     at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3638)
>>>     at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3633)
>>>     at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3628)
>>>     at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.createClientCache(InitializedDiskRegionWithIoExceptionRegressionTest.java:172)
>>>     at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.lambda$cacheServerPersistWithIOExceptionShouldShutdown$2c6907a2$1(InitializedDiskRegionWithIoExceptionRegressionTest.java:113)
>>>
>>> Caused by:
>>> org.apache.geode.cache.NoSubscriptionServersAvailableException: Could
>>> not initialize a primary queue on startup. No queue servers available.
>>>     at org.apache.geode.cache.client.internal.QueueManagerImpl.initializeConnections(QueueManagerImpl.java:585)
>>>     at org.apache.geode.cache.client.internal.QueueManagerImpl.start(QueueManagerImpl.java:296)
>>>     at org.apache.geode.cache.client.internal.PoolImpl.start(PoolImpl.java:347)
>>>     at org.apache.geode.cache.client.internal.PoolImpl.finishCreate(PoolImpl.java:172)
>>>     at org.apache.geode.cache.client.internal.PoolImpl.create(PoolImpl.java:158)
>>>     at org.apache.geode.internal.cache.PoolFactoryImpl.create(PoolFactoryImpl.java:346)
>>>     at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.createClientCache(InitializedDiskRegionWithIoExceptionRegressionTest.java:164)
>>>     ... 1 more
>>>
>>>
>>
>
