That smells like a bug to me. We don't have to be in accept(), though; we just need the server socket to be open, with sufficient backlog to queue the connection requests. Adding a check for whether the socket is open in isRunning() might be good enough.
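
To illustrate the backlog point, here is a minimal standalone sketch using only java.net (it is not Geode code, and the class and method names are made up for the example). It binds a ServerSocket with a backlog, never calls accept(), and checks that a client can still complete a connect:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of Geode.
public class BacklogDemo {

  /** Returns true if a client connect completes although accept() is never called. */
  public static boolean connectsBeforeAccept() throws IOException {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    // Port 0 asks the OS for an ephemeral port; 50 is the requested backlog.
    try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
      try (Socket client = new Socket()) {
        // The OS queues the connection in the listen backlog.
        client.connect(new InetSocketAddress(loopback, server.getLocalPort()), 5000);
        return server.isBound() && client.isConnected();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("connected before accept(): " + connectsBeforeAccept());
  }
}
```

If that holds on the test hosts, the window that matters is between start() returning and the socket being bound, not between bind and accept().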

On 5/3/18 11:26 AM, Kirk Lund wrote:
The method CacheServer.isRunning() is actually invoked from within the
AcceptorImpl.accept():

     while (isRunning()) {
       ...
       try {
         socket = serverSock.accept();

So we can't modify isRunning() to check serverSock.isBound(). We would have
to introduce a new User API on CacheServer called isBound() or something
like that:

org.apache.geode.cache.server.CacheServer

/**
 * Returns true if this cache server is bound to its port and accepting connections.
 */
boolean isBound();

Any objections or suggestions?

On Thu, May 3, 2018 at 11:03 AM, Kirk Lund <kl...@apache.org> wrote:

Anil and I were hoping that adding line 4 below would remove the race, but
it doesn't. CacheServer.isRunning() returns true as soon as AcceptorImpl is
non-null which is still before ServerSocket.accept() has been invoked. The
race still exists with this...

1: CacheServer cacheServer = getCache().addCacheServer();
2: cacheServer.setPort(0);
3: cacheServer.start();
4: await().atMost(1, MINUTES).until(() -> cacheServer.isRunning());
5: return cacheServer.getPort();
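
One possible test-side workaround, sketched here under assumptions: instead of waiting on isRunning(), the test could poll until a plain TCP connect to the server port succeeds. PortProbe and awaitPortOpen are hypothetical names, not Geode or Awaitility API, and I have not checked how a Geode cache server reacts to a probe connection that closes without a handshake.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical test helper, not part of Geode.
public class PortProbe {

  /** Polls until a TCP connect to host:port succeeds or the timeout elapses. */
  public static boolean awaitPortOpen(String host, int port, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (System.nanoTime() - deadline < 0) {
      try (Socket probe = new Socket()) {
        // Succeeds once the server socket is bound and listening.
        probe.connect(new InetSocketAddress(host, port), 500);
        return true;
      } catch (IOException notListeningYet) {
        Thread.sleep(50); // back off briefly before retrying
      }
    }
    return false;
  }
}
```

This only works when the port is known up front, so with setPort(0) it would have to run after getPort() returns the assigned port.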

I think we would have to change the implementation of
CacheServer.isRunning() to check ServerSocket.isBound():

CacheServer.isRunning:
   public boolean isRunning() {
     return this.acceptor != null && this.acceptor.isRunning();
   }

AcceptorImpl.isRunning:
   public boolean isRunning() {
     return !this.shutdownStarted;
   }

Would need to change to something like this:
   public boolean isRunning() {
     return !this.shutdownStarted && serverSock.isBound();
   }

Any opinions or alternatives? If I add "&& serverSock.isBound()" am I
going to break isRunning?
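
To see how such a check would behave, here is a small standalone sketch. SketchAcceptor is a hypothetical stand-in, not the real AcceptorImpl; only the field names shutdownStarted and serverSock are borrowed from the snippets above, and the extra isClosed() check is my assumption.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical stand-in for an acceptor; not the real AcceptorImpl.
public class SketchAcceptor {
  private final ServerSocket serverSock;
  private volatile boolean shutdownStarted;

  public SketchAcceptor() throws IOException {
    this.serverSock = new ServerSocket(); // created unbound
  }

  public void bind() throws IOException {
    // Port 0 lets the OS pick an ephemeral port on the loopback interface.
    serverSock.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
  }

  public void shutdown() throws IOException {
    shutdownStarted = true;
    serverSock.close();
  }

  public boolean isRunning() {
    // isBound() alone stays true after close(), hence the isClosed() check.
    return !shutdownStarted && serverSock.isBound() && !serverSock.isClosed();
  }
}
```

In this sketch isRunning() is false before bind(), true after it, and false again after shutdown(). Note that ServerSocket.isBound() keeps returning true for a socket that was bound and then closed, so isBound() by itself does not tell you the socket is still open.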

On Thu, May 3, 2018 at 9:33 AM, Kirk Lund <kl...@apache.org> wrote:

I have a test which starts a server and then starts a client. But the
client intermittently fails with NoSubscriptionServersAvailableException
(see full stack below).

Seems like there must be something asynchronous in the startup of a
CacheServer that I need to wait for. Any ideas what I need to test for to
avoid NoSubscriptionServersAvailableException?

org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest
cacheServerPersistWithIOExceptionShouldShutdown FAILED
     org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest$$Lambda$23/1222369873.run in VM 1 running on Host 0b1780a0efc9 with 4 VMs
         at org.apache.geode.test.dunit.VM.invoke(VM.java:436)
         at org.apache.geode.test.dunit.VM.invoke(VM.java:405)
         at org.apache.geode.test.dunit.VM.invoke(VM.java:348)
         at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.cacheServerPersistWithIOExceptionShouldShutdown(InitializedDiskRegionWithIoExceptionRegressionTest.java:113)

Caused by:
org.apache.geode.cache.NoSubscriptionServersAvailableException: org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not initialize a primary queue on startup. No queue servers available.
     at org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:187)
     at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:539)
     at org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:850)
     at org.apache.geode.cache.client.internal.RegisterInterestOp.execute(RegisterInterestOp.java:58)
     at org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterest(ServerRegionProxy.java:356)
     at org.apache.geode.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3749)
     at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3840)
     at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3638)
     at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3633)
     at org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3628)
     at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.createClientCache(InitializedDiskRegionWithIoExceptionRegressionTest.java:172)
     at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.lambda$cacheServerPersistWithIOExceptionShouldShutdown$2c6907a2$1(InitializedDiskRegionWithIoExceptionRegressionTest.java:113)

Caused by:
org.apache.geode.cache.NoSubscriptionServersAvailableException: Could not initialize a primary queue on startup. No queue servers available.
     at org.apache.geode.cache.client.internal.QueueManagerImpl.initializeConnections(QueueManagerImpl.java:585)
     at org.apache.geode.cache.client.internal.QueueManagerImpl.start(QueueManagerImpl.java:296)
     at org.apache.geode.cache.client.internal.PoolImpl.start(PoolImpl.java:347)
     at org.apache.geode.cache.client.internal.PoolImpl.finishCreate(PoolImpl.java:172)
     at org.apache.geode.cache.client.internal.PoolImpl.create(PoolImpl.java:158)
     at org.apache.geode.internal.cache.PoolFactoryImpl.create(PoolFactoryImpl.java:346)
     at org.apache.geode.internal.cache.InitializedDiskRegionWithIoExceptionRegressionTest.createClientCache(InitializedDiskRegionWithIoExceptionRegressionTest.java:164)
     ... 1 more


