[
https://issues.apache.org/jira/browse/GEODE-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319481#comment-16319481
]
Barry Oglesby commented on GEODE-4251:
--------------------------------------
I see whats happening in this test.
The registerInterest call times out:
{noformat}
[vm3] [warn 2018/01/09 15:49:13.919 PST <RMI TCP Connection(1)-127.0.0.1>
tid=0x13] Pool unexpected socket timed out on client
connection=SubscriptionConnectionImpl[localhost:27374:closed])
{noformat}
This causes causes the QueueConnectionImpl to be destroyed, which causes the
CacheClientUpdater to be closed:
{noformat}
[vm3] [info 2018/01/09 15:49:13.919 PST <Cache Client Updater Thread on
192.168.2.6(20159)<v1>:32770 port 27374> tid=0x3e] Cache client updater for
Queue on endpoint localhost:27374 exiting. Scheduling recovery.
{noformat}
Here is a stack trace of that area of the code. OpExecutorImpl.handleException
handles the SocketTimeoutException by destroying the QueueConnectionImpl. This
calls QueueManagerImpl.endpointCrashed which closes the CacheClientUpdater.
{noformat}
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1333)
at
org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.close(CacheClientUpdater.java:548)
at
org.apache.geode.cache.client.internal.QueueConnectionImpl.internalDestroy(QueueConnectionImpl.java:107)
at
org.apache.geode.cache.client.internal.QueueManagerImpl.endpointCrashed(QueueManagerImpl.java:392)
at
org.apache.geode.cache.client.internal.QueueManagerImpl.connectionCrashed(QueueManagerImpl.java:370)
at
org.apache.geode.cache.client.internal.QueueConnectionImpl.destroy(QueueConnectionImpl.java:83)
at
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:774)
at
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:615)
at
org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:561)
at
org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:842)
at
org.apache.geode.cache.client.internal.RegisterInterestListOp.execute(RegisterInterestListOp.java:47)
at
org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterestList(ServerRegionProxy.java:533)
at
org.apache.geode.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3763)
at
org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3848)
at
org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3842)
at
org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3837)
at TestClient.registerInterest(TestClient.java:94)
at TestClient.main(TestClient.java:31)
{noformat}
Its unclear why the registerInterest call times out, but the test sets the read
timeout = 2000.
{noformat}
p = PoolManager.createFactory().addServer(host, PORT1).addServer(host, PORT2)
.setSubscriptionEnabled(true).setSubscriptionRedundancy(-1).setReadTimeout(2000)
{noformat}
In the meantime, the CacheClientProxy has been established on each server, so
when the CacheClientUpdater is attempted to be recreated by the
RedundancySatisfierTask, it fails with:
{noformat}
[vm0] [warn 2018/01/09 15:49:13.932 PST <Client Queue Initialization Thread 0>
tid=0x5c] A previous connection attempt from this client is still being
processed: identity(192.168.2.6(20162:loner):58955:43e553dd,connection=1
[vm3] [warn 2018/01/09 15:49:13.932 PST
<queueTimer-RegisterInterestKeysDUnitTestPool> tid=0x3f] Cache Client Updater
Thread on 192.168.2.6(20159)<v1>:32770 port 27374 (localhost:27374): Caught
following exception while attempting to create a server-to-client communication
socket and will exit:
org.apache.geode.cache.client.ServerRefusedConnectionException: <null inet_addr
hostname><ec>:27374 refused connection: A previous connection attempt from this
client is still being processed:
identity(192.168.2.6(20162:loner):58955:43e553dd,connection=1
{noformat}
Here is a stack trace of the CacheClientUpdater recovery from the client:
{noformat}
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1333)
at
org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.<init>(CacheClientUpdater.java:273)
at
org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createServerToClientConnection(ConnectionFactoryImpl.java:302)
at
org.apache.geode.cache.client.internal.QueueManagerImpl.initializeQueueConnection(QueueManagerImpl.java:977)
at
org.apache.geode.cache.client.internal.QueueManagerImpl.createNewPrimary(QueueManagerImpl.java:816)
at
org.apache.geode.cache.client.internal.QueueManagerImpl.recoverPrimary(QueueManagerImpl.java:928)
at
org.apache.geode.cache.client.internal.QueueManagerImpl$RedundancySatisfierTask.run2(QueueManagerImpl.java:1469)
at
org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1326)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
Here is a stack trace of the CacheClientNotifier.registerClient method that
logs the 'previous connection attempt' warning:
{noformat}
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1333)
at
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.registerClient(CacheClientNotifier.java:524)
at
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.registerGFEClient(CacheClientNotifier.java:332)
at
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.registerClient(CacheClientNotifier.java:279)
at
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$ClientQueueInitializerTask.run(AcceptorImpl.java:1844)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$3$1.run(AcceptorImpl.java:611)
at java.lang.Thread.run(Thread.java:745)
{noformat}
The current checks are:
- staleClientProxy.isConnected() (boolean on the CacheClientProxy that won't be
false unless CacheClientProxy close has been invoked)
- staleClientProxy.getSocket().isConnected() (will be true if the socket has
ever been connected even if it is closed now)
I tried staleClientProxy.getSocket().isClosed(), but that returned false even
though the socket had been closed on the remote side.
Since there is no activity on the CacheClientProxy's MessageDispatcher, it
doesn't realize its socket has been terminated from the remote side. The way
that happens is during the ClientHealthMonitor's periodic checks, it realizes
it hasn't gotten a ping from the client and it closes all the connections for
that client including the CacheClientProxy's MessageDispatcher.
Here is a stack trace of the ClientHealthMonitor terminating the
ServerConnection and CacheClientProxy:
{noformat}
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1333)
at
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:891)
at
org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.close(CacheClientProxy.java:793)
at
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.closeDeadProxies(CacheClientNotifier.java:1750)
at
org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.unregisterClient(CacheClientNotifier.java:725)
at
org.apache.geode.internal.cache.tier.sockets.ClientHealthMonitor.unregisterClient(ClientHealthMonitor.java:281)
at
org.apache.geode.internal.cache.tier.sockets.ServerConnection.handleTermination(ServerConnection.java:914)
at
org.apache.geode.internal.cache.tier.sockets.ClientHealthMonitor.validateThreads(ClientHealthMonitor.java:624)
at
org.apache.geode.internal.cache.tier.sockets.ClientHealthMonitor$ClientHealthMonitorThread.run(ClientHealthMonitor.java:848)
{noformat}
Questions:
1. Why is the CacheClientUpdater closed when the registerInterest
QueueConnectionImpl times out? Do they need to be maintained in pairs? Would it
be possible to recreate a connection on that same server?
2. Is it possible to use the ClientHealthMonitor to check the stat of the
CacheClientProxy? Is it worth doing this?
Normal operations (like puts) do not cause this same behavior. The
CacheClientUpdater is not closed in the case of a normal operation timing out.
> [CI Failure]Class
> org.apache.geode.internal.cache.tier.sockets.RegisterInterestKeysPRDUnitTest
> --- testRegisterCreatesInvalidEntry
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-4251
> URL: https://issues.apache.org/jira/browse/GEODE-4251
> Project: Geode
> Issue Type: Bug
> Components: cq
> Reporter: nabarun
>
> {noformat}
> org.apache.geode.test.dunit.RMIException: While invoking
> org.apache.geode.internal.cache.tier.sockets.RegisterInterestKeysDUnitTest$$Lambda$28/1575286050.run
> in VM 3 running on Host ea663fe20602 with 4 VMs
> at org.apache.geode.test.dunit.VM.invoke(VM.java:393)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:363)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:308)
> at
> org.apache.geode.internal.cache.tier.sockets.RegisterInterestKeysDUnitTest.testRegisterCreatesInvalidEntry(RegisterInterestKeysDUnitTest.java:135)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
> at
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
> at
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
> at
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
> at
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
> at
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:147)
> at
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:129)
> at
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
> at
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
> at
> org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:46)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.AssertionError: failed while registering interest
> at org.apache.geode.test.dunit.Assert.fail(Assert.java:66)
> at
> org.apache.geode.internal.cache.tier.sockets.RegisterInterestKeysDUnitTest.registerKeysK1(RegisterInterestKeysDUnitTest.java:224)
> at
> org.apache.geode.internal.cache.tier.sockets.RegisterInterestKeysDUnitTest.lambda$testRegisterCreatesInvalidEntry$bb17a952$2(RegisterInterestKeysDUnitTest.java:135)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at hydra.MethExecutor.executeObject(MethExecutor.java:244)
> at
> org.apache.geode.test.dunit.standalone.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:70)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
> at sun.rmi.transport.Transport$1.run(Transport.java:200)
> at sun.rmi.transport.Transport$1.run(Transport.java:197)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> at
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
> ... 3 more
> Caused by: org.apache.geode.cache.NoSubscriptionServersAvailableException:
> org.apache.geode.cache.NoSubscriptionServersAvailableException: Primary
> discovery failed.
> at
> org.apache.geode.cache.client.internal.QueueManagerImpl.getAllConnections(QueueManagerImpl.java:190)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnQueuesAndReturnPrimaryResult(OpExecutorImpl.java:558)
> at
> org.apache.geode.cache.client.internal.PoolImpl.executeOnQueuesAndReturnPrimaryResult(PoolImpl.java:842)
> at
> org.apache.geode.cache.client.internal.RegisterInterestListOp.execute(RegisterInterestListOp.java:47)
> at
> org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterestList(ServerRegionProxy.java:533)
> at
> org.apache.geode.internal.cache.LocalRegion.processSingleInterest(LocalRegion.java:3761)
> at
> org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3846)
> at
> org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3840)
> at
> org.apache.geode.internal.cache.LocalRegion.registerInterest(LocalRegion.java:3835)
> at
> org.apache.geode.internal.cache.tier.sockets.RegisterInterestKeysDUnitTest.registerKeysK1(RegisterInterestKeysDUnitTest.java:213)
> ... 24 more
> Caused by: org.apache.geode.cache.NoSubscriptionServersAvailableException:
> Primary discovery failed.
> at
> org.apache.geode.cache.client.internal.QueueManagerImpl$ConnectionList.setPrimaryDiscoveryFailed(QueueManagerImpl.java:1351)
> at
> org.apache.geode.cache.client.internal.QueueManagerImpl.recoverPrimary(QueueManagerImpl.java:965)
> at
> org.apache.geode.cache.client.internal.QueueManagerImpl$RedundancySatisfierTask.run2(QueueManagerImpl.java:1467)
> at
> org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1326)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> ... 3 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)