[
https://issues.apache.org/jira/browse/GEODE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Owen Nichols updated GEODE-7763:
--------------------------------
Fix Version/s: 1.13.0
> Apache Geode 1.11 severely and negatively impacts performance and resource
> utilization
> --------------------------------------------------------------------------------------
>
> Key: GEODE-7763
> URL: https://issues.apache.org/jira/browse/GEODE-7763
> Project: Geode
> Issue Type: Bug
> Components: client/server
> Affects Versions: 1.10.0, 1.11.0
> Reporter: John Blum
> Assignee: Ernest Burghardt
> Priority: Critical
> Labels: performance
> Fix For: 1.12.0, 1.13.0
>
> Attachments: 1.11-client-stats.gfs, 1.11-server-stats.gfs,
> 1.11_thread_dumps.rtf, 1.9-client-stats.gfs, 1.9-server-stats.gfs, 1.9.log,
> apache-geode-1.10-client-server-interaction-output.txt,
> apache-geode-1.10-client-server-startup-output.txt,
> apache-geode-1.11-client-server-interaction-output.txt,
> apache-geode-1.11-client-server-startup-output.txt,
> geode-7763-geode-changes.diff, geode-7763-ssdg-changes.diff
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> This problem was first observed in Apache Geode 1.11.0. The problem was not
> present in Apache Geode 1.9.2. This problem is an issue for Apache Geode
> 1.10 as well!
> After upgrading _Spring Session for Apache Geode_ (SSDG) 2.3 to _Spring Data
> for Apache Geode_ (SDG) Neumann/2.3, which is based on Apache Geode 1.11,
> this problem with SSDG's test suite started occurring.
> _Spring Session for Apache Geode_ (SSDG) 2.2, which is based on _Spring Data
> for Apache Geode_ (SDG) Moore/2.2, pulls in Apache Geode 1.9.2. This problem
> did not occur in SSDG 2.2. with Apache Geode 1.9.2.
> Out of curiosity, I wondered whether this problem affects (i.e. was actually
> introduced in) Apache Geode 1.10.0. So, I configured SSDG 2.3 to pull in SDG
> Moore/2.2 but run with Apache Geode 1.10. The problem occurred with Apache
> Geode 1.10 as well!
> The SSDG test class in question, affected by Geode's deficiencies, is the
> [MultiThreadedHighlyConcurrentClientServerSessionOperationsIntegrationTests|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java].
> The test class was modeled after a customer UC, who were using Spring Session
> and Apache Geode/Pivotal GemFire as the HTTP Session state management
> provider, therefore it simulates their highly concurrent environment.
> The test class has 2 primary parameters: [Thread
> Count|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90]
> and the [Workload
> Size|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L91].
> The "_Workload Size_" should not be confused with the "_Payload Size_" of the
> individual objects passed to the Geode data access operations (i.e. {{gets}},
> {{puts}}, {{removes}}). The "_Workload Size_" merely determines the number
> of {{get}}, {{put}} or {{remove}} operations performed on the (Session)
> Region over the duration of the test run. Certain operations are "favored"
> over others, therefore the number of {{gets}}, {{puts}} and {{removes}} is
> weighted.
> The "_Payload_" in this case is a (HTTP) {{Session}} object and the "size" is
> directly proportional to the number of Session attributes stored in the
> Session.
> As you can see from the [test class
> configuration|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90-L91]
> in *SSDG* {{2.2}}, the *Thread Count* was set to *180* and the *Workload
> Size* (or number of Region operations) was set to *10,000*.
> This had to be significantly adjusted in SSDG 2.3 using Apache Geode 1.11
> (and, as it turns out, Apache Geode 1.10 as well), as can be seen in the
> {{2.3.0.M1}} release bits source,
> [here|https://github.com/spring-projects/spring-session-data-geode/blob/2.3.0.M1/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L94-L95].
> It turns out different combinations of the Thread Count (number of workers,
> or "concurrent Sessions") and Workload Size ultimately determine whether this
> test class passes or not.
> In other words, if I increase the Thread Count, then the Workload Size must
> decrease, otherwise the test fails! If I increase the Workload Size, then
> the Thread Count must decrease, otherwise again the test fails!
> I tried with different combinations of Thread Count and Workload Size until
> the test passed. More often than not 180 Threads with 3000 Regions
> operations worked, but was right on the cusp of failing, therefore, I settled
> on 180 Threads (which nearly matches the customers environment of 200
> concurrent client Sessions) and 2000 concurrent Region operations.
> The point of the test class is to assert the state of the Session is
> consistent at the end of the test run.
> However, before this test can even finish, the client, as in the
> {{ClientCache}} instance, starts failing with Exceptions, specifically:
> {code:java}
> java.lang.RuntimeException: Session Access Task Failed
> at
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:298)
> at
> java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:210)
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.IntPipeline.reduce(IntPipeline.java:456)
> at java.util.stream.IntPipeline.sum(IntPipeline.java:414)
> at
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.runSessionWorkload(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:313)
> at
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.concurrentSessionAccessIsCorrect(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:324)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
> at
> org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at
> org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
> at
> org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
> at
> org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at
> org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
> at
> org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at
> org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at
> org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at
> org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
> at
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
> at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: java.util.concurrent.ExecutionException:
> org.springframework.dao.DataAccessResourceFailureException: Pool unexpected
> socket timed out on client connection=Pooled Connection to localhost:60964:
> Connection[DESTROYED]). Server unreachable: could not connect after 1
> attempts; nested exception is
> org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected
> socket timed out on client connection=Pooled Connection to localhost:60964:
> Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:295)
> ... 43 more
> Caused by: org.springframework.dao.DataAccessResourceFailureException: Pool
> unexpected socket timed out on client connection=Pooled Connection to
> localhost:60964: Connection[DESTROYED]). Server unreachable: could not
> connect after 1 attempts; nested exception is
> org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected
> socket timed out on client connection=Pooled Connection to localhost:60964:
> Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
> at
> org.springframework.data.gemfire.GemfireCacheUtils.convertGemfireAccessException(GemfireCacheUtils.java:235)
> at
> org.springframework.data.gemfire.GemfireAccessor.convertGemFireAccessException(GemfireAccessor.java:93)
> at
> org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:200)
> at
> org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.doSave(GemFireOperationsSessionRepository.java:226)
> at
> org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.save(GemFireOperationsSessionRepository.java:186)
> at
> org.springframework.session.data.gemfire.AbstractGemFireIntegrationTests.save(AbstractGemFireIntegrationTests.java:409)
> at
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.lambda$newAddSessionAttributeTask$2(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:216)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.geode.cache.client.ServerConnectivityException: Pool
> unexpected socket timed out on client connection=Pooled Connection to
> localhost:60964: Connection[DESTROYED]). Server unreachable: could not
> connect after 1 attempts
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:659)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:501)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:153)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
> at
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
> at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:89)
> at
> org.apache.geode.cache.client.internal.ServerRegionProxy.put(ServerRegionProxy.java:159)
> at
> org.apache.geode.internal.cache.LocalRegion.serverPut(LocalRegion.java:3035)
> at
> org.apache.geode.internal.cache.LocalRegion.cacheWriteBeforePut(LocalRegion.java:3152)
> at
> org.apache.geode.internal.cache.ProxyRegionMap.basicPut(ProxyRegionMap.java:238)
> at
> org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5580)
> at
> org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
> at
> org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5036)
> at
> org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1635)
> at
> org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1622)
> at
> org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:442)
> at
> org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:197)
> ... 8 more
> {code}
> Attached to this issue are log output files from each of my runs using Apache
> Geode 1.10 and 1.11.
> The log files serve 2 purposes: 1) to show the version of Apache Geode used
> and 2) the errors occurs on the client and server during the run.
> Any lines in the log output prefixed with "{{[FORK]}}" originates from the
> cache server. The other lines come from the client. There is only a single
> client and server in this test case.
> It takes a bit of initial time during the run for the failures to start
> occurring, which is why this seems like a resource utilization problem.
> After first, I suspected issues with the client Pool configuration, or
> {{CacheServer}} configuration, adjusting timeouts and so on. I even
> suspected memory being an issue for the client and server processes, upping
> each to 2 GB+. However, it turns out none of the changes made a bit of
> difference. And the truth of the matter is, this (existing) configuration
> worked seamlessly until I upgraded to Apache Geode 1.10+ (specifically, 1.11).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)