[ https://issues.apache.org/jira/browse/GEODE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062079#comment-17062079 ]
ASF subversion and git services commented on GEODE-7763:
--------------------------------------------------------

Commit 550cb2edf3778d6151a07d6248dbdc42cd9a1b18 in geode's branch refs/heads/release/1.12.0 from Jason Huynh
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=550cb2e ]

GEODE-7763: Do not exportValue if no clients are waiting for result (#4771)

* Avoid doing unnecessary serialization

(cherry picked from commit b4c3e9413f0008635b0a0c9116c96147d1e4ceec)

> Apache Geode 1.11 severely and negatively impacts performance and resource utilization
> --------------------------------------------------------------------------------------
>
>                 Key: GEODE-7763
>                 URL: https://issues.apache.org/jira/browse/GEODE-7763
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: John Blum
>            Priority: Critical
>              Labels: performance
>         Attachments: 1.11-client-stats.gfs, 1.11-server-stats.gfs, 1.11_thread_dumps.rtf, 1.9-client-stats.gfs, 1.9-server-stats.gfs, 1.9.log, apache-geode-1.10-client-server-interaction-output.txt, apache-geode-1.10-client-server-startup-output.txt, apache-geode-1.11-client-server-interaction-output.txt, apache-geode-1.11-client-server-startup-output.txt, geode-7763-geode-changes.diff, geode-7763-ssdg-changes.diff
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> This problem was first observed in Apache Geode 1.11.0. The problem was not present in Apache Geode 1.9.2. This problem is an issue for Apache Geode 1.10 as well!
> After upgrading _Spring Session for Apache Geode_ (SSDG) 2.3 to _Spring Data for Apache Geode_ (SDG) Neumann/2.3, which is based on Apache Geode 1.11, this problem with SSDG's test suite started occurring.
> _Spring Session for Apache Geode_ (SSDG) 2.2, which is based on _Spring Data for Apache Geode_ (SDG) Moore/2.2, pulls in Apache Geode 1.9.2. This problem did not occur in SSDG 2.2 with Apache Geode 1.9.2.
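The committed change itself is not reproduced here, but the pattern its message describes — guarding an expensive serialization step behind a check for waiting clients — can be sketched in plain Java. {{ResultSender}}, {{exportValue}}, and the waiter count below are simplified stand-ins for illustration, not Geode's actual internals:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a server-side result sender; NOT Geode's API.
class ResultSender {
    private final int waitingClients; // how many clients still await this result
    int serializations = 0;           // counts how often we paid the serialization cost

    ResultSender(int waitingClients) { this.waitingClients = waitingClients; }

    void sendResult(Serializable value) throws IOException {
        // The fix in spirit: skip the expensive export entirely
        // when no client is waiting for the result.
        if (waitingClients == 0) {
            return;
        }
        exportValue(value);
    }

    private void exportValue(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value); // the unnecessary work the commit avoids
        }
        serializations++;
    }
}

public class ExportGuardSketch {
    public static void main(String[] args) throws IOException {
        ResultSender noClients = new ResultSender(0);
        noClients.sendResult("large session payload");
        System.out.println(noClients.serializations); // 0: serialization skipped

        ResultSender oneClient = new ResultSender(1);
        oneClient.sendResult("large session payload");
        System.out.println(oneClient.serializations); // 1: serialization performed
    }
}
```

Under a highly concurrent workload like the one below, avoiding even one unnecessary serialization per operation adds up quickly.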
> Out of curiosity, I wondered whether this problem affects (i.e. was actually introduced in) Apache Geode 1.10.0. So, I configured SSDG 2.3 to pull in SDG Moore/2.2 but run with Apache Geode 1.10. The problem occurred with Apache Geode 1.10 as well!
> The SSDG test class in question, affected by Geode's deficiencies, is [MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java].
> The test class was modeled after a customer use case; the customer was using Spring Session and Apache Geode/Pivotal GemFire as the HTTP Session state management provider, so the test simulates their highly concurrent environment.
> The test class has two primary parameters: the [Thread Count|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90] and the [Workload Size|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L91].
> The "_Workload Size_" should not be confused with the "_Payload Size_" of the individual objects passed to the Geode data access operations (i.e. {{gets}}, {{puts}}, {{removes}}). The "_Workload Size_" merely determines the number of {{get}}, {{put}} or {{remove}} operations performed on the (Session) Region over the duration of the test run. Certain operations are "favored" over others, so the number of {{gets}}, {{puts}} and {{removes}} is weighted.
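The weighting described above — some operations "favored" over others — can be illustrated with a small, self-contained sketch. The 50/40/10 split below is an invented example for illustration, not the test class's actual weights:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Random;

public class WeightedWorkloadSketch {

    enum Op { GET, PUT, REMOVE }

    // Pick an operation using invented weights: 50% get, 40% put, 10% remove.
    static Op nextOp(Random random) {
        int roll = random.nextInt(100);
        if (roll < 50) return Op.GET;
        if (roll < 90) return Op.PUT;
        return Op.REMOVE;
    }

    // Tally how a workload of the given size splits across operations.
    static Map<Op, Integer> runWorkload(int workloadSize, long seed) {
        Random random = new Random(seed);
        Map<Op, Integer> counts = new EnumMap<>(Op.class);
        for (Op op : Op.values()) counts.put(op, 0);
        for (int i = 0; i < workloadSize; i++) {
            counts.merge(nextOp(random), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A workload of 2,000 Region operations, split across the weighted ops.
        Map<Op, Integer> counts = runWorkload(2000, 42L);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println(counts + " total=" + total); // total is always 2000
    }
}
```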
> The "_Payload_" in this case is an (HTTP) {{Session}} object and the "size" is directly proportional to the number of Session attributes stored in the Session.
> As you can see from the [test class configuration|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90-L91] in *SSDG* {{2.2}}, the *Thread Count* was set to *180* and the *Workload Size* (or number of Region operations) was set to *10,000*.
> This had to be significantly adjusted in SSDG 2.3 using Apache Geode 1.11 (and, as it turns out, Apache Geode 1.10 as well), as can be seen in the {{2.3.0.M1}} release bits source, [here|https://github.com/spring-projects/spring-session-data-geode/blob/2.3.0.M1/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L94-L95].
> It turns out different combinations of the Thread Count (number of workers, or "concurrent Sessions") and Workload Size ultimately determine whether this test class passes or not.
> In other words, if I increase the Thread Count, then the Workload Size must decrease, otherwise the test fails! If I increase the Workload Size, then the Thread Count must decrease, otherwise, again, the test fails!
> I tried different combinations of Thread Count and Workload Size until the test passed. More often than not, 180 Threads with 3,000 Region operations worked, but that was right on the cusp of failing, so I settled on 180 Threads (which nearly matches the customer's environment of 200 concurrent client Sessions) and 2,000 concurrent Region operations.
> The point of the test class is to assert that the state of the Session is consistent at the end of the test run.
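The Thread Count / Workload Size interplay above can be pictured as a fixed pool of worker threads draining a shared budget of operations. The harness below is a simplified analogue of the test class, not its actual code; the real test performs Geode Region operations where this sketch just increments a counter:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrencyHarnessSketch {

    // Run `workloadSize` operations across `threadCount` workers,
    // returning how many operations actually completed.
    static int run(int threadCount, int workloadSize) throws InterruptedException {
        AtomicInteger remaining = new AtomicInteger(workloadSize);
        AtomicInteger completed = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            workers.submit(() -> {
                // Each worker claims operations until the shared workload is drained.
                while (remaining.getAndDecrement() > 0) {
                    completed.incrementAndGet(); // stand-in for a get/put/remove
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(30, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 180 threads draining 2,000 operations, mirroring the settled-on parameters.
        System.out.println(run(180, 2000)); // prints 2000
    }
}
```

In the real test, each claimed operation goes over a pooled client/server connection, which is why raising either parameter increases pressure on the same fixed pool of server resources.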
> However, before this test can even finish, the client, as in the {{ClientCache}} instance, starts failing with Exceptions, specifically:
> {code:java}
> java.lang.RuntimeException: Session Access Task Failed
>     at org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:298)
>     at java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:210)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.IntPipeline.reduce(IntPipeline.java:456)
>     at java.util.stream.IntPipeline.sum(IntPipeline.java:414)
>     at org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.runSessionWorkload(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:313)
>     at org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.concurrentSessionAccessIsCorrect(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:324)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
>     at org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
>     at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
>     at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>     at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
>     at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>     at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>     at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>     at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>     at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: java.util.concurrent.ExecutionException: org.springframework.dao.DataAccessResourceFailureException: Pool unexpected socket timed out on client connection=Pooled Connection to localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts; nested exception is org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected socket timed out on client connection=Pooled Connection to localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:295)
>     ... 43 more
> Caused by: org.springframework.dao.DataAccessResourceFailureException: Pool unexpected socket timed out on client connection=Pooled Connection to localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts; nested exception is org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected socket timed out on client connection=Pooled Connection to localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
>     at org.springframework.data.gemfire.GemfireCacheUtils.convertGemfireAccessException(GemfireCacheUtils.java:235)
>     at org.springframework.data.gemfire.GemfireAccessor.convertGemFireAccessException(GemfireAccessor.java:93)
>     at org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:200)
>     at org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.doSave(GemFireOperationsSessionRepository.java:226)
>     at org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.save(GemFireOperationsSessionRepository.java:186)
>     at org.springframework.session.data.gemfire.AbstractGemFireIntegrationTests.save(AbstractGemFireIntegrationTests.java:409)
>     at org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.lambda$newAddSessionAttributeTask$2(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:216)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected socket timed out on client connection=Pooled Connection to localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
>     at org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:659)
>     at org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:501)
>     at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:153)
>     at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
>     at org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
>     at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:89)
>     at org.apache.geode.cache.client.internal.ServerRegionProxy.put(ServerRegionProxy.java:159)
>     at org.apache.geode.internal.cache.LocalRegion.serverPut(LocalRegion.java:3035)
>     at org.apache.geode.internal.cache.LocalRegion.cacheWriteBeforePut(LocalRegion.java:3152)
>     at org.apache.geode.internal.cache.ProxyRegionMap.basicPut(ProxyRegionMap.java:238)
>     at org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5580)
>     at org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
>     at org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5036)
>     at org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1635)
>     at org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1622)
>     at org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:442)
>     at org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:197)
>     ... 8 more
> {code}
> Attached to this issue are log output files from each of my runs using Apache Geode 1.10 and 1.11.
> The log files serve 2 purposes: 1) to show the version of Apache Geode used and 2) to show the errors that occur on the client and server during the run.
> Any lines in the log output prefixed with "{{[FORK]}}" originate from the cache server. The other lines come from the client.
There is only a single client and server in this test case.
> It takes a bit of time into the run before the failures start occurring, which is why this seems like a resource utilization problem.
> At first, I suspected issues with the client Pool configuration, or the {{CacheServer}} configuration, adjusting timeouts and so on. I even suspected memory being an issue for the client and server processes, upping each to 2 GB+. However, it turns out none of the changes made any difference. And the truth of the matter is, this (existing) configuration worked seamlessly until I upgraded to Apache Geode 1.10+ (specifically, 1.11).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
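For reference, the kind of client Pool tuning attempted (and found ineffective) might look like the following Geode {{ClientCacheFactory}} fragment. The host, port, and all values here are illustrative assumptions, not the test's actual configuration, and this is a configuration sketch rather than a runnable test:

```java
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;

public class PoolTuningSketch {
    public static void main(String[] args) {
        // Illustrative values only; per the report, adjustments like these
        // did not resolve the failures.
        ClientCache clientCache = new ClientCacheFactory()
            .addPoolServer("localhost", 40404)    // assumed server host/port
            .setPoolReadTimeout(30_000)           // ms to wait for a server response
            .setPoolRetryAttempts(2)              // retry a failed op on another connection
            .setPoolMaxConnections(200)           // roughly match the 180-thread workload
            .setPoolFreeConnectionTimeout(20_000) // ms to wait for a free connection
            .create();
        clientCache.close();
    }
}
```

That tuning of this sort made no difference, while downgrading Geode fixed everything, is what points the blame at a server-side regression (the unnecessary serialization addressed by the commit above) rather than at client configuration.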