[ https://issues.apache.org/jira/browse/GEODE-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Donal Evans resolved GEODE-9310. -------------------------------- Resolution: Invalid With the removal of function execution in GEODE-9168, this issue is no longer valid, as the code described has been removed. > Geode redis should send error message when encounter > DistributedSystemDisconnectedException, ForcedDisconnectException or > FunctionInvocationTargetException > ----------------------------------------------------------------------------------------------------------------------------------------------------------- > > Key: GEODE-9310 > URL: https://issues.apache.org/jira/browse/GEODE-9310 > Project: Geode > Issue Type: Bug > Components: redis > Reporter: Eric Shu > Priority: Major > > Currently when redis function execution encounter these exception, it will > just close the client connection. > {code:java} > else if (cause instanceof FunctionInvocationTargetException > || cause instanceof DistributedSystemDisconnectedException > || cause instanceof ForcedDisconnectException) { > // This indicates a member departed or got disconnected > logger.warn( > "Closing client connection because one of the servers doing this > operation departed."); > channelInactive(ctx); > response = null; > } > {code} > This is incorrect, as this could lead to client performing retries of the > same redis command. > The correct way is to send an error response that command may or may not have > been executed on servers, user need to verify and retry if necessary. > Here is a test result showing that current behavior of client retrying: > The node executing the ZREM command: > (vm1 gets the connection from JedisCluster client) > [vm1] [warn 2021/05/25 15:31:26.732 PDT server-1 <GeodeRedisServer-Command-1> > tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 > member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 > member1-204 > (primary is on vm3 and it is execute the function) > [vm3] [warn 2021/05/25 15:31:26.692 PDT server-3 <Function Execution > Processor3> tid=0xf3] bucket region is > BucketRegion[path='/__PR/_B__REDIS__DATA_123;serial=399;primary=true] key key > [vm3] java.lang.Exception > [vm3] at > org.apache.geode.internal.cache.BucketRegion.virtualPut(BucketRegion.java:533) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5586) > [vm3] at > org.apache.geode.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1213) > [vm3] at > org.apache.geode.internal.cache.PartitionedRegion.putInBucket(PartitionedRegion.java:3024) > [vm3] at > org.apache.geode.internal.cache.PartitionedRegion.virtualPut(PartitionedRegion.java:2236) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5586) > [vm3] at > org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:157) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5044) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1645) > [vm3] at > org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1632) > [vm3] at > org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:445) > [vm3] at > org.apache.geode.redis.internal.data.AbstractRedisData.storeChanges(AbstractRedisData.java:218) > [vm3] at > org.apache.geode.redis.internal.data.RedisSortedSet.zrem(RedisSortedSet.java:239) > [vm3] at > org.apache.geode.redis.internal.data.RedisSortedSetCommandsFunctionExecutor.lambda$zrem$2(RedisSortedSetCommandsFunctionExecutor.java:50) > [vm3] at > org.apache.geode.redis.internal.executor.SynchronizedStripedExecutor.execute(SynchronizedStripedExecutor.java:47) > [vm3] at > org.apache.geode.redis.internal.data.RedisDataCommandsFunctionExecutor.stripedExecute(RedisDataCommandsFunctionExecutor.java:43) > [vm3] at > org.apache.geode.redis.internal.data.RedisSortedSetCommandsFunctionExecutor.zrem(RedisSortedSetCommandsFunctionExecutor.java:49) > [vm3] at > org.apache.geode.redis.internal.executor.CommandFunction.compute(CommandFunction.java:289) > [vm3] at > org.apache.geode.redis.internal.executor.SingleResultRedisFunction.lambda$execute$0(SingleResultRedisFunction.java:47) > [vm3] at > org.apache.geode.internal.cache.PartitionedRegion.computeWithPrimaryLocked(PartitionedRegion.java:629) > [vm3] at > org.apache.geode.redis.internal.executor.SingleResultRedisFunction.execute(SingleResultRedisFunction.java:51) > [vm3] at > org.apache.geode.internal.cache.PartitionedRegionDataStore.executeOnDataStore(PartitionedRegionDataStore.java:2988) > [vm3] at > org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.operateOnPartitionedRegion(PartitionedRegionFunctionStreamingMessage.java:101) > [vm3] at > org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:333) > [vm3] at > org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376) > [vm3] at > org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441) > The operation has been distributed to the bucket with redundant copy. > [vm1] [warn 2021/05/25 15:31:26.694 PDT server-1 <P2P message reader for > 192.168.0.14(server-3:79774)<v3>:41003 shared ordered sender uid=19 local > port=54410 remote port=51127> tid=0xf8] bucket region is > BucketRegion[path='/__PR/_B__REDIS__DATA_123;serial=399;primary=false] key key > [vm1] java.lang.Exception > [vm1] at > org.apache.geode.internal.cache.BucketRegion.virtualPut(BucketRegion.java:533) > [vm1] at > org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:171) > [vm1] at > org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5579) > [vm1] at > org.apache.geode.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:156) > [vm1] at > org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:307) > [vm1] at > org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:278) > [vm1] at > org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1213) > [vm1] at > org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1115) > [vm1] at > org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376) > [vm1] at > org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:432) > [vm1] at > org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2066) > [vm1] at > org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:1831) > [vm1] at > org.apache.geode.distributed.internal.membership.gms.GMSMembership.dispatchMessage(GMSMembership.java:847) > [vm1] at > org.apache.geode.distributed.internal.membership.gms.GMSMembership.handleOrDeferMessage(GMSMembership.java:804) > [vm1] at > org.apache.geode.distributed.internal.membership.gms.GMSMembership.processMessage(GMSMembership.java:1098) > [vm1] at > org.apache.geode.distributed.internal.DistributionImpl$MyDCReceiver.messageReceived(DistributionImpl.java:828) > [vm1] at > org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:622) > [vm1] at > org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:682) > [vm1] at > org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3271) > [vm1] at > org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3022) > [vm1] at > org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2809) > [vm1] at > org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1647) > [vm1] at > org.apache.geode.internal.tcp.Connection.run(Connection.java:1487) > [vm1] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [vm1] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [vm1] at java.lang.Thread.run(Thread.java:748) > [vm1] [warn 2021/05/25 15:31:26.714 PDT server-1 <P2P message reader for > 192.168.0.14(server-3:79774)<v3>:41003 shared ordered sender uid=19 local > port=54410 remote port=51127> tid=0xf8] membersRemoveAll invoked > [vm1] java.lang.Exception > [vm1] at > org.apache.geode.redis.internal.data.RedisSortedSet.membersRemoveAll(RedisSortedSet.java:175) > [vm1] at > org.apache.geode.redis.internal.data.RedisSortedSet.applyDelta(RedisSortedSet.java:89) > [vm1] at > org.apache.geode.redis.internal.data.AbstractRedisData.fromDelta(AbstractRedisData.java:193) > [vm1] at > org.apache.geode.internal.cache.EntryEventImpl.processDeltaBytes(EntryEventImpl.java:1841) > [vm1] at > org.apache.geode.internal.cache.EntryEventImpl.setNewValueInRegion(EntryEventImpl.java:1696) > [vm1] at > org.apache.geode.internal.cache.EntryEventImpl.putExistingEntry(EntryEventImpl.java:1643) > [vm1] at > org.apache.geode.internal.cache.map.RegionMapPut.updateEntry(RegionMapPut.java:485) > (vm3 is shutdown) > [vm3] [info 2021/05/25 15:31:26.704 PDT server-3 <RMI TCP > Connection(1)-192.168.0.14> tid=0x14] GemFireCache[id = 2015401929; isClosing > = true; isShutDownAll = false; created = Tue May 25 15:31:24 PDT 2021; server > = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing. > [vm3] [info 2021/05/25 15:31:26.706 PDT server-3 <RMI TCP > Connection(1)-192.168.0.14> tid=0x14] GeodeRedisServer shutting down > vm1 reports command failed (even though it succeeded and was distributed to > replicas) > [vm1] [warn 2021/05/25 15:31:26.728 PDT server-1 <GeodeRedisServer-Command-1> > tid=0xfb] Execution of Redis command ZREM key member1-212 member1-211 > member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 > member1-204 > failed: org.apache.geode.cache.execute.FunctionException: > org.apache.geode.cache.execute.FunctionInvocationTargetException: The cache > is closed. > [vm1] [warn 2021/05/25 15:31:26.729 PDT server-1 <GeodeRedisServer-Command-1> > tid=0xfb] Closing client connection because one of the servers doing this > operation departed. > This leads to JedisCluster client to retry with the same command: > [vm1] [warn 2021/05/25 15:31:26.732 PDT server-1 <GeodeRedisServer-Command-1> > tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 > member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 > member1-204 > [vm1] [warn 2021/05/25 15:31:26.729 PDT server-1 <GeodeRedisServer-Command-1> > tid=0xfb] Closing client connection because one of the servers doing this > operation departed. > [vm1] [warn 2021/05/25 15:31:26.732 PDT server-1 <GeodeRedisServer-Command-1> > tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 > member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 > member1-204 > ... > [vm1] [warn 2021/05/25 15:31:26.737 PDT server-1 <GeodeRedisServer-Command-1> > tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 > member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 > member1-204 > ... > [vm1] [warn 2021/05/25 15:31:26.739 PDT server-1 <GeodeRedisServer-Command-1> > tid=0xfb] Execution of Redis command ZREM key member1-212 member1-211 > member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 > member1-204 failed: org.apache.geode.cache.execute.FunctionException: > org.apache.geode.cache.execute.FunctionInvocationTargetException: Remote > cache is closed: 192.168.0.14(server-3:79774)<v3>:41003 > ... > Currently the retries aborted due to the following issue by JedisCluster > client (which is tracked by redis issue of not back off and retry on > different node: https://github.com/redis/jedis/issues/2347) > java.util.concurrent.ExecutionException: > redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more > cluster attempts left. > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.geode.redis.internal.executor.sortedset.ZRemDUnitTest.zRemCanRemoveMembersFromSortedSetDuringPrimaryIsCrashed(ZRemDUnitTest.java:177) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:54) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > Caused by: redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: > No more cluster attempts left. > at > redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:84) > at > redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) > at > redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) > at > redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) > at > redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) > at > redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) > at > redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:25) > at redis.clients.jedis.JedisCluster.zrem(JedisCluster.java:984) > at > org.apache.geode.redis.internal.executor.sortedset.ZRemDUnitTest.doZRem(ZRemDUnitTest.java:157) > at > org.apache.geode.redis.internal.executor.sortedset.ZRemDUnitTest.lambda$zRemCanRemoveMembersFromSortedSetDuringPrimaryIsCrashed$0(ZRemDUnitTest.java:174) > at > org.apache.geode.test.junit.rules.ExecutorServiceRule.lambda$submit$2(ExecutorServiceRule.java:232) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Once this is solved by JedisCluter client, the retry may cause wrong number > being reported by the retried command. > It is possible that this behavior may impact on APPEND and INCRBY command -- > which would produce wrong results of the retried command (subject to > additional testing). -- This message was sent by Atlassian Jira (v8.3.4#803005)