[jira] [Commented] (GEODE-8655) Not handling exception on SNIHostName
[ https://issues.apache.org/jira/browse/GEODE-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225992#comment-17225992 ]

ASF GitHub Bot commented on GEODE-8655:
---------------------------------------

mkevo commented on pull request #5669:
URL: https://github.com/apache/geode/pull/5669#issuecomment-721667656

Hi @bschuchardt, I think this will not work on your laptop either; you just need to add an IPv6 address to your machine and follow the steps described in the ticket. For now, I think it would be good to check whether IPv6 is in use and, if so, skip it, until someone makes the larger change in gfsh/LocatorLauncher that you mentioned.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Not handling exception on SNIHostName
> -------------------------------------
>
>                 Key: GEODE-8655
>                 URL: https://issues.apache.org/jira/browse/GEODE-8655
>             Project: Geode
>          Issue Type: Bug
>          Components: locator, security
>    Affects Versions: 1.13.0
>            Reporter: Mario Kevo
>            Assignee: Mario Kevo
>            Priority: Major
>              Labels: pull-request-available
>
> If we start a locator with IPv6 and TLS enabled, we get the following error for the "status locator" command:
>
> {quote}mkevo@mkevo-XPS-15-9570:~/apache-geode-1.13.0/bin/locator$ _/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -server -classpath /home/mkevo/apache-geode-1.13.0/lib/geode-core-1.13.0.jar:/home/mkevo/apache-geode-1.13.0/lib/geode-dependencies.jar -Djava.net.preferIPv6Addresses=true -DgemfireSecurityPropertyFile=/home/mkevo/geode-examples/clientSecurity/example_security.properties -Dgemfire.enable-cluster-configuration=true -Dgemfire.load-cluster-configuration-from-dir=false -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806 org.apache.geode.distributed.LocatorLauncher start locator --port=10334_
>
> gfsh>_status locator --dir=/home/mkevo/apache-geode-1.13.0/bin/locator --security-properties-file=/home/mkevo/geode-examples/clientSecurity/example_security.properties_
> *Locator in /home/mkevo/apache-geode-1.13.0/bin/locator on mkevo-XPS-15-9570[10334] is currently not responding.*
> {quote}
>
> In the locator logs we found only this:
> {quote}Exception in processing request from fe80:0:0:0:f83e:ce0f:5143:f9ee%2: Read timed out
> {quote}
>
> After adding some logging we found the following:
> {quote}TcpClient.stop(): exception connecting to locator HostAndPort[/0:0:0:0:0:0:0:0:10334] java.lang.IllegalArgumentException: Contains non-LDH ASCII characters
> {quote}
>
> It fails when creating an SNIHostName from the host name (_setServerNames_ in SocketCreator.java), because that code does not handle the above exception.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
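The failure mode can be reduced to a small sketch: javax.net.ssl.SNIHostName rejects IPv6 literals (colons are not letter-digit-hyphen characters) with exactly the IllegalArgumentException quoted above. A minimal, hypothetical version of the proposed guard follows; the helper name and the "no SNI" fallback are illustrative, not Geode's actual setServerNames code:

```java
import java.util.Collections;
import java.util.List;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;

public class SniHostNameSketch {
  // Hypothetical helper: build the SNI server-name list for a host, but fall
  // back to "no SNI" instead of letting IllegalArgumentException escape when
  // the host is an IPv6 literal (or anything else SNIHostName rejects).
  static List<SNIServerName> serverNamesFor(String host) {
    try {
      return Collections.singletonList(new SNIHostName(host));
    } catch (IllegalArgumentException e) {
      // For "0:0:0:0:0:0:0:0" this catches "Contains non-LDH ASCII characters".
      return Collections.emptyList();
    }
  }

  public static void main(String[] args) {
    System.out.println(serverNamesFor("example.com").size());      // 1
    System.out.println(serverNamesFor("0:0:0:0:0:0:0:0").size());  // 0
  }
}
```

With this shape, an IPv6 locator address simply produces a TLS handshake without an SNI extension rather than a "Read timed out" on the far side.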
[jira] [Commented] (GEODE-8547) Command "show missing-disk-stores" not working, when all servers are down
[ https://issues.apache.org/jira/browse/GEODE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226017#comment-17226017 ]

ASF GitHub Bot commented on GEODE-8547:
---------------------------------------

mivanac commented on pull request #5567:
URL: https://github.com/apache/geode/pull/5567#issuecomment-721699428

Thanks for the comments.

> Command "show missing-disk-stores" not working, when all servers are down
> -------------------------------------------------------------------------
>
>                 Key: GEODE-8547
>                 URL: https://issues.apache.org/jira/browse/GEODE-8547
>             Project: Geode
>          Issue Type: Bug
>          Components: gfsh
>            Reporter: Mario Ivanac
>            Assignee: Mario Ivanac
>            Priority: Major
>              Labels: needs-review, pull-request-available
>
> If a cluster with 2 locators and 2 servers is shut down ungracefully, it can happen that the locators that are able to start up do not have the most recent data to bring up the Cluster Configuration Service.
> If we execute the command "show missing-disk-stores" it will not work, since all servers are down, so we are stuck in this situation.
[jira] [Commented] (GEODE-8547) Command "show missing-disk-stores" not working, when all servers are down
[ https://issues.apache.org/jira/browse/GEODE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226095#comment-17226095 ]

ASF GitHub Bot commented on GEODE-8547:
---------------------------------------

jinmeiliao commented on a change in pull request #5567:
URL: https://github.com/apache/geode/pull/5567#discussion_r517420791

## File path: geode-gfsh/src/main/java/org/apache/geode/management/internal/cli/commands/ShowMissingDiskStoreCommand.java ##

@@ -95,7 +96,8 @@ private ResultModel toMissingDiskStoresTabularResult(
     ResultModel result = new ResultModel();
     boolean hasMissingDiskStores = missingDiskStores.length != 0;
-    boolean hasMissingColocatedRegions = !missingColocatedRegions.isEmpty();
+    boolean hasMissingColocatedRegions =

Review comment: this variable is not used anymore.

## File path: geode-gfsh/src/main/java/org/apache/geode/management/internal/cli/commands/ShowMissingDiskStoreCommand.java ##

@@ -95,7 +96,8 @@ private ResultModel toMissingDiskStoresTabularResult(
     ResultModel result = new ResultModel();
     boolean hasMissingDiskStores = missingDiskStores.length != 0;

Review comment: actually you can get rid of this variable and inline this too, to make it symmetric.
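The reviewer's suggestion is the standard "inline a single-use local" refactor. A minimal illustration under assumed surrounding code (the real method builds a ResultModel; only the two conditions from the diff are kept here):

```java
import java.util.List;

public class InlineCheckSketch {
  // Illustrative only: both emptiness checks written inline and symmetrically,
  // instead of through single-use local booleans as in the original diff.
  static String summarize(String[] missingDiskStores, List<String> missingColocatedRegions) {
    StringBuilder sb = new StringBuilder();
    if (missingDiskStores.length != 0) {
      sb.append("missing disk stores: ").append(missingDiskStores.length).append('\n');
    }
    if (!missingColocatedRegions.isEmpty()) {
      sb.append("missing colocated regions: ").append(missingColocatedRegions.size()).append('\n');
    }
    return sb.toString();
  }
}
```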
[jira] [Updated] (GEODE-8683) maximum-time-between-pings parameter in GatewayReceiver creation does not have any effect
[ https://issues.apache.org/jira/browse/GEODE-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated GEODE-8683:
----------------------------------
    Labels: pull-request-available  (was: )

> maximum-time-between-pings parameter in GatewayReceiver creation does not have any effect
> ------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8683
>                 URL: https://issues.apache.org/jira/browse/GEODE-8683
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Alberto Gomez
>            Assignee: Alberto Gomez
>            Priority: Major
>              Labels: pull-request-available
>
> The maximum-time-between-pings parameter that can be set at gateway receiver creation has no effect, i.e. the value used as the maximum time between pings for gateway sender connections to the gateway receiver is either the default value (6) or the one set on the server where the receiver is running.
> The reason is that the ClientHealthMonitor is a server-side singleton that monitors the health of all clients. The value for this parameter in the ClientHealthMonitor is first set when the server is started and the first Acceptor is created.
[jira] [Commented] (GEODE-8683) maximum-time-between-pings parameter in GatewayReceiver creation does not have any effect
[ https://issues.apache.org/jira/browse/GEODE-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226246#comment-17226246 ]

ASF GitHub Bot commented on GEODE-8683:
---------------------------------------

albertogpz opened a new pull request #5701:
URL: https://github.com/apache/geode/pull/5701

The maximum-time-between-pings set when creating a gateway receiver was not honored because the ClientHealthMonitor, the singleton class monitoring all clients, supported just one value for the maximum time between pings for all clients. That value is set when the server in which the receiver is running is started, and when the gateway receiver provides a different value it is ignored.

With this fix, it is possible to have different values of maximum-time-between-pings for different clients.

Thank you for submitting a contribution to Apache Geode. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

### For all changes:
- [X] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- [X] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
- [X] Is your initial contribution a single, squashed commit?
- [X] Does `gradlew build` run cleanly?
- [X] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?

### Note:
Please ensure that once the PR is submitted, you check Concourse for build issues and submit an update to your PR as soon as possible. If you need help, please send an email to d...@geode.apache.org.
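The fix described in the PR amounts to replacing the monitor's single global timeout with a per-client value. A hedged sketch of that idea; class and method names here are illustrative, not Geode's actual ClientHealthMonitor API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: instead of one global maximumTimeBetweenPings, the
// monitor records a timeout per client, so a GatewayReceiver's value can be
// honored alongside the server's default for ordinary clients.
public class PerClientPingMonitor {
  private final int defaultMaxTimeBetweenPings;
  private final Map<String, Integer> perClientTimeout = new ConcurrentHashMap<>();

  public PerClientPingMonitor(int defaultMaxTimeBetweenPings) {
    this.defaultMaxTimeBetweenPings = defaultMaxTimeBetweenPings;
  }

  // Called when a client (e.g. a gateway sender connection) registers with
  // the timeout configured on the acceptor it connected to.
  public void registerClient(String clientId, int maxTimeBetweenPings) {
    perClientTimeout.put(clientId, maxTimeBetweenPings);
  }

  // Unregistered clients fall back to the server-wide default.
  public boolean isDead(String clientId, long millisSinceLastPing) {
    int timeout = perClientTimeout.getOrDefault(clientId, defaultMaxTimeBetweenPings);
    return millisSinceLastPing > timeout;
  }
}
```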
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226272#comment-17226272 ]

ASF GitHub Bot commented on GEODE-8647:
---------------------------------------

pdxcodemonkey commented on a change in pull request #682:
URL: https://github.com/apache/geode-native/pull/682#discussion_r517490303

## File path: clicache/src/DataInput.hpp ##

@@ -663,7 +664,7 @@
     m_buffer = const_cast(nativeptr->currentBufferPosition());
     if ( m_buffer != NULL) {
       m_bufferLength = static_cast(nativeptr->getBytesRemaining());
-    }
+    }

Review comment: Appears to be okay now.

> Support using multiple DistributedMap Rules in one test
> -------------------------------------------------------
>
>                 Key: GEODE-8647
>                 URL: https://issues.apache.org/jira/browse/GEODE-8647
>             Project: Geode
>          Issue Type: Wish
>          Components: tests
>            Reporter: Kirk Lund
>            Assignee: Kirk Lund
>            Priority: Major
>              Labels: pull-request-available
>
> Support using multiple DistributedMap Rules in one test. Right now the Rule only supports having one instance in a test.
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226273#comment-17226273 ]

ASF GitHub Bot commented on GEODE-8647:
---------------------------------------

pdxcodemonkey commented on a change in pull request #682:
URL: https://github.com/apache/geode-native/pull/682#discussion_r517493689

## File path: clicache/src/DataInput.cpp ##

@@ -878,8 +877,6 @@
   void DataInput::Cleanup()
   {

Review comment: Done
[jira] [Commented] (GEODE-8681) peer-to-peer message loss due to sending connection closing with TLS enabled
[ https://issues.apache.org/jira/browse/GEODE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226297#comment-17226297 ]

ASF GitHub Bot commented on GEODE-8681:
---------------------------------------

bschuchardt merged pull request #5699:
URL: https://github.com/apache/geode/pull/5699

> peer-to-peer message loss due to sending connection closing with TLS enabled
> ----------------------------------------------------------------------------
>
>                 Key: GEODE-8681
>                 URL: https://issues.apache.org/jira/browse/GEODE-8681
>             Project: Geode
>          Issue Type: Bug
>          Components: membership, messaging
>    Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0
>            Reporter: Bruce J Schuchardt
>            Assignee: Bruce J Schuchardt
>            Priority: Major
>              Labels: pull-request-available, release-blocker
>
> We have observed message loss when TLS is enabled and a distributed lock is released right after sending a message that doesn't require acknowledgement, if the sending socket is immediately closed. Closing the socket immediately after sending a message is frequently seen in function execution threads or server-side application threads that use this pattern:
> {code:java}
> try {
>   DistributedSystem.setThreadsSocketPolicy(false);
>   acquireDistributedLock(lockName);
>   // (perform one or more cache operations)
> } finally {
>   distLockService.unlock(lockName);
>   DistributedSystem.releaseThreadsSockets(); // closes the socket
> }
> {code}
> The fault seems to be in NioSSLEngine.unwrap(), which throws an SSLException if it finds the SSLEngine is closed even though there is valid data in its decrypt buffer. It shouldn't throw an exception in that case.
[jira] [Commented] (GEODE-8681) peer-to-peer message loss due to sending connection closing with TLS enabled
[ https://issues.apache.org/jira/browse/GEODE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226304#comment-17226304 ]

ASF subversion and git services commented on GEODE-8681:
--------------------------------------------------------

Commit 7da8f9b516ac1e2525a1dfc922af7bfb8995f2c6 in geode's branch refs/heads/develop from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=7da8f9b ]

GEODE-8681: peer-to-peer message loss due to sending connection closing with TLS enabled (#5699)

A socket read could pick up more than one message, and a single unwrap() could decrypt multiple messages. Normally the engine isn't closed: it reports normal status from an unwrap() operation, and Connection.processInputBuffer picks up each message, one by one, from the buffer and dispatches them. But if the SSLEngine was closed we were ignoring any already-decrypted data sitting in the unwrapped buffer and instead throwing an SSLException.
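The decision the commit corrects can be phrased as a one-line predicate: treat an unwrap() that reports CLOSED as an error only when no already-decrypted bytes remain to dispatch. A simplified sketch; the method name and shape are illustrative, the real logic lives inside Geode's NIO TLS engine wrapper:

```java
import javax.net.ssl.SSLEngineResult.Status;

public class UnwrapCloseSketch {
  // Illustrative predicate: before the fix, a CLOSED status always raised an
  // SSLException; after it, decrypted data left in the unwrapped buffer is
  // handed to the caller first, and only an empty buffer counts as an error.
  static boolean shouldThrowOnClosed(Status status, int decryptedBytesRemaining) {
    return status == Status.CLOSED && decryptedBytesRemaining == 0;
  }
}
```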
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226308#comment-17226308 ]

ASF GitHub Bot commented on GEODE-8647:
---------------------------------------

lgtm-com[bot] commented on pull request #682:
URL: https://github.com/apache/geode-native/pull/682#issuecomment-721882183

This pull request **introduces 4 alerts** when merging daac2c91545b9c8cb10d729e741658eb463deac2 into 0d9a99d5e0632de62df17921950cf3f6640efb33 - [view on LGTM.com](https://lgtm.com/projects/g/apache/geode-native/rev/pr-60291afa9c24d4b295d3b0886ebe8f339141f43c)

**new alerts:**

* 2 for Call to GC\.Collect\(\)
* 2 for Useless assignment to local variable
[jira] [Commented] (GEODE-8546) Colocated regions missing some buckets after restart
[ https://issues.apache.org/jira/browse/GEODE-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226309#comment-17226309 ]

ASF GitHub Bot commented on GEODE-8546:
---------------------------------------

DonalEvans commented on pull request #5590:
URL: https://github.com/apache/geode/pull/5590#issuecomment-721882594

> @DonalEvans Do you have any capacity to assist with this?

I emailed back and forth with Mario a bit trying to figure out a way to reproduce the issue in a DUnit test, but it's not an area of the code I'm particularly familiar with, so my ability to help was fairly limited. I thought that if the assumption that the problem was caused by colocation taking too long was correct, then artificially slowing down colocation by using a listener to wait a little after each bucket is created could reproduce it. Since part of the proposed fix seems to be to wait a total of 9000 ms for colocation to complete in `CreateMIssingBucketTask.java`, it seems like forcing colocation to take close to that long should be a sure-fire way to reproduce the issue, if the fix is effective. It seems like this wasn't the case, though, so maybe my idea was off the mark.

> Colocated regions missing some buckets after restart
> ----------------------------------------------------
>
>                 Key: GEODE-8546
>                 URL: https://issues.apache.org/jira/browse/GEODE-8546
>             Project: Geode
>          Issue Type: Bug
>          Components: regions
>    Affects Versions: 1.11.0, 1.12.0, 1.13.0
>            Reporter: Mario Kevo
>            Assignee: Mario Kevo
>            Priority: Major
>              Labels: pull-request-available
>
> After restarting all servers at the same time, some colocated regions are missing some buckets. This issue has existed for a long time and became visible from 1.11.0 with the changes introduced in GEODE-7042.
> How to reproduce the issue:
> # Start two locators and two servers
> # Create a PARTITION_REDUNDANT_PERSISTENT region with redundant-copies=1
> # Create a few PARTITION_REDUNDANT regions (I used six regions) colocated with the persistent region and redundant-copies=1
> # Put some entries
> # Restart the servers (you can simply run "kill -15 " and then, from two terminals, start both of them at the same time)
> # It will take some time for server startup to finish, and for the latest region bucketCount will be lower than expected on one member
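The reproduction idea from the comment above, sketched generically: inject a delay after each bucket creation so that colocation stretches toward the 9000 ms wait mentioned in the fix. The listener interface below is hypothetical; a real DUnit test would hook Geode's own listener mechanisms instead:

```java
public class SlowColocationSketch {
  // Hypothetical hook, not Geode's API: just the shape of the idea.
  interface BucketCreationListener {
    void afterBucketCreated(int bucketId);
  }

  // Sleeps after every bucket creation to artificially stretch colocation.
  static class DelayingListener implements BucketCreationListener {
    private final long delayMillis;
    private int bucketsSeen;

    DelayingListener(long delayMillis) {
      this.delayMillis = delayMillis;
    }

    @Override
    public void afterBucketCreated(int bucketId) {
      bucketsSeen++;
      try {
        Thread.sleep(delayMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    int bucketsSeen() {
      return bucketsSeen;
    }
  }
}
```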
[jira] [Created] (GEODE-8685) Exporting data causes a ClassNotFoundException
Anthony Baker created GEODE-8685:
------------------------------------

             Summary: Exporting data causes a ClassNotFoundException
                 Key: GEODE-8685
                 URL: https://issues.apache.org/jira/browse/GEODE-8685
             Project: Geode
          Issue Type: Task
          Components: regions
    Affects Versions: 1.13.0
            Reporter: Anthony Baker

See [https://lists.apache.org/thread.html/rfa4fc47eb4cb4e75c39d7cb815416bebf2ec233d4db24e37728e922e%40%3Cuser.geode.apache.org%3E.]

The report is that exporting data whose values are classes defined in a deployed jar results in a ClassNotFoundException:

{noformat}
[error 2020/10/30 08:54:29.317 PDT tid=0x41] org.apache.geode.cache.execute.FunctionException: org.apache.geode.SerializationException: A ClassNotFoundException was thrown while trying to deserialize cached value.
java.io.IOException: org.apache.geode.cache.execute.FunctionException: org.apache.geode.SerializationException: A ClassNotFoundException was thrown while trying to deserialize cached value.
	at org.apache.geode.internal.cache.snapshot.WindowedExporter.export(WindowedExporter.java:106)
	at org.apache.geode.internal.cache.snapshot.RegionSnapshotServiceImpl.exportOnMember(RegionSnapshotServiceImpl.java:361)
	at org.apache.geode.internal.cache.snapshot.RegionSnapshotServiceImpl.save(RegionSnapshotServiceImpl.java:161)
	at org.apache.geode.internal.cache.snapshot.RegionSnapshotServiceImpl.save(RegionSnapshotServiceImpl.java:146)
	at org.apache.geode.management.internal.cli.functions.ExportDataFunction.executeFunction(ExportDataFunction.java:62)
	at org.apache.geode.management.cli.CliFunction.execute(CliFunction.java:37)
	at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:201)
	at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376)
	at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:442)
	at org.apache.geode.distributed.internal.ClusterOperationExecutors.doFunctionExecutionThread(ClusterOperationExecutors.java:377)
	at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.geode.cache.execute.FunctionException: org.apache.geode.SerializationException: A ClassNotFoundException was thrown while trying to deserialize cached value.
	at org.apache.geode.internal.cache.snapshot.WindowedExporter$WindowedExportCollector.setException(WindowedExporter.java:383)
	at org.apache.geode.internal.cache.snapshot.WindowedExporter$WindowedExportCollector.addResult(WindowedExporter.java:346)
	at org.apache.geode.internal.cache.execute.PartitionedRegionFunctionResultSender.lastResult(PartitionedRegionFunctionResultSender.java:195)
	at org.apache.geode.internal.cache.execute.AbstractExecution.handleException(AbstractExecution.java:502)
	at org.apache.geode.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:353)
	at org.apache.geode.internal.cache.execute.AbstractExecution.lambda$executeFunctionOnLocalPRNode$0(AbstractExecution.java:273)
	... 6 more
Caused by: org.apache.geode.SerializationException: A ClassNotFoundException was thrown while trying to deserialize cached value.
	at org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:2046)
	at org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:2032)
	at org.apache.geode.internal.cache.VMCachedDeserializable.getDeserializedValue(VMCachedDeserializable.java:135)
	at org.apache.geode.internal.cache.EntrySnapshot.getRawValue(EntrySnapshot.java:111)
	at org.apache.geode.internal.cache.EntrySnapshot.getRawValue(EntrySnapshot.java:99)
	at org.apache.geode.internal.cache.EntrySnapshot.getValue(EntrySnapshot.java:129)
	at org.apache.geode.internal.cache.snapshot.SnapshotPacket$SnapshotRecord.<init>(SnapshotPacket.java:79)
	at org.apache.geode.internal.cache.snapshot.WindowedExporter$WindowedExportFunction.execute(WindowedExporter.java:197)
	at org.apache.geode.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:328)
	... 7 more
Caused by: java.lang.ClassNotFoundException: org.myApp.domain.myClass
	at org.apache.geode.internal.ClassPathLoader.forName(ClassPathLoader.jav
{noformat}
[jira] [Commented] (GEODE-8685) Exporting data causes a ClassNotFoundException
[ https://issues.apache.org/jira/browse/GEODE-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226313#comment-17226313 ] Anthony Baker commented on GEODE-8685: -- I think there are two things to investigate here: 1) Why is the class not resolving? 2) Why is the value being deserialized at all? > Exporting data causes a ClassNotFoundException > -- > > Key: GEODE-8685 > URL: https://issues.apache.org/jira/browse/GEODE-8685 > Project: Geode > Issue Type: Task > Components: regions >Affects Versions: 1.13.0 >Reporter: Anthony Baker >Priority: Major > > See > [https://lists.apache.org/thread.html/rfa4fc47eb4cb4e75c39d7cb815416bebf2ec233d4db24e37728e922e%40%3Cuser.geode.apache.org%3E.] > > Report is that exporting data whose values are Classes defined in a deployed > jar result in a ClassNotFound exception: > {noformat} > [error 2020/10/30 08:54:29.317 PDT tid=0x41] > org.apache.geode.cache.execute.FunctionException: > org.apache.geode.SerializationException: A ClassNotFoundException was thrown > while trying to deserialize cached value. > java.io.IOException: org.apache.geode.cache.execute.FunctionException: > org.apache.geode.SerializationException: A ClassNotFoundException was thrown > while trying to deserialize cached value. 
> at > org.apache.geode.internal.cache.snapshot.WindowedExporter.export(WindowedExporter.java:106) > at > org.apache.geode.internal.cache.snapshot.RegionSnapshotServiceImpl.exportOnMember(RegionSnapshotServiceImpl.java:361) > at > org.apache.geode.internal.cache.snapshot.RegionSnapshotServiceImpl.save(RegionSnapshotServiceImpl.java:161) > at > org.apache.geode.internal.cache.snapshot.RegionSnapshotServiceImpl.save(RegionSnapshotServiceImpl.java:146) > at > org.apache.geode.management.internal.cli.functions.ExportDataFunction.executeFunction(ExportDataFunction.java:62) > at > org.apache.geode.management.cli.CliFunction.execute(CliFunction.java:37) > at > org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:201) > at > org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376) > at > org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:442) > at > org.apache.geode.distributed.internal.ClusterOperationExecutors.doFunctionExecutionThread(ClusterOperationExecutors.java:377) > at > org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.geode.cache.execute.FunctionException: > org.apache.geode.SerializationException: A ClassNotFoundException was thrown > while trying to deserialize cached value. 
> at > org.apache.geode.internal.cache.snapshot.WindowedExporter$WindowedExportCollector.setException(WindowedExporter.java:383) > at > org.apache.geode.internal.cache.snapshot.WindowedExporter$WindowedExportCollector.addResult(WindowedExporter.java:346) > at > org.apache.geode.internal.cache.execute.PartitionedRegionFunctionResultSender.lastResult(PartitionedRegionFunctionResultSender.java:195) > at > org.apache.geode.internal.cache.execute.AbstractExecution.handleException(AbstractExecution.java:502) > at > org.apache.geode.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:353) > at > org.apache.geode.internal.cache.execute.AbstractExecution.lambda$executeFunctionOnLocalPRNode$0(AbstractExecution.java:273) > ... 6 more > Caused by: org.apache.geode.SerializationException: A ClassNotFoundException > was thrown while trying to deserialize cached value. > at > org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:2046) > at > org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:2032) > at > org.apache.geode.internal.cache.VMCachedDeserializable.getDeserializedValue(VMCachedDeserializable.java:135) > at > org.apache.geode.internal.cache.EntrySnapshot.getRawValue(EntrySnapshot.java:111) > at > org.apache.geode.internal.cache.EntrySnapshot.getRawValue(EntrySnapshot.java:99) > at > org.apache.geode.internal.cache.EntrySnapshot.getValue(EntrySnapshot.java:129) >
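The failure mode in the stack trace above can be reproduced outside Geode: export deserializes each cached value, and if the value's class came from a deployed jar that the resolving classloader can no longer see, `readObject` fails with a ClassNotFoundException. The sketch below simulates the missing jar with a custom `ObjectInputStream` that refuses to resolve the class; all names (`MissingClassDemo`, `DeployedValue`, `RestrictedInput`) are hypothetical and not Geode code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class MissingClassDemo {

    // Stand-in for a value class that ships in a deployed jar.
    static class DeployedValue implements Serializable {
        private static final long serialVersionUID = 1L;
        int payload = 42;
    }

    // Simulates the exporting member's classloader no longer seeing the jar.
    static class RestrictedInput extends ObjectInputStream {
        RestrictedInput(InputStream in) throws IOException { super(in); }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (desc.getName().endsWith("DeployedValue")) {
                throw new ClassNotFoundException(desc.getName());
            }
            return super.resolveClass(desc);
        }
    }

    static String tryRoundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new DeployedValue()); // class visible at write time
            }
            try (ObjectInputStream in =
                    new RestrictedInput(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();                      // fails: class "missing" at read time
                return "deserialized";
            }
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException";
        } catch (IOException e) {
            return "IOException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRoundTrip());
    }
}
```

This also points at Anthony Baker's second question: a value that stayed in serialized form end to end would never hit `resolveClass` at all.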
[jira] [Commented] (GEODE-8681) peer-to-peer message loss due to sending connection closing with TLS enabled
[ https://issues.apache.org/jira/browse/GEODE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226315#comment-17226315 ] ASF subversion and git services commented on GEODE-8681: Commit 7da8f9b516ac1e2525a1dfc922af7bfb8995f2c6 in geode's branch refs/heads/develop from Bruce Schuchardt [ https://gitbox.apache.org/repos/asf?p=geode.git;h=7da8f9b ] GEODE-8681: peer-to-peer message loss due to sending connection closing with TLS enabled (#5699) A socket-read could pick up more than one message and a single unwrap() could decrypt multiple messages. Normally the engine isn't closed and it reports normal status from an unwrap() operation, and Connection.processInputBuffer picks up each message, one by one, from the buffer and dispatches them. But if the SSLEngine is closed we were ignoring any already-decrypted data sitting in the unwrapped buffer and instead we were throwing an SSLException. > peer-to-peer message loss due to sending connection closing with TLS enabled > > > Key: GEODE-8681 > URL: https://issues.apache.org/jira/browse/GEODE-8681 > Project: Geode > Issue Type: Bug > Components: membership, messaging >Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0 >Reporter: Bruce J Schuchardt >Assignee: Bruce J Schuchardt >Priority: Major > Labels: pull-request-available, release-blocker > > We have observed message loss when TLS is enabled and a distributed lock is > released right after sending a message that doesn't require acknowledgement > if the sending socket is immediately closed. 
The closing of sockets > immediately after sending a message is frequently seen in function execution > threads or server-side application threads that use this pattern: > {code:java} > try { > DistributedSystem.setThreadsSocketPolicy(false); > acquireDistributedLock(lockName); > (perform one or more cache operations) > } finally { > distLockService.unlock(lockName); > DistributedSystem.releaseThreadsSockets(); // closes the socket > } > {code} > The fault seems to be in NioSSLEngine.unwrap(), which throws an > SSLException() if it finds the SSLEngine is closed even though there is valid > data in its decrypt buffer. It shouldn't throw an exception in that case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
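The fix described in the commit message can be sketched as a buffering rule: when the TLS engine reports closure, any data already decrypted into the unwrapped buffer must still be handed to the dispatcher before the close is surfaced as an exception. This is a hypothetical simplification, not Geode's actual NioSslEngine; the class and method names are assumptions:

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLException;

// Sketch of the corrected ordering: drain already-decrypted data first,
// and only throw once the engine is closed AND the buffer is empty.
public class UnwrapSketch {
    private final ByteBuffer unwrapped = ByteBuffer.allocate(1024);
    private boolean engineClosed;

    // Plaintext produced earlier by a single unwrap() that decrypted
    // more than one message in one pass.
    void receiveDecrypted(byte[] data) {
        unwrapped.put(data);
    }

    void closeEngine() {
        engineClosed = true;
    }

    ByteBuffer readUnwrapped() throws SSLException {
        if (unwrapped.position() > 0) {
            ByteBuffer out = unwrapped.duplicate();
            out.flip();              // expose buffered plaintext for dispatch
            unwrapped.clear();
            return out;              // the buggy path threw here instead
        }
        if (engineClosed) {
            throw new SSLException("engine is closed");
        }
        return ByteBuffer.allocate(0); // nothing pending, engine still open
    }
}
```

A caller that decrypts three bytes, then sees the engine close, still receives those three bytes from the first `readUnwrapped()` call; only the second call throws.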
[jira] [Commented] (GEODE-8676) Update bookbindery to latest
[ https://issues.apache.org/jira/browse/GEODE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226326#comment-17226326 ] ASF subversion and git services commented on GEODE-8676: Commit 9279098352e5c6440cade1196b9b99dcf89e90c5 in geode-native's branch refs/heads/develop from M. Oleske [ https://gitbox.apache.org/repos/asf?p=geode-native.git;h=9279098 ] GEODE-8676: Update Bookbindery (#683) * Bump bookbindery from 10.1.14 to 10.1.15 in /docs/geode-native-book-cpp Authored-by: M. Oleske > Update bookbindery to latest > > > Key: GEODE-8676 > URL: https://issues.apache.org/jira/browse/GEODE-8676 > Project: Geode > Issue Type: Improvement > Components: docs, native client >Reporter: Michael Oleske >Priority: Major > Labels: pull-request-available > > [Bookbinder|https://github.com/pivotal-cf/bookbinder/releases] has a new > release and we should keep the tools we use to build our docs up to date -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8676) Update bookbindery to latest
[ https://issues.apache.org/jira/browse/GEODE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226327#comment-17226327 ] ASF GitHub Bot commented on GEODE-8676: --- davebarnes97 merged pull request #683: URL: https://github.com/apache/geode-native/pull/683 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Update bookbindery to latest > > > Key: GEODE-8676 > URL: https://issues.apache.org/jira/browse/GEODE-8676 > Project: Geode > Issue Type: Improvement > Components: docs, native client >Reporter: Michael Oleske >Priority: Major > Labels: pull-request-available > > [Bookbinder|https://github.com/pivotal-cf/bookbinder/releases] has a new > release and we should keep the tools we use to build our docs up to date -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GEODE-8685) Exporting data causes a ClassNotFoundException
[ https://issues.apache.org/jira/browse/GEODE-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinmei Liao updated GEODE-8685: --- Labels: GeodeOperationAPI (was: ) > Exporting data causes a ClassNotFoundException > -- > > Key: GEODE-8685 > URL: https://issues.apache.org/jira/browse/GEODE-8685 > Project: Geode > Issue Type: Task > Components: regions >Affects Versions: 1.13.0 >Reporter: Anthony Baker >Priority: Major > Labels: GeodeOperationAPI > > See > [https://lists.apache.org/thread.html/rfa4fc47eb4cb4e75c39d7cb815416bebf2ec233d4db24e37728e922e%40%3Cuser.geode.apache.org%3E.] > > Report is that exporting data whose values are Classes defined in a deployed > jar result in a ClassNotFound exception: > {noformat} > [error 2020/10/30 08:54:29.317 PDT tid=0x41] > org.apache.geode.cache.execute.FunctionException: > org.apache.geode.SerializationException: A ClassNotFoundException was thrown > while trying to deserialize cached value. > java.io.IOException: org.apache.geode.cache.execute.FunctionException: > org.apache.geode.SerializationException: A ClassNotFoundException was thrown > while trying to deserialize cached value. 
> at > org.apache.geode.internal.cache.snapshot.WindowedExporter.export(WindowedExporter.java:106) > at > org.apache.geode.internal.cache.snapshot.RegionSnapshotServiceImpl.exportOnMember(RegionSnapshotServiceImpl.java:361) > at > org.apache.geode.internal.cache.snapshot.RegionSnapshotServiceImpl.save(RegionSnapshotServiceImpl.java:161) > at > org.apache.geode.internal.cache.snapshot.RegionSnapshotServiceImpl.save(RegionSnapshotServiceImpl.java:146) > at > org.apache.geode.management.internal.cli.functions.ExportDataFunction.executeFunction(ExportDataFunction.java:62) > at > org.apache.geode.management.cli.CliFunction.execute(CliFunction.java:37) > at > org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:201) > at > org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376) > at > org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:442) > at > org.apache.geode.distributed.internal.ClusterOperationExecutors.doFunctionExecutionThread(ClusterOperationExecutors.java:377) > at > org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.geode.cache.execute.FunctionException: > org.apache.geode.SerializationException: A ClassNotFoundException was thrown > while trying to deserialize cached value. 
> at > org.apache.geode.internal.cache.snapshot.WindowedExporter$WindowedExportCollector.setException(WindowedExporter.java:383) > at > org.apache.geode.internal.cache.snapshot.WindowedExporter$WindowedExportCollector.addResult(WindowedExporter.java:346) > at > org.apache.geode.internal.cache.execute.PartitionedRegionFunctionResultSender.lastResult(PartitionedRegionFunctionResultSender.java:195) > at > org.apache.geode.internal.cache.execute.AbstractExecution.handleException(AbstractExecution.java:502) > at > org.apache.geode.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:353) > at > org.apache.geode.internal.cache.execute.AbstractExecution.lambda$executeFunctionOnLocalPRNode$0(AbstractExecution.java:273) > ... 6 more > Caused by: org.apache.geode.SerializationException: A ClassNotFoundException > was thrown while trying to deserialize cached value. > at > org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:2046) > at > org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:2032) > at > org.apache.geode.internal.cache.VMCachedDeserializable.getDeserializedValue(VMCachedDeserializable.java:135) > at > org.apache.geode.internal.cache.EntrySnapshot.getRawValue(EntrySnapshot.java:111) > at > org.apache.geode.internal.cache.EntrySnapshot.getRawValue(EntrySnapshot.java:99) > at > org.apache.geode.internal.cache.EntrySnapshot.getValue(EntrySnapshot.java:129) > at > org.apache.geode.internal.cache.snapshot.SnapshotPacket$SnapshotRecord.(SnapshotPacket.java:79) >
[jira] [Commented] (GEODE-8672) Concurrent transactional destroy with GII could cause an entry to be removed and version information to be lost
[ https://issues.apache.org/jira/browse/GEODE-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226345#comment-17226345 ] ASF GitHub Bot commented on GEODE-8672: --- pivotal-eshu opened a new pull request #5702: URL: https://github.com/apache/geode/pull/5702 … (#5691)" This reverts commit e695938dff4b39f1755c707e81e1eb7e2e143fe0. Thank you for submitting a contribution to Apache Geode. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [ ] Has your PR been rebased against the latest commit within the target branch (typically `develop`)? - [ ] Is your initial contribution a single, squashed commit? - [ ] Does `gradlew build` run cleanly? - [ ] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? ### Note: Please ensure that once the PR is submitted, check Concourse for build issues and submit an update to your PR as soon as possible. If you need help, please send an email to d...@geode.apache.org. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Concurrent transactional destroy with GII could cause an entry to be removed > and version information to be lost > --- > > Key: GEODE-8672 > URL: https://issues.apache.org/jira/browse/GEODE-8672 > Project: Geode > Issue Type: Bug > Components: regions >Affects Versions: 1.1.0 >Reporter: Eric Shu >Assignee: Eric Shu >Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > In a newly rebalanced bucket, while GII is in progress, a transactional > destroy is applied to the cache. The logic requires it to be in token mode, > leaving the entry as a Destroyed token, even though the version tag of the > entry indicates that it has the correct version. > However, at the end of the GII, a > cleanUpDestroyedTokensAndMarkGIIComplete method removes all the destroyed > entries – this wipes out the entry's version tag information and causes > subsequent creates to start fresh with new version tags. > This can lead to client-server data inconsistency, as the newly created > entries will be ignored by the clients: the newly created entry has a lower > version number while the client holds higher ones. -- This message was sent by Atlassian Jira (v8.3.4#803005)
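The version-loss mechanism in GEODE-8672 can be modeled in miniature: a destroy tombstone preserves the entry's last version so a later create continues the sequence, while wiping the tombstone makes the next create restart at version 1, which a client holding the higher version will ignore. This is an illustrative model only, not Geode's region internals; all names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class VersionTagSketch {
    // An entry value paired with its version tag; value == null marks a
    // destroy tombstone.
    record Versioned(Object value, int version) {}

    private final Map<String, Versioned> entries = new HashMap<>();

    void put(String key, Object value) {
        int next = entries.containsKey(key) ? entries.get(key).version() + 1 : 1;
        entries.put(key, new Versioned(value, next));
    }

    // Transactional destroy: keep a tombstone carrying the version tag.
    void destroy(String key) {
        Versioned old = entries.get(key);
        entries.put(key, new Versioned(null, old == null ? 1 : old.version() + 1));
    }

    // Models the faulty end-of-GII cleanup: removing tombstones outright
    // discards the version history along with them.
    void cleanUpDestroyedTokens() {
        entries.values().removeIf(v -> v.value() == null);
    }

    int versionOf(String key) {
        Versioned v = entries.get(key);
        return v == null ? 0 : v.version();
    }
}
```

After the cleanup runs, a fresh create carries version 1; a client that already saw the tombstone's version 3 treats the new entry as stale, which is exactly the client-server inconsistency the ticket describes.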
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226348#comment-17226348 ] ASF GitHub Bot commented on GEODE-8647: --- lgtm-com[bot] commented on pull request #682: URL: https://github.com/apache/geode-native/pull/682#issuecomment-721919329 This pull request **introduces 4 alerts** when merging 0588ef5947fe3875c7f2cb90732179ecbada8bfb into 9279098352e5c6440cade1196b9b99dcf89e90c5 - [view on LGTM.com](https://lgtm.com/projects/g/apache/geode-native/rev/pr-ed3924809465bbcd002b9a6d79ae4ffd394a4423) **new alerts:** * 2 for Call to GC\.Collect\(\) * 2 for Useless assignment to local variable This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support using multiple DistributedMap Rules in one test > --- > > Key: GEODE-8647 > URL: https://issues.apache.org/jira/browse/GEODE-8647 > Project: Geode > Issue Type: Wish > Components: tests >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: pull-request-available > > Support using multiple DistributedMap Rules in one test. Right now the Rule > only supports having one instance in a test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8676) Update bookbindery to latest
[ https://issues.apache.org/jira/browse/GEODE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226361#comment-17226361 ] ASF GitHub Bot commented on GEODE-8676: --- davebarnes97 opened a new pull request #685: URL: https://github.com/apache/geode-native/pull/685 One change that I think should have been included in the previous PR for this ticket. Adds a `bundle exec` prefix before `rackup` in the view-docs.sh script. Should improve the user experience. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Update bookbindery to latest > > > Key: GEODE-8676 > URL: https://issues.apache.org/jira/browse/GEODE-8676 > Project: Geode > Issue Type: Improvement > Components: docs, native client >Reporter: Michael Oleske >Priority: Major > Labels: pull-request-available > > [Bookbinder|https://github.com/pivotal-cf/bookbinder/releases] has a new > release and we should keep the tools we use to build our docs up to date -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226370#comment-17226370 ] ASF GitHub Bot commented on GEODE-8647: --- codecov-io edited a comment on pull request #682: URL: https://github.com/apache/geode-native/pull/682#issuecomment-719112394 # [Codecov](https://codecov.io/gh/apache/geode-native/pull/682?src=pr&el=h1) Report > Merging [#682](https://codecov.io/gh/apache/geode-native/pull/682?src=pr&el=desc) into [develop](https://codecov.io/gh/apache/geode-native/commit/0d9a99d5e0632de62df17921950cf3f6640efb33?el=desc) will **decrease** coverage by `0.01%`. > The diff coverage is `n/a`. [](https://codecov.io/gh/apache/geode-native/pull/682?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## develop #682 +/- ## === - Coverage74.04% 74.02% -0.02% === Files 644 644 Lines5118951189 === - Hits 3790337894 -9 - Misses 1328613295 +9 ``` | [Impacted Files](https://codecov.io/gh/apache/geode-native/pull/682?src=pr&el=tree) | Coverage Δ | | |---|---|---| | [...test/testThinClientPoolExecuteHAFunctionPrSHOP.cpp](https://codecov.io/gh/apache/geode-native/pull/682/diff?src=pr&el=tree#diff-Y3BwY2FjaGUvaW50ZWdyYXRpb24tdGVzdC90ZXN0VGhpbkNsaWVudFBvb2xFeGVjdXRlSEFGdW5jdGlvblByU0hPUC5jcHA=) | `91.20% <0.00%> (-3.71%)` | :arrow_down: | | [cppcache/src/ThinClientRedundancyManager.cpp](https://codecov.io/gh/apache/geode-native/pull/682/diff?src=pr&el=tree#diff-Y3BwY2FjaGUvc3JjL1RoaW5DbGllbnRSZWR1bmRhbmN5TWFuYWdlci5jcHA=) | `75.78% <0.00%> (-0.63%)` | :arrow_down: | | [cppcache/src/ClientMetadataService.cpp](https://codecov.io/gh/apache/geode-native/pull/682/diff?src=pr&el=tree#diff-Y3BwY2FjaGUvc3JjL0NsaWVudE1ldGFkYXRhU2VydmljZS5jcHA=) | `62.24% <0.00%> (-0.46%)` | :arrow_down: | | [cppcache/src/ExecutionImpl.cpp](https://codecov.io/gh/apache/geode-native/pull/682/diff?src=pr&el=tree#diff-Y3BwY2FjaGUvc3JjL0V4ZWN1dGlvbkltcGwuY3Bw) | `68.07% <0.00%> (-0.39%)` | :arrow_down: | | 
[cppcache/src/ThinClientPoolDM.cpp](https://codecov.io/gh/apache/geode-native/pull/682/diff?src=pr&el=tree#diff-Y3BwY2FjaGUvc3JjL1RoaW5DbGllbnRQb29sRE0uY3Bw) | `76.23% <0.00%> (-0.15%)` | :arrow_down: | | [cppcache/src/ThinClientRegion.cpp](https://codecov.io/gh/apache/geode-native/pull/682/diff?src=pr&el=tree#diff-Y3BwY2FjaGUvc3JjL1RoaW5DbGllbnRSZWdpb24uY3Bw) | `56.04% <0.00%> (-0.06%)` | :arrow_down: | | [cppcache/src/TcrEndpoint.cpp](https://codecov.io/gh/apache/geode-native/pull/682/diff?src=pr&el=tree#diff-Y3BwY2FjaGUvc3JjL1RjckVuZHBvaW50LmNwcA==) | `55.11% <0.00%> (+0.56%)` | :arrow_up: | | [cppcache/src/TcrConnection.cpp](https://codecov.io/gh/apache/geode-native/pull/682/diff?src=pr&el=tree#diff-Y3BwY2FjaGUvc3JjL1RjckNvbm5lY3Rpb24uY3Bw) | `73.27% <0.00%> (+0.78%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/geode-native/pull/682?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/geode-native/pull/682?src=pr&el=footer). Last update [0d9a99d...69f5a49](https://codecov.io/gh/apache/geode-native/pull/682?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support using multiple DistributedMap Rules in one test > --- > > Key: GEODE-8647 > URL: https://issues.apache.org/jira/browse/GEODE-8647 > Project: Geode > Issue Type: Wish > Components: tests >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: pull-request-available > > Support using multiple DistributedMap Rules in one test. 
Right now the Rule > only supports having one instance in a test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226369#comment-17226369 ] ASF GitHub Bot commented on GEODE-8647: --- pivotal-jbarrett commented on a change in pull request #682: URL: https://github.com/apache/geode-native/pull/682#discussion_r517595960 ## File path: clicache/src/DataInput.cpp ## @@ -93,8 +93,9 @@ namespace Apache if (buffer != nullptr && buffer->Length > 0) { _GF_MG_EXCEPTION_TRY2 -System::Int32 len = buffer->Length; - _GEODE_NEW(m_buffer, System::Byte[len]); + System::Int32 len = buffer->Length; Review comment: auto? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support using multiple DistributedMap Rules in one test > --- > > Key: GEODE-8647 > URL: https://issues.apache.org/jira/browse/GEODE-8647 > Project: Geode > Issue Type: Wish > Components: tests >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: pull-request-available > > Support using multiple DistributedMap Rules in one test. Right now the Rule > only supports having one instance in a test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226371#comment-17226371 ] ASF GitHub Bot commented on GEODE-8647: --- lgtm-com[bot] commented on pull request #682: URL: https://github.com/apache/geode-native/pull/682#issuecomment-721943662 This pull request **introduces 4 alerts** when merging 69f5a49c1d86d5cb52cb6fe6ccbca5e27c87 into 9279098352e5c6440cade1196b9b99dcf89e90c5 - [view on LGTM.com](https://lgtm.com/projects/g/apache/geode-native/rev/pr-9196608f3c466b1421ff234d7e093fcfb418615c) **new alerts:** * 2 for Call to GC\.Collect\(\) * 2 for Useless assignment to local variable This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support using multiple DistributedMap Rules in one test > --- > > Key: GEODE-8647 > URL: https://issues.apache.org/jira/browse/GEODE-8647 > Project: Geode > Issue Type: Wish > Components: tests >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: pull-request-available > > Support using multiple DistributedMap Rules in one test. Right now the Rule > only supports having one instance in a test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226376#comment-17226376 ] ASF GitHub Bot commented on GEODE-8647: --- pdxcodemonkey merged pull request #682: URL: https://github.com/apache/geode-native/pull/682 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support using multiple DistributedMap Rules in one test > --- > > Key: GEODE-8647 > URL: https://issues.apache.org/jira/browse/GEODE-8647 > Project: Geode > Issue Type: Wish > Components: tests >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: pull-request-available > > Support using multiple DistributedMap Rules in one test. Right now the Rule > only supports having one instance in a test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226375#comment-17226375 ] ASF subversion and git services commented on GEODE-8647: Commit 3ae26364750ed799b9c24e065d08d145129166b5 in geode-native's branch refs/heads/develop from Blake Bender [ https://gitbox.apache.org/repos/asf?p=geode-native.git;h=3ae2636 ] GEODE-8647: Stop leaking buffer in CLI DataInput (#682) * Stop leaking buffer in CLI DataInput when we have to copy incoming buffer * Add CLI integration test to verify leak is fixed. * Remove no-longer-used Cleanup method from DataInput * Specify LGTM warnings to disable in test code Co-authored-by: Jacob Barrett > Support using multiple DistributedMap Rules in one test > --- > > Key: GEODE-8647 > URL: https://issues.apache.org/jira/browse/GEODE-8647 > Project: Geode > Issue Type: Wish > Components: tests >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: pull-request-available > > Support using multiple DistributedMap Rules in one test. Right now the Rule > only supports having one instance in a test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (GEODE-8674) CLI DataInput object leaks internal buffer when allocating ctor is called
[ https://issues.apache.org/jira/browse/GEODE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Bender closed GEODE-8674. --- > CLI DataInput object leaks internal buffer when allocating ctor is called > - > > Key: GEODE-8674 > URL: https://issues.apache.org/jira/browse/GEODE-8674 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Blake Bender >Assignee: Blake Bender >Priority: Major > Fix For: 1.14.0 > > > The CLI DataInput object has two ctors, one of which copies the passed-in > buffer parameter via new[] and one of which doesn't. In the event that the > former is called, the buffer is leaked when the object is deleted/Disposed. > Here's the current code for CLI `DataInput::~DataInput`: > ``` > ~DataInput( ) \{ Cleanup(); } > ``` > And the code for `DataInput::Cleanup`: > ``` > void DataInput::Cleanup() > { > //TODO: > //GF_SAFE_DELETE_ARRAY(m_buffer); > } > ``` > So apparently this bug has been known for some time (?!?), but has never been > fixed. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GEODE-8674) CLI DataInput object leaks internal buffer when allocating ctor is called
[ https://issues.apache.org/jira/browse/GEODE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Bender resolved GEODE-8674. - Fix Version/s: 1.14.0 Resolution: Fixed > CLI DataInput object leaks internal buffer when allocating ctor is called > - > > Key: GEODE-8674 > URL: https://issues.apache.org/jira/browse/GEODE-8674 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Blake Bender >Assignee: Blake Bender >Priority: Major > Fix For: 1.14.0 > > > The CLI DataInput object has two ctors, one of which copies the passed-in > buffer parameter via new[] and one of which doesn't. In the event that the > former is called, the buffer is leaked when the object is deleted/Disposed. > Here's the current code for CLI `DataInput::~DataInput`: > ``` > ~DataInput( ) \{ Cleanup(); } > ``` > And the code for `DataInput::Cleanup`: > ``` > void DataInput::Cleanup() > { > //TODO: > //GF_SAFE_DELETE_ARRAY(m_buffer); > } > ``` > So apparently this bug has been known for some time (?!?), but has never been > fixed. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8647) Support using multiple DistributedMap Rules in one test
[ https://issues.apache.org/jira/browse/GEODE-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226380#comment-17226380 ] ASF GitHub Bot commented on GEODE-8647: --- lgtm-com[bot] commented on pull request #682: URL: https://github.com/apache/geode-native/pull/682#issuecomment-721966498 This pull request **introduces 4 alerts** when merging 0e9463d69e7be5455a351eae23b87cef9b2382ac into 9279098352e5c6440cade1196b9b99dcf89e90c5 - [view on LGTM.com](https://lgtm.com/projects/g/apache/geode-native/rev/pr-a03165f373b94fecca20198e049a9105fc55bcb8) **new alerts:** * 2 for Call to GC\.Collect\(\) * 2 for Useless assignment to local variable This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support using multiple DistributedMap Rules in one test > --- > > Key: GEODE-8647 > URL: https://issues.apache.org/jira/browse/GEODE-8647 > Project: Geode > Issue Type: Wish > Components: tests >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: pull-request-available > > Support using multiple DistributedMap Rules in one test. Right now the Rule > only supports having one instance in a test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8191) MemberMXBeanDistributedTest.testBucketCount fails intermittently
[ https://issues.apache.org/jira/browse/GEODE-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226383#comment-17226383 ] Sarah Abbey commented on GEODE-8191:
---
Failed again here: https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/601

> MemberMXBeanDistributedTest.testBucketCount fails intermittently
> ----------------------------------------------------------------
>
>          Key: GEODE-8191
>          URL: https://issues.apache.org/jira/browse/GEODE-8191
>      Project: Geode
>   Issue Type: Bug
>   Components: jmx, tests
>     Reporter: Kirk Lund
>     Assignee: Mario Ivanac
>     Priority: Major
>       Labels: flaky, pull-request-available
>      Fix For: 1.14.0
>
> This appears to be a flaky test related to GEODE-7963 which was resolved by Mario Ivanac so I've assigned the ticket to him.
> {noformat}
> org.apache.geode.management.MemberMXBeanDistributedTest > testBucketCount FAILED
>     org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a lambda expression in org.apache.geode.management.MemberMXBeanDistributedTest Expected bucket count is 4000, and actual count is 3750 expected:<3750> but was:<4000> within 5 minutes.
>         at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
>         at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
>         at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
>         at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
>         at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679)
>         at org.apache.geode.management.MemberMXBeanDistributedTest.testBucketCount(MemberMXBeanDistributedTest.java:102)
>     Caused by:
>         java.lang.AssertionError: Expected bucket count is 4000, and actual count is 3750 expected:<3750> but was:<4000>
>             at org.junit.Assert.fail(Assert.java:88)
>             at org.junit.Assert.failNotEquals(Assert.java:834)
>             at org.junit.Assert.assertEquals(Assert.java:645)
>             at org.apache.geode.management.MemberMXBeanDistributedTest.lambda$testBucketCount$1(MemberMXBeanDistributedTest.java:107)
> {noformat}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8191) MemberMXBeanDistributedTest.testBucketCount fails intermittently
[ https://issues.apache.org/jira/browse/GEODE-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226384#comment-17226384 ] Geode Integration commented on GEODE-8191:
---
Seen in [DistributedTestOpenJDK8 #601|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/601] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0465/test-results/distributedTest/1604520164/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0465/test-artifacts/1604520164/distributedtestfiles-OpenJDK8-1.14.0-build.0465.tgz].

> MemberMXBeanDistributedTest.testBucketCount fails intermittently
> ----------------------------------------------------------------
>
>          Key: GEODE-8191
>          URL: https://issues.apache.org/jira/browse/GEODE-8191
>      Project: Geode
>   Issue Type: Bug
>   Components: jmx, tests
>     Reporter: Kirk Lund
>     Assignee: Mario Ivanac
>     Priority: Major
>       Labels: flaky, pull-request-available
>      Fix For: 1.14.0
>
> This appears to be a flaky test related to GEODE-7963 which was resolved by Mario Ivanac so I've assigned the ticket to him.
> {noformat}
> org.apache.geode.management.MemberMXBeanDistributedTest > testBucketCount FAILED
>     org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a lambda expression in org.apache.geode.management.MemberMXBeanDistributedTest Expected bucket count is 4000, and actual count is 3750 expected:<3750> but was:<4000> within 5 minutes.
>         at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
>         at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
>         at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
>         at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
>         at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679)
>         at org.apache.geode.management.MemberMXBeanDistributedTest.testBucketCount(MemberMXBeanDistributedTest.java:102)
>     Caused by:
>         java.lang.AssertionError: Expected bucket count is 4000, and actual count is 3750 expected:<3750> but was:<4000>
>             at org.junit.Assert.fail(Assert.java:88)
>             at org.junit.Assert.failNotEquals(Assert.java:834)
>             at org.junit.Assert.assertEquals(Assert.java:645)
>             at org.apache.geode.management.MemberMXBeanDistributedTest.lambda$testBucketCount$1(MemberMXBeanDistributedTest.java:107)
> {noformat}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8672) Concurrent transactional destroy with GII could cause an entry to be removed and version information to be lost
[ https://issues.apache.org/jira/browse/GEODE-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226390#comment-17226390 ] ASF GitHub Bot commented on GEODE-8672:
---
pivotal-eshu merged pull request #5702:
URL: https://github.com/apache/geode/pull/5702

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Concurrent transactional destroy with GII could cause an entry to be removed and version information to be lost
> ---------------------------------------------------------------------------------------------------------------
>
>               Key: GEODE-8672
>               URL: https://issues.apache.org/jira/browse/GEODE-8672
>           Project: Geode
>        Issue Type: Bug
>        Components: regions
> Affects Versions: 1.1.0
>          Reporter: Eric Shu
>          Assignee: Eric Shu
>          Priority: Major
>            Labels: pull-request-available
>           Fix For: 1.14.0
>
> In a newly rebalanced bucket, while GII is in progress, a transactional destroy is applied to the cache. There is logic that puts it in token mode and leaves the entry as a Destroyed token, even though the version tag of the entry indicates that it has the correct version.
> However, at the end of the GII, the cleanUpDestroyedTokensAndMarkGIIComplete method removes all the destroyed entries – this wipes out the entry's version tag information and causes subsequent creates to start fresh with new version tags.
> This can lead to client/server data inconsistency, as newly created entries will be ignored by the clients: the newly created entry has a lower version number while the client holds higher ones.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8672) Concurrent transactional destroy with GII could cause an entry to be removed and version information to be lost
[ https://issues.apache.org/jira/browse/GEODE-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226392#comment-17226392 ] ASF subversion and git services commented on GEODE-8672:
---
Commit 7367d17e3817fc41666d471c5eb4d0df0d33c18b in geode's branch refs/heads/develop from Eric Shu
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=7367d17 ]

Revert "GEODE-8672: No need in token mode if concurrencyChecksEnabled (#5691)" (#5702)

This reverts commit e695938dff4b39f1755c707e81e1eb7e2e143fe0.

> Concurrent transactional destroy with GII could cause an entry to be removed and version information to be lost
> ---------------------------------------------------------------------------------------------------------------
>
>               Key: GEODE-8672
>               URL: https://issues.apache.org/jira/browse/GEODE-8672
>           Project: Geode
>        Issue Type: Bug
>        Components: regions
> Affects Versions: 1.1.0
>          Reporter: Eric Shu
>          Assignee: Eric Shu
>          Priority: Major
>            Labels: pull-request-available
>           Fix For: 1.14.0
>
> In a newly rebalanced bucket, while GII is in progress, a transactional destroy is applied to the cache. There is logic that puts it in token mode and leaves the entry as a Destroyed token, even though the version tag of the entry indicates that it has the correct version.
> However, at the end of the GII, the cleanUpDestroyedTokensAndMarkGIIComplete method removes all the destroyed entries – this wipes out the entry's version tag information and causes subsequent creates to start fresh with new version tags.
> This can lead to client/server data inconsistency, as newly created entries will be ignored by the clients: the newly created entry has a lower version number while the client holds higher ones.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8676) Update bookbindery to latest
[ https://issues.apache.org/jira/browse/GEODE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226404#comment-17226404 ] ASF subversion and git services commented on GEODE-8676: Commit 07910325960691ab6774bedbf6e1a96f693e85d1 in geode-native's branch refs/heads/develop from Dave Barnes [ https://gitbox.apache.org/repos/asf?p=geode-native.git;h=0791032 ] GEODE-8676: Update Bookbindery (#685) > Update bookbindery to latest > > > Key: GEODE-8676 > URL: https://issues.apache.org/jira/browse/GEODE-8676 > Project: Geode > Issue Type: Improvement > Components: docs, native client >Reporter: Michael Oleske >Priority: Major > Labels: pull-request-available > > [Bookbinder|https://github.com/pivotal-cf/bookbinder/releases] has a new > release and we should keep the tools we use to build our docs up to date -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8676) Update bookbindery to latest
[ https://issues.apache.org/jira/browse/GEODE-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226405#comment-17226405 ] ASF GitHub Bot commented on GEODE-8676: --- moleske merged pull request #685: URL: https://github.com/apache/geode-native/pull/685 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Update bookbindery to latest > > > Key: GEODE-8676 > URL: https://issues.apache.org/jira/browse/GEODE-8676 > Project: Geode > Issue Type: Improvement > Components: docs, native client >Reporter: Michael Oleske >Priority: Major > Labels: pull-request-available > > [Bookbinder|https://github.com/pivotal-cf/bookbinder/releases] has a new > release and we should keep the tools we use to build our docs up to date -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8626) Omitting field-mapping tag of cache.xml when using Simple JDBC Connector
[ https://issues.apache.org/jira/browse/GEODE-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226421#comment-17226421 ] ASF GitHub Bot commented on GEODE-8626:
---
jchen21 commented on pull request #5637:
URL: https://github.com/apache/geode/pull/5637#issuecomment-722021982

Reviewing the code.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Omitting field-mapping tag of cache.xml when using Simple JDBC Connector
> -------------------------------------------------------------------------
>
>         Key: GEODE-8626
>         URL: https://issues.apache.org/jira/browse/GEODE-8626
>     Project: Geode
>  Issue Type: Improvement
>  Components: jdbc
>    Reporter: Masaki Yamakawa
>    Priority: Minor
>      Labels: pull-request-available
>
> When configuring the Simple JDBC Connector with gfsh, I don't need to create a field-mapping; the default field-mapping will be created from the PDX and table metadata.
> On the other hand, when using cache.xml (cluster.xml), PDX and table metadata cannot be used, and the field-mapping must be described in cache.xml.
> I would like to create field-mapping defaults based on PDX and table metadata when using cache.xml.
> If a field-mapping is specified in cache.xml, the XML setting takes priority; defaults are created only if there are no field-mapping tags.
> cache.xml will be as follows:
> {code:java}
> <jdbc:mapping
>     data-source="TestDataSource"
>     table="employees"
>     pdx-name="org.apache.geode.connectors.jdbc.Employee"
>     ids="id">
> </jdbc:mapping>
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
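For contrast with the defaulting behavior requested above, an explicit field-mapping in cache.xml looks roughly like the sketch below. The attribute values (column names, types, nullability) are illustrative only, not taken from the ticket:

```xml
<!-- Hedged sketch of an explicit jdbc:field-mapping; values are placeholders -->
<jdbc:mapping data-source="TestDataSource"
              table="employees"
              pdx-name="org.apache.geode.connectors.jdbc.Employee"
              ids="id">
  <jdbc:field-mapping pdx-name="id" pdx-type="STRING"
                      jdbc-name="id" jdbc-type="VARCHAR" jdbc-nullable="false"/>
  <jdbc:field-mapping pdx-name="name" pdx-type="STRING"
                      jdbc-name="name" jdbc-type="VARCHAR" jdbc-nullable="true"/>
</jdbc:mapping>
```

With the proposed change, the nested jdbc:field-mapping elements would simply be omitted and equivalent defaults derived from the PDX type and table metadata.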
[jira] [Updated] (GEODE-8686) Tombstone removal optimization during GII could cause deadlock
[ https://issues.apache.org/jira/browse/GEODE-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Donal Evans updated GEODE-8686:
---
Description:
Similar to the issue described in GEODE-6526, if the condition in the below if statement in {{AbstractRegionMap.initialImagePut()}} evaluates to true, a call to {{AbstractRegionMap.removeTombstone()}} will be triggered that could lead to deadlock between the calling thread and a Tombstone GC thread calling {{TombstoneService.gcTombstones()}}.
{code:java}
if (owner.getServerProxy() == null && owner.getVersionVector().isTombstoneTooOld(
    entryVersion.getMemberID(), entryVersion.getRegionVersion())) {
  // the received tombstone has already been reaped, so don't retain it
  if (owner.getIndexManager() != null) {
    owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY,
        IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP);
  }
  removeTombstone(oldRe, entryVersion, false, false);
  return false;
} else {
  owner.scheduleTombstone(oldRe, entryVersion);
  lruEntryDestroy(oldRe);
}
{code}
The proposed change is to remove this if statement and allow the old tombstone to be collected later by calling {{scheduleTombstone()}} in all cases.

The call to {{AbstractRegionMap.removeTombstone()}} in {{AbstractRegionMap.initialImagePut()}} is intended to be an optimization to allow immediate removal of tombstones that we know have already been collected on other members, but since the conditions to trigger it are rare (the old entry must be a tombstone, the new entry received during GII must be a tombstone with a newer version, and we must have already collected a tombstone with a newer version than the new entry) and the overhead of scheduling a tombstone to be collected is comparatively low, the performance impact of removing this optimization in favour of simply scheduling the tombstone to be collected in all cases should be insignificant.

The solution to the deadlock observed in GEODE-6526 was also to remove the call to {{AbstractRegionMap.removeTombstone()}} and allow the tombstone to be collected later and did not result in any unwanted behaviour, so the proposed fix should be similarly low-impact.

Also of note is that with this proposed change, there will be no calls to {{AbstractRegionMap.removeTombstone()}} outside of the {{TombstoneService}} class, which should ensure that other deadlocks involving this method are not possible.

was:
Similar to the issue described in GEODE-6526, if the condition in the below if statement in {{AbstractRegionMap.initialImagePut()}} evaluates to true, a call to {{AbstractRegionMap.removeTombstone()}} will be triggered that could lead to deadlock between the calling thread and a Tombstone GC thread calling {{TombstoneService.gcTombstones()}}.
{code:java}
if (owner.getServerProxy() == null && owner.getVersionVector().isTombstoneTooOld(
    entryVersion.getMemberID(), entryVersion.getRegionVersion())) {
  // the received tombstone has already been reaped, so don't retain it
  if (owner.getIndexManager() != null) {
    owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY,
        IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP);
  }
  removeTombstone(oldRe, entryVersion, false, false);
  return false;
} else {
  owner.scheduleTombstone(oldRe, entryVersion);
  lruEntryDestroy(oldRe);
}
{code}
The proposed change is to remove this if statement and allow the old tombstone to be collected later by calling {{scheduleTombstone()}} in all cases{{.}} The call to {{AbstractRegionMap.removeTombstone()}} in {{AbstractRegionMap.initialImagePut()}} is intended to be an optimization to allow immediate removal of tombstones that we know have already been collected on other members, but since the conditions to trigger it are rare (the old entry must be a tombstone, the new entry received during GII must be a tombstone with a newer version, and we must have already collected a tombstone with a newer version than the new entry) and the overhead of scheduling a tombstone to be collected is comparatively low, the performance impact of removing this optimization in favour of simply scheduling the tombstone to be collected in all cases should be insignificant. The solution to the deadlock observed in GEODE-6526 was also to remove the call to {{AbstractRegionMap.removeTombstone()}} and allow the tombstone to be collected later and did not result in any unwanted behaviour, so the proposed fix should be similarly low-impact. Also of note is that with this proposed change, there will be no calls to {{AbstractRegionMap.removeTombstone()}} outside of the {{TombstoneService}} class, which should ensure that other deadlocks involving this method are not possible.

> Tombstone removal optimization during GII could cause deadlock
> --
>
[jira] [Created] (GEODE-8686) Tombstone removal optimization during GII could cause deadlock
Donal Evans created GEODE-8686:
---
         Summary: Tombstone removal optimization during GII could cause deadlock
             Key: GEODE-8686
             URL: https://issues.apache.org/jira/browse/GEODE-8686
         Project: Geode
      Issue Type: Improvement
Affects Versions: 1.13.0, 1.12.0, 1.11.0, 1.10.0, 1.14.0
        Reporter: Donal Evans

Similar to the issue described in GEODE-6526, if the condition in the below if statement in {{AbstractRegionMap.initialImagePut()}} evaluates to true, a call to {{AbstractRegionMap.removeTombstone()}} will be triggered that could lead to deadlock between the calling thread and a Tombstone GC thread calling {{TombstoneService.gcTombstones()}}.
{code:java}
if (owner.getServerProxy() == null && owner.getVersionVector().isTombstoneTooOld(
    entryVersion.getMemberID(), entryVersion.getRegionVersion())) {
  // the received tombstone has already been reaped, so don't retain it
  if (owner.getIndexManager() != null) {
    owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY,
        IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP);
  }
  removeTombstone(oldRe, entryVersion, false, false);
  return false;
} else {
  owner.scheduleTombstone(oldRe, entryVersion);
  lruEntryDestroy(oldRe);
}
{code}
The proposed change is to remove this if statement and allow the old tombstone to be collected later by calling {{scheduleTombstone()}} in all cases.

The call to {{AbstractRegionMap.removeTombstone()}} in {{AbstractRegionMap.initialImagePut()}} is intended to be an optimization to allow immediate removal of tombstones that we know have already been collected on other members, but since the conditions to trigger it are rare (the old entry must be a tombstone, the new entry received during GII must be a tombstone with a newer version, and we must have already collected a tombstone with a newer version than the new entry) and the overhead of scheduling a tombstone to be collected is comparatively low, the performance impact of removing this optimization in favour of simply scheduling the tombstone to be collected in all cases should be insignificant.

The solution to the deadlock observed in GEODE-6526 was also to remove the call to {{AbstractRegionMap.removeTombstone()}} and allow the tombstone to be collected later and did not result in any unwanted behaviour, so the proposed fix should be similarly low-impact.

Also of note is that with this proposed change, there will be no calls to {{AbstractRegionMap.removeTombstone()}} outside of the {{TombstoneService}} class, which should ensure that other deadlocks involving this method are not possible.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (GEODE-8686) Tombstone removal optimization during GII could cause deadlock
[ https://issues.apache.org/jira/browse/GEODE-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Donal Evans reassigned GEODE-8686:
---
    Assignee: Donal Evans

> Tombstone removal optimization during GII could cause deadlock
> --------------------------------------------------------------
>
>               Key: GEODE-8686
>           Project: Geode
>        Issue Type: Improvement
> Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0
>          Reporter: Donal Evans
>          Assignee: Donal Evans
>          Priority: Major
>
> Similar to the issue described in GEODE-6526, if the condition in the below if statement in {{AbstractRegionMap.initialImagePut()}} evaluates to true, a call to {{AbstractRegionMap.removeTombstone()}} will be triggered that could lead to deadlock between the calling thread and a Tombstone GC thread calling {{TombstoneService.gcTombstones()}}.
> {code:java}
> if (owner.getServerProxy() == null && owner.getVersionVector().isTombstoneTooOld(
>     entryVersion.getMemberID(), entryVersion.getRegionVersion())) {
>   // the received tombstone has already been reaped, so don't retain it
>   if (owner.getIndexManager() != null) {
>     owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY,
>         IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP);
>   }
>   removeTombstone(oldRe, entryVersion, false, false);
>   return false;
> } else {
>   owner.scheduleTombstone(oldRe, entryVersion);
>   lruEntryDestroy(oldRe);
> }
> {code}
> The proposed change is to remove this if statement and allow the old tombstone to be collected later by calling {{scheduleTombstone()}} in all cases.
> The call to {{AbstractRegionMap.removeTombstone()}} in {{AbstractRegionMap.initialImagePut()}} is intended to be an optimization to allow immediate removal of tombstones that we know have already been collected on other members, but since the conditions to trigger it are rare (the old entry must be a tombstone, the new entry received during GII must be a tombstone with a newer version, and we must have already collected a tombstone with a newer version than the new entry) and the overhead of scheduling a tombstone to be collected is comparatively low, the performance impact of removing this optimization in favour of simply scheduling the tombstone to be collected in all cases should be insignificant.
> The solution to the deadlock observed in GEODE-6526 was also to remove the call to {{AbstractRegionMap.removeTombstone()}} and allow the tombstone to be collected later and did not result in any unwanted behaviour, so the proposed fix should be similarly low-impact.
> Also of note is that with this proposed change, there will be no calls to {{AbstractRegionMap.removeTombstone()}} outside of the {{TombstoneService}} class, which should ensure that other deadlocks involving this method are not possible.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
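The version-tag argument in the ticket above can be illustrated with a toy model. This is NOT Geode's actual code — just a minimal standalone sketch of why scheduling a tombstone (which retains its version tag until reaped) lets a later create continue the version history, while immediate removal wipes the tag and forces the create to start over at version 1:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not Geode's real classes): a "region" whose entries carry a
// version number, with tombstones marking destroyed keys.
public class TombstoneSketch {
    static final class Entry {
        final int version;
        final boolean tombstone;
        Entry(int version, boolean tombstone) {
            this.version = version;
            this.tombstone = tombstone;
        }
    }

    final Map<String, Entry> region = new HashMap<>();

    // Immediate removal (the optimization being dropped): the entry and its
    // version tag vanish from the map right away.
    void removeTombstone(String key) {
        region.remove(key);
    }

    // Scheduling (the proposed behavior): the tombstone stays in the map, so
    // its version tag survives until a GC pass reaps it later.
    void scheduleTombstone(String key, int version) {
        region.put(key, new Entry(version, true));
    }

    // A later create bumps the retained version, or restarts at 1 when the
    // version tag was wiped out — the case that confuses clients.
    int create(String key) {
        Entry old = region.get(key);
        int next = (old == null) ? 1 : old.version + 1;
        region.put(key, new Entry(next, false));
        return next;
    }

    public static void main(String[] args) {
        TombstoneSketch scheduled = new TombstoneSketch();
        scheduled.scheduleTombstone("k", 7);
        System.out.println(scheduled.create("k")); // history retained: 8

        TombstoneSketch removed = new TombstoneSketch();
        removed.scheduleTombstone("k", 7);
        removed.removeTombstone("k");
        System.out.println(removed.create("k"));   // history lost: 1
    }
}
```

In the second case the new entry's version (1) is lower than what a client already saw (7), so the client would discard the update — the client/server inconsistency described in GEODE-8672, and the reason dropping the removal optimization in favour of always scheduling is safe.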
[jira] [Commented] (GEODE-8466) Create a ClassLoaderService to abstract away dealing with the default ClassLoader directly
[ https://issues.apache.org/jira/browse/GEODE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226452#comment-17226452 ] ASF GitHub Bot commented on GEODE-8466: --- lgtm-com[bot] commented on pull request #5658: URL: https://github.com/apache/geode/pull/5658#issuecomment-722075908 This pull request **introduces 3 alerts** and **fixes 1** when merging 19b1313d9d31dc3320e5659555649133e991db13 into 7367d17e3817fc41666d471c5eb4d0df0d33c18b - [view on LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-1bf6e8475bff02a5fe107ea9ef2ef16a37961649) **new alerts:** * 2 for Potential input resource leak * 1 for Use of a broken or risky cryptographic algorithm **fixed alerts:** * 1 for Use of a broken or risky cryptographic algorithm This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Create a ClassLoaderService to abstract away dealing with the default > ClassLoader directly > -- > > Key: GEODE-8466 > URL: https://issues.apache.org/jira/browse/GEODE-8466 > Project: Geode > Issue Type: New Feature > Components: core >Reporter: Udo Kohlmeyer >Assignee: Udo Kohlmeyer >Priority: Major > Labels: pull-request-available > > With the addition of ClassLoader isolation using JBoss Modules GEODE-8067, > the manner in which we interact with the ClassLoader needs to change. > An abstraction is required around the default functions like > `findResourceAsStream`, `loadClass` and `loadService`. > As these features will behave differently between different ClassLoader > implementations, it is best to have a single service that will expose that > functionality in a transparent manner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8626) Omitting field-mapping tag of cache.xml when using Simple JDBC Connector
[ https://issues.apache.org/jira/browse/GEODE-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226453#comment-17226453 ] ASF GitHub Bot commented on GEODE-8626: --- jchen21 commented on a change in pull request #5637: URL: https://github.com/apache/geode/pull/5637#discussion_r517719407 ## File path: geode-connectors/src/distributedTest/java/org/apache/geode/connectors/jdbc/internal/cli/CreateMappingCommandDUnitTest.java ## @@ -1142,7 +1142,7 @@ public void createMappingWithExistingQueueFails() { + " must not already exist."); } - private static class Employee implements PdxSerializable { + public static class Employee implements PdxSerializable { Review comment: Why this class has to be `public`? ## File path: geode-connectors/src/acceptanceTest/java/org/apache/geode/connectors/jdbc/CacheXmlJdbcMappingIntegrationTest.java ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.geode.connectors.jdbc; + +import static org.apache.geode.test.util.ResourceUtils.createTempFileFromResource; + +import org.junit.Rule; +import org.junit.contrib.java.lang.system.RestoreSystemProperties; + +import org.apache.geode.cache.CacheFactory; +import org.apache.geode.internal.cache.InternalCache; + +public class CacheXmlJdbcMappingIntegrationTest extends JdbcMappingIntegrationTest { + + @Rule + public RestoreSystemProperties restoreSystemProperties = new RestoreSystemProperties(); + + @Override + protected InternalCache createCacheAndCreateJdbcMapping(String cacheXmlTestName) + throws Exception { +String url = dbRule.getConnectionUrl().replaceAll("&", "&"); Review comment: Is this replacement of `&` necessary? ## File path: geode-connectors/src/main/java/org/apache/geode/connectors/jdbc/internal/JdbcConnectorServiceImpl.java ## @@ -210,4 +224,152 @@ private TableMetaDataView getTableMetaDataView(RegionMapping regionMapping, + regionMapping.getDataSourceName() + "\": ", ex); } } + + @Override + public TableMetaDataView getTableMetaDataView(RegionMapping regionMapping) { +DataSource dataSource = getDataSource(regionMapping.getDataSourceName()); +if (dataSource == null) { + throw new JdbcConnectorException("No datasource \"" + regionMapping.getDataSourceName() + + "\" found when getting table meta data \"" + regionMapping.getRegionName() + "\""); +} +return getTableMetaDataView(regionMapping, dataSource); + } + + @Override + public List createDefaultFieldMapping(RegionMapping regionMapping, + PdxType pdxType) { +DataSource dataSource = getDataSource(regionMapping.getDataSourceName()); +if (dataSource == null) { + throw new JdbcConnectorException("No datasource \"" + regionMapping.getDataSourceName() + + "\" found when creating mapping \"" + regionMapping.getRegionName() + "\""); Review comment: The data source has nothing to do with table metadata or region name. I recommend removing this line of error message. 
## File path: geode-connectors/src/test/java/org/apache/geode/connectors/jdbc/internal/cli/CreateMappingPreconditionCheckFunctionTest.java ## @@ -172,16 +168,6 @@ public void executeFunctionThrowsIfDataSourceDoesNotExist() { + DATA_SOURCE_NAME + "'."); } - @Test Review comment: Why this test is removed? ## File path: geode-connectors/src/acceptanceTest/java/org/apache/geode/connectors/jdbc/GfshJdbcMappingIntegrationTest.java ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, so
[jira] [Created] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers
Jakov Varenina created GEODE-8687: - Summary: Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers Key: GEODE-8687 URL: https://issues.apache.org/jira/browse/GEODE-8687 Project: Geode Issue Type: Bug Components: client/server Affects Versions: 1.13.0 Reporter: Jakov Varenina

When ReflectionBasedAutoSerializer is misconfigured or not set, reception of CQ events on the client fails with a serialization exception. That exception is never logged, which is misleading: it is hard to discover that ReflectionBasedAutoSerializer is not set correctly. The only visible log shows that the client/server subscription connections were closed due to EOF. The client destroys the subscription connections intentionally, but does not log the PdxSerializationException that caused it. Serialization exceptions should be logged at error or warn level.

Another problem arises because the client destroys the subscription connection and performs server fail-over whenever a serialization issue occurs. When the subscription connection to a particular server fails multiple times, that server is put on a deny list for 10 seconds (configurable with {{ping-interval}}). Once the 10 s expire, the server is removed from the list and becomes available for a subscription connection, which fails again. This repeats indefinitely (if there are many events that cannot be de-serialized), so in this case the client subscribes to each server at least once roughly every 10 s. Because of the serialization issue, events are not delivered to the client and remain in the subscription queues.

When the connection fails due to a serialization issue and the client is not durable, the subscription queue is closed and events are lost. The biggest problem arises when the client is durable, because the subscription queue remains on the server for a configurable period of time (e.g. 300 s) waiting for the client to reconnect. When the client fails over to another server, it creates a new subscription queue from an initial image of the old queue, which is paused at that point. All events from the old queue are therefore transferred to the new subscription queue hosted by the current primary server. This happens on every server, so each of them ends up holding a copy of the queue. Because the client periodically (every 10 s in this case) establishes a connection to each server, the configured timeout (e.g. 300 s) never expires; it is renewed each time the client registers. This can cause serious problems, since memory and disk usage (if overflow on the queue is configured) will grow on all servers.

The attached logs show the problematic case with a durable client:
vm0 -> locator
vm1, vm2 -> servers
vm3 -> durable client with subscription enabled, handling CQ events
vm4 -> client generating traffic that should trigger the registered CQ
-- This message was sent by Atlassian Jira (v8.3.4#803005)
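The timing interplay described in this report (a deny-list hold of {{ping-interval}} = 10 s versus a durable queue timeout of 300 s) can be sketched with a toy simulation. The class, method, and constant names below are illustrative only, not Geode APIs; the point is that the retry interval is far shorter than the durable timeout, so the server-side queue keep-alive is renewed before it can ever expire:

```java
import java.util.ArrayList;
import java.util.List;

public class DenyListSimulation {

    // Deny-list hold time: how long a failing server is excluded from
    // subscription connections (the report's ping-interval of 10 s).
    static final int PING_INTERVAL_S = 10;

    // Durable queue keep-alive: how long a server retains the subscription
    // queue waiting for the durable client to reconnect (300 s in the report).
    static final int DURABLE_TIMEOUT_S = 300;

    // Times (in seconds) at which the client re-registers with a server over
    // the given horizon: every retry fails to de-serialize, the server is
    // deny-listed for PING_INTERVAL_S, and the cycle repeats.
    static List<Integer> reRegistrationTimes(int horizonS) {
        List<Integer> times = new ArrayList<>();
        for (int t = 0; t <= horizonS; t += PING_INTERVAL_S) {
            times.add(t);
        }
        return times;
    }

    public static void main(String[] args) {
        List<Integer> times = reRegistrationTimes(DURABLE_TIMEOUT_S);
        // The 10 s gap between registrations never approaches the 300 s
        // durable timeout, so the queue is never discarded and keeps growing.
        System.out.println("re-registrations within one durable timeout: " + times.size());
    }
}
```

Each re-registration resets the durable timeout, which is why the queues (and their copies on every server) accumulate events indefinitely.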
[jira] [Updated] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers
[ https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-8687: -- Attachment: deserialzationFault.log > Durable client is continuously re-registering CQs on all servers when event > de-serialization fails causing resource exhaustion on servers > -- > > Key: GEODE-8687 > URL: https://issues.apache.org/jira/browse/GEODE-8687 > Project: Geode > Issue Type: Bug > Components: client/server >Affects Versions: 1.13.0 >Reporter: Jakov Varenina >Priority: Major > Attachments: deserialzationFault.log
[jira] [Updated] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers
[ https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-8687: -- Description: When ReflectionBasedAutoSerializer is wrongly/not set it results with serialization exception on client at the reception of the CQ events. Serialization exception isn't logged which is misleading, and is hard to find that actually ReflectionBasedAutoSerializer isn't set correctly. Only log that can be seen is that client/servers subscription connections are closed due to EOF. This is because client destroys subscriptions connections intentionally, but doesn't log reason (PdxSerializationException) that led to this. It would be good that serialization exceptions are logged as error or warn. Client destroys subscription connection and perform server fail-over whenever serialization issue occurs. Additionally when subscription connection for particular server fails multiple times then this server is put in deny list for 10 seconds (this is configurable with {{ping-interval}}). After 10s expire the server is removed from list and it is available for subscription connection which again fail. This will go indefinitely (if there are lots of events that cannot be de-serialized) and approx. every 10s in this case the client subscribes to each servers at least once. Due to serialization issue events aren't sent to client and remain in subscription queues. Whenever connection fails due to serialization issue and client is not durable then subscription queue is closed and events are lost. The biggest problem arises when client is durable. This is because subscription queue remains on server for configurable period of time (e.g. 300s) waiting for client to reconnect. When client perform fail-over to another server it will create new subscription queue using initial image from old queue that is currently paused. 
This means that all events from old queue will be transferred to new subscription queue hosted by the current primary server. This will happen on all servers and all of them will have copy of the queue. The problem here is that client will periodically (every 10s in this case) establish connection to each servers, so configured timeout (e.g. 300s) will never expire, but it will be renewed each time client is registered. This could cause a lots of problems since memory and disk usage (if overflow on queue is configured) will increase on all servers. You can find in attached logs for the problematic case with durable client : vm0 -> locator vm1, vm2 -> servers vm3 -> durable client with enabled subscription handling CQ events vm4 -> client generating traffic that should trigger registered CQ
[jira] [Updated] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers
[ https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-8687: -- Description: When ReflectionBasedAutoSerializer is wrongly/not set it results with serialization exception on client at the reception of the CQ events. Serialization exception isn't logged which is misleading, and is hard to find that actually ReflectionBasedAutoSerializer isn't set correctly. Only log that can be seen is that client/servers subscription connections are closed due to EOF. This is because client destroys subscriptions connections intentionally, but doesn't log reason (PdxSerializationException) that led to this. It would be good that serialization exceptions are logged as error or warn. Client destroys subscription connection and perform server fail-over whenever serialization issue occurs. Additionally when subscription connection for particular server fails multiple times then this server is put in deny list for 10 seconds (this is configurable with {{ping-interval}}). After 10s expire the server is removed from list and it is available for subscription connection which again fail. This will go indefinitely and approx. every 10s in this case the client subscribes to each servers at least once. Due to serialization issue events aren't sent to client and remain in subscription queues. Whenever connection fails due to serialization issue and client is not durable then subscription queue is closed and events are lost. The biggest problem arises when client is durable. This is because subscription queue remains on server for configurable period of time (e.g. 300s) waiting for client to reconnect. When client perform fail-over to another server it will create new subscription queue using initial image from old queue that is currently paused. This means that all events from old queue will be transferred to new subscription queue hosted by the current primary server. 
This will happen on all servers and all of them will have copy of the queue. The problem here is that client will periodically (every 10s in this case) establish connection to each servers, so configured timeout (e.g. 300s) will never expire, but it will be renewed each time client is registered. This could cause a lots of problems since memory and disk usage (if overflow on queue is configured) will increase on all servers. You can find in attached logs for the problematic case with durable client : vm0 -> locator vm1, vm2 -> servers vm3 -> durable client with enabled subscription handling CQ events vm4 -> client generating traffic that should trigger registered CQ
[jira] [Updated] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers
[ https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-8687: -- Description: When ReflectionBasedAutoSerializer is wrongly/not set it results with serialization exception on client at the reception of the CQ events. Serialization exception isn't logged which is misleading, and is hard to find that actually ReflectionBasedAutoSerializer isn't set correctly. Only log that can be seen is that client/servers subscription connections are closed due to EOF. This is because client destroys subscriptions connections intentionally, but doesn't log reason (PdxSerializationException) that led to this. It would be good that serialization exceptions are logged as error or warn. Client destroys subscription connection and perform server fail-over whenever serialization issue occurs. Additionally when subscription connection for particular server fails multiple times then this server is put in deny list for 10 seconds (this is configurable with {{ping-interval}}). After 10s expire the server is removed from list and it is available for subscription connection which again fail. This will go indefinitely and approx. every 10s in this case the client subscribes to each servers at least once. Due to serialization issue events aren't sent to client and remain in subscription queues. Whenever connection fails due to serialization issue and client is not durable then subscription queue is closed and events are lost. The biggest problem arises when client is durable. This is because subscription queue remains on server for configurable period of time (e.g. 300s) waiting for client to reconnect. When client perform fail-over to another server it will create new subscription queue using initial image from old queue that is currently paused. This means that all events from old queue will be transferred to new subscription queue hosted by the current primary server. 
This will happen on all servers and all of them will have a copy of the queue even if subscription redundancy isn't configured. The problem here is that the client will periodically (every 10s in this case) establish a connection to each server, so the configured timeout (e.g. 300s) will never expire, but it will be renewed each time the client is registered. This could cause a lot of problems since memory and disk usage (if overflow on queue is configured) will increase on all servers. You can find in attached logs for the problematic case with durable client : vm0 -> locator vm1, vm2 -> servers vm3 -> durable client with enabled subscription handling CQ events vm4 -> client generating traffic that should trigger registered CQ
[jira] [Updated] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers
[ https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-8687: -- Description: When ReflectionBasedAutoSerializer is wrongly/not set it results with serialization exception on client at the reception of the CQ events. Serialization exception isn't logged which is misleading, and is hard to find that actually ReflectionBasedAutoSerializer isn't set correctly. Only log that can be seen is that client/servers subscription connections are closed due to EOF. This is because client destroys subscriptions connections intentionally, but doesn't log reason (PdxSerializationException) that led to this. It would be good that serialization exceptions are logged as error or warn. Client destroys subscription connection and perform server fail-over whenever serialization issue occurs. Additionally when subscription connection for particular server fails multiple times then this server is put in deny list for 10 seconds (this is configurable with {{ping-interval}}). After 10s expire the server is removed from list and it is available for subscription connection which again fail. This will go indefinitely and approx. every 10s in this case the client subscribes to each servers at least once. Due to serialization issue events aren't sent to client and remain in subscription queues. Whenever connection fails due to serialization issue and client is not durable then subscription queue is closed and events are lost. The biggest problem arises when client is durable. This is because subscription queue remains on server for configurable period of time (e.g. 300s) waiting for client to reconnect. When client perform fail-over to another server it will create new subscription queue using initial image from old queue that is currently paused. This means that all events from old queue will be transferred to new subscription queue hosted by the current primary server. 
This will happen on all servers and all of them will have a copy of the queue even if redundancy isn't configured. The problem here is that the client will periodically (every 10s in this case) establish a connection to each server, so the configured timeout (e.g. 300s) will never expire, but it will be renewed each time the client is registered. This could cause a lot of problems since memory and disk usage (if overflow on queue is configured) will increase on all servers. You can find in attached logs for the problematic case with durable client : vm0 -> locator vm1, vm2 -> servers vm3 -> durable client with enabled subscription handling CQ events vm4 -> client generating traffic that should trigger registered CQ
[jira] [Updated] (GEODE-8614) Provide an specific client-side exception for server LowMemoryException
[ https://issues.apache.org/jira/browse/GEODE-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mario Salazar de Torres updated GEODE-8614: --- Description: *AS AN* native client contributor *I WANT* to have a client-side exception for LowMemoryException *SO THAT* I can nofity accordingly from the client-side upon server memory-depletion. — *Additional information* This is the callstack of the LowMemoryException: {noformat} [error 2020/10/13 09:54:14.401405 UTC 140522117220352] Region::put: An exception (org.apache.geode.cache.LowMemoryException: PartitionedRegion: /part_a cannot process operation on key foo|0 because members [192.168.240.14(dms-server-1:1):41000] are running low on memory at org.apache.geode.internal.cache.partitioned.RegionAdvisor.checkIfBucketSick(RegionAdvisor.java:482) at org.apache.geode.internal.cache.PartitionedRegion.checkIfAboveThreshold(PartitionedRegion.java:2278) at org.apache.geode.internal.cache.PartitionedRegion.putInBucket(PartitionedRegion.java:2982) at org.apache.geode.internal.cache.PartitionedRegion.virtualPut(PartitionedRegion.java:2212) at org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:170) at org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5573) at org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5533) at org.apache.geode.internal.cache.LocalRegion.basicBridgePut(LocalRegion.java:5212) at org.apache.geode.internal.cache.tier.sockets.command.Put65.cmdExecute(Put65.java:411) at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:183) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:848) at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:72) at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1212) at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:676) at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119) at java.base/java.lang.Thread.run(Thread.java:834) ) happened at remote server. {noformat} The idea would be to modify *ThinClientRegion::handleServerException* to return a new error and later map it to a newly created exception. *Suggestions* The new exception could be called: * CacheServerLowMemoryException * ...
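The mapping proposed in GEODE-8614 could classify the remote failure by the exception class name embedded in the server's error text. The real change would live in the native client's C++ *ThinClientRegion::handleServerException*; the Java sketch below, with hypothetical class and method names, only illustrates the classification idea using the standard library:

```java
public class ServerExceptionMapper {

    // Hypothetical client-side exception corresponding to the ticket's
    // suggested CacheServerLowMemoryException; not an existing Geode class.
    static class CacheServerLowMemoryException extends RuntimeException {
        CacheServerLowMemoryException(String message) {
            super(message);
        }
    }

    // Server-side errors arrive as text containing the remote exception's
    // fully qualified class name; match on it to surface a dedicated
    // client-side exception instead of a generic one.
    static RuntimeException mapServerException(String remoteText) {
        if (remoteText.contains("org.apache.geode.cache.LowMemoryException")) {
            return new CacheServerLowMemoryException(remoteText);
        }
        return new RuntimeException(remoteText);
    }

    public static void main(String[] args) {
        RuntimeException mapped = mapServerException(
            "org.apache.geode.cache.LowMemoryException: PartitionedRegion: /part_a "
                + "cannot process operation because members are running low on memory");
        System.out.println(mapped.getClass().getSimpleName());
    }
}
```

Matching on the class-name string is fragile if the server ever changes its error format; a dedicated error code in the wire protocol would be the more robust variant of the same idea.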
[jira] [Updated] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers
[ https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-8687: -- Description:

When ReflectionBasedAutoSerializer is wrongly configured or not set, the client hits a serialization exception when it receives CQ events. The serialization exception is not logged, which is misleading and makes it hard to discover that ReflectionBasedAutoSerializer is not set correctly. The only visible log is that the client/server subscription connections are closed due to EOF. This is because the client destroys subscription connections intentionally, but does not log the reason (PdxSerializationException) that led to it. It would be good if serialization exceptions were logged at error or warn level.

The client destroys the subscription connection and performs server fail-over whenever a serialization issue occurs. Additionally, when the subscription connection for a particular server fails multiple times, that server is put in a deny list for 10 seconds (configurable with {{ping-interval}}). After the 10 seconds expire, the server is removed from the list and becomes available for a subscription connection again, which will again be destroyed due to the serialization issue. This goes on indefinitely, so in this case the client re-subscribes to each server roughly every 10 seconds.

Because of the serialization issue, events are not delivered to the client and remain in the subscription queues. When a connection fails due to a serialization issue and the client is not durable, the subscription queue is closed and the events are lost. The biggest problem arises when the client is durable, because the subscription queue then remains on the server for a configurable period of time (e.g. 300 s) waiting for the client to reconnect. When the client fails over to another server, it creates a new subscription queue using the initial image from the old queue, which is currently paused.
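For reference, the root cause described above is a missing or misconfigured ReflectionBasedAutoSerializer on the client. A minimal sketch of how it is typically declared in the client's cache.xml follows; the `com.example.domain.*` package pattern is a placeholder and must match the application's own domain classes.

```xml
<pdx>
  <pdx-serializer>
    <class-name>org.apache.geode.pdx.ReflectionBasedAutoSerializer</class-name>
    <!-- Placeholder pattern: list the domain classes the CQ events carry. -->
    <parameter name="classes">
      <string>com.example.domain.*</string>
    </parameter>
  </pdx-serializer>
</pdx>
```

If the pattern does not cover the classes carried by the CQ events, deserialization on the client fails, which triggers the subscription-connection teardown described in this ticket.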
This means that all events from the old queue are transferred to the new subscription queue hosted by the current primary server. This happens on all servers, so all of them end up with a copy of the queue even though subscription redundancy is not configured. The problem is that the client periodically (every 10 s in this case) establishes a connection to each server, so the configured timeout (e.g. 300 s) never expires; it is renewed each time the client registers. This can cause a lot of problems, since memory and disk usage (if overflow on the queue is configured) will increase on all servers.

The attached logs cover the problematic case with a durable client:
vm0 -> locator
vm1, vm2 -> servers
vm3 -> durable client with enabled subscription handling CQ events
vm4 -> client generating traffic that should trigger the registered CQ
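The renewal loop described above can be illustrated with a small simulation. This is plain Java, not Geode code: the class name, constants, and `queueExpiry` helper are all invented for illustration, with the 10 s reconnect cycle and 300 s durable timeout taken from the example values in the description.

```java
// Plain-Java simulation (not Geode code) of why the durable-client queue
// never expires: each re-registration renews the expiry deadline, and the
// ~10 s deny-list/ping-interval cycle re-registers the client long before
// the (e.g.) 300 s durable timeout can elapse.
public class DurableQueueRenewal {
    static final int DURABLE_TIMEOUT = 300;  // seconds, example durable timeout
    static final int RECONNECT_PERIOD = 10;  // seconds, deny-list cycle

    // Returns the simulated second at which the subscription queue expires,
    // or -1 if it never expires within `horizon` seconds.
    static int queueExpiry(int horizon, boolean clientKeepsReconnecting) {
        int expiresAt = DURABLE_TIMEOUT; // deadline set when client disconnects
        for (int t = 0; t <= horizon; t++) {
            if (t >= expiresAt) {
                return t; // queue dropped, queued events discarded
            }
            if (clientKeepsReconnecting && t > 0 && t % RECONNECT_PERIOD == 0) {
                expiresAt = t + DURABLE_TIMEOUT; // reconnect renews the deadline
            }
        }
        return -1; // never expired within the horizon
    }

    public static void main(String[] args) {
        // Client stuck in the failing re-register loop: queue is kept forever,
        // so its memory/disk footprint keeps growing on every server.
        System.out.println(queueExpiry(100_000, true));
        // Client that actually stays away: queue expires after 300 s as intended.
        System.out.println(queueExpiry(100_000, false));
    }
}
```

The simulation makes the resource-exhaustion mechanism concrete: as long as the failing client keeps cycling through the servers, no server ever reaches its expiry deadline, so every server retains (and keeps growing) its copy of the queue.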