[ https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242971#comment-17242971 ]
ASF subversion and git services commented on GEODE-8687:
--------------------------------------------------------

Commit 2e7e45699b74812038dc516aaf7c14621951090b in geode's branch refs/heads/develop from Jakov Varenina
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=2e7e456 ]

GEODE-8687: Fix for handling of PdxSerializationException on client (#5730)

* GEODE-8687: Improve handling of serialization error

* Improves handling of PdxSerializationException on the client when it receives events from the subscription queue

* Faulty behavior: on receiving an event for which PdxSerializationException is thrown, the client would always shut down CacheClientUpdater, destroy the subscription queue connection and try to fail over to another server in the cluster

* Behavior with this fix: on receiving an event that provokes PdxSerializationException, the client only logs the exception

* DurableClientCQAutoSerializer test updated

* Empty commit to trigger test

* Updates after review


> Durable client is continuously re-registering CQs on all servers when event
> de-serialization fails causing resource exhaustion on servers
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8687
>                 URL: https://issues.apache.org/jira/browse/GEODE-8687
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>    Affects Versions: 1.13.0
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: deserialzationFault.log
>
>
> When ReflectionBasedAutoSerializer is set incorrectly or not set at all, the
> client gets a serialization exception when it receives CQ events. The
> serialization exception isn't logged, which is misleading and makes it hard
> to discover that ReflectionBasedAutoSerializer isn't set correctly. The only
> log that can be seen is that the client/server subscription connections are
> closed due to EOF.
> This is because the client destroys the subscription connections
> intentionally, but doesn't log the reason (PdxSerializationException) that
> led to this. It would be good if serialization exceptions were logged as
> error or warning.
>
> Whenever a serialization issue occurs, the client destroys the subscription
> connection and performs server failover. Additionally, when the subscription
> connection to a particular server fails multiple times, that server is put on
> a deny list for 10 seconds (this is configurable with {{ping-interval}}).
> After the 10 s expire, the server is removed from the list and becomes
> available for a subscription connection again, which is then destroyed again
> due to the serialization issue. This goes on indefinitely; in this case the
> client subscribes to each server at least once approximately every 10 s.
> Because of the serialization issue, events aren't delivered to the client and
> remain in the subscription queues.
>
> Whenever the connection fails due to the serialization issue and the client
> is not durable, the subscription queue is closed and the events are lost.
>
> The biggest problem arises when the client is durable, because the
> subscription queue remains on the server for a configurable period of time
> (e.g. 300 s) waiting for the client to reconnect. When the client fails over
> to another server, it creates a new subscription queue using the initial
> image from the old queue, which is currently paused. This means that all
> events from the old queue are transferred to the new subscription queue
> hosted by the current primary server. This happens on all servers, and all
> of them end up with a copy of the queue even if subscription redundancy isn't
> configured. The problem is that the client periodically (every 10 s in this
> case) establishes a connection to each server, so the configured timeout
> (e.g. 300 s) never expires; it is renewed each time the client registers.
> This can cause many problems, since memory and disk usage (if overflow is
> configured on the queue) will keep increasing on all servers.
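The timeout-renewal argument above can be made concrete with a small, hypothetical Java model; it is not Geode code, and the class and method names are illustrative only. It assumes a server drops a durable queue only when no client registration arrives within the timeout (300 s, the example value from the report) and that the looping client re-registers with each server roughly every 10 s. Under those assumptions the queue never expires while the failure loop lasts, but it does expire 300 s after the last registration once the client stops reconnecting.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Geode code) of how periodic re-registration keeps
 * renewing a durable subscription queue's timeout on one server.
 */
public class DurableQueueRenewal {

    /**
     * Returns the time (seconds) at which the queue expires, or -1 if it is
     * still alive at horizon. registrations must be sorted and non-empty; the
     * queue is assumed to be created at the first registration.
     */
    static long expiryTime(List<Long> registrations, long timeout, long horizon) {
        long deadline = registrations.get(0) + timeout;
        for (int i = 1; i < registrations.size(); i++) {
            long t = registrations.get(i);
            if (t > deadline) {
                return deadline; // gap longer than the timeout: queue dropped
            }
            deadline = t + timeout; // each reconnect renews the timeout
        }
        return deadline <= horizon ? deadline : -1;
    }

    /** Registration times for a client reconnecting every period seconds up to stopAt. */
    static List<Long> periodicRegistrations(long period, long stopAt) {
        List<Long> times = new ArrayList<>();
        for (long t = 0; t <= stopAt; t += period) {
            times.add(t);
        }
        return times;
    }

    public static void main(String[] args) {
        long timeout = 300;  // example timeout from the report
        long period = 10;    // approximate reconnect period from the report
        long horizon = 3600; // observe one hour

        // Client keeps hitting the serialization error and reconnecting all hour.
        long looping = expiryTime(periodicRegistrations(period, horizon), timeout, horizon);
        // Client stops reconnecting after 60 s.
        long stopped = expiryTime(periodicRegistrations(period, 60), timeout, horizon);

        System.out.println("looping client: "
                + (looping < 0 ? "queue never expires" : "expires at " + looping + " s"));
        System.out.println("stopped client: queue expires at " + stopped + " s");
    }
}
```

Running main reports that the looping client's queue never expires within the hour, while the stopped client's queue expires at 360 s (last registration at 60 s plus the 300 s timeout). Since every server sees such a registration, each server's copy of the queue stays alive in the same way.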
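The behavior change listed in the commit message (tear down and fail over versus only log) can be illustrated with a hypothetical, self-contained Java sketch. It does not reproduce CacheClientUpdater or any Geode API: SimulatedDeserializationException stands in for PdxSerializationException, payloads starting with "bad" play the role of events that cannot be deserialized, and the sketch assumes such an event is simply skipped after it has been logged.

```java
import java.util.List;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not Geode code) contrasting the two behaviors
 * described in the GEODE-8687 commit message.
 */
public class SubscriptionEventHandling {
    private static final Logger LOG =
            Logger.getLogger(SubscriptionEventHandling.class.getName());

    /** Stand-in for PdxSerializationException; the name is hypothetical. */
    static class SimulatedDeserializationException extends RuntimeException {
        SimulatedDeserializationException(String msg) { super(msg); }
    }

    /** Result of processing a batch of events from the subscription queue. */
    static final class Outcome {
        final int delivered;
        final boolean failedOver;
        Outcome(int delivered, boolean failedOver) {
            this.delivered = delivered;
            this.failedOver = failedOver;
        }
        @Override public String toString() {
            return "delivered=" + delivered + ", failedOver=" + failedOver;
        }
    }

    /** Pretend deserializer: payloads starting with "bad" cannot be deserialized. */
    static String deserialize(String payload) {
        if (payload.startsWith("bad")) {
            throw new SimulatedDeserializationException("cannot deserialize " + payload);
        }
        return payload;
    }

    /** Old behavior: the first failure closes the connection and triggers failover. */
    static Outcome processOld(List<String> queue) {
        int delivered = 0;
        for (String payload : queue) {
            try {
                deserialize(payload);
                delivered++;
            } catch (SimulatedDeserializationException e) {
                // Cause is not logged; the remaining events stay in the queue.
                return new Outcome(delivered, true);
            }
        }
        return new Outcome(delivered, false);
    }

    /** New behavior: log the failure and keep processing on the same connection. */
    static Outcome processNew(List<String> queue) {
        int delivered = 0;
        for (String payload : queue) {
            try {
                deserialize(payload);
                delivered++;
            } catch (SimulatedDeserializationException e) {
                LOG.warning("Skipping event: " + e.getMessage());
            }
        }
        return new Outcome(delivered, false);
    }

    public static void main(String[] args) {
        List<String> queue = java.util.Arrays.asList("a", "bad-pdx", "b", "c");
        System.out.println("old: " + processOld(queue));
        System.out.println("new: " + processNew(queue));
    }
}
```

With the sample queue, the old handler delivers one event and then fails over (which, for a durable client, feeds the reconnect loop modeled above), while the new handler logs a warning for the faulty event and delivers the other three.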
>
> The attached logs show the problematic case with a durable client:
> vm0 -> locator
> vm1, vm2 -> servers
> vm3 -> durable client with subscription enabled, handling CQ events
> vm4 -> client generating traffic that should trigger the registered CQ



--
This message was sent by Atlassian Jira
(v8.3.4#803005)