[ https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242972#comment-17242972 ]
ASF subversion and git services commented on GEODE-8687:
--------------------------------------------------------
Commit 2e7e45699b74812038dc516aaf7c14621951090b in geode's branch
refs/heads/develop from Jakov Varenina
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=2e7e456 ]
GEODE-8687: Fix for handling of PdxSerializationException on client (#5730)
* GEODE-8687: Improve handling of serialization error
* Improves handling of PdxSerializationException on the client at the reception
of events from the subscription queue
* Faulty behavior: At the reception of an event for which
PdxSerializationException is thrown, the client would always shut down
CacheClientUpdater, destroy the subscription queue connection,
and try to perform failover to another server in the cluster
* Behavior with this fix: At the reception of an event that provokes
PdxSerializationException, the client will only log the exception
* DurableClientCQAutoSerializer test updated
* Empty commit to trigger test
* Updates after review
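
The behavior change described in the commit message can be illustrated with
a minimal sketch. This is not the Geode implementation: the class, the
method processEvents, and the exception type DeserializationException are
hypothetical stand-ins (the last one for PdxSerializationException), and the
loop only shows the "log and keep the connection" idea in plain Java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

class SubscriptionLoopSketch {
    // Hypothetical stand-in for Geode's PdxSerializationException.
    static class DeserializationException extends RuntimeException {
        DeserializationException(String message) {
            super(message);
        }
    }

    private static final Logger LOG = Logger.getLogger("SubscriptionLoopSketch");

    // Deserializes each raw event; a failing event is logged and skipped,
    // and the loop keeps consuming the remaining events instead of
    // tearing down the (simulated) subscription connection.
    static List<String> processEvents(List<byte[]> rawEvents,
                                      Function<byte[], String> deserializer) {
        List<String> delivered = new ArrayList<>();
        for (byte[] raw : rawEvents) {
            try {
                delivered.add(deserializer.apply(raw));
            } catch (DeserializationException e) {
                LOG.log(Level.WARNING, "Skipping event that could not be deserialized", e);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        // Assumption for the demo: a payload starting with 0 cannot be deserialized.
        Function<byte[], String> deserializer = raw -> {
            if (raw[0] == 0) {
                throw new DeserializationException("unknown type id");
            }
            return "event-" + raw[0];
        };
        List<byte[]> events = Arrays.asList(new byte[] {1}, new byte[] {0}, new byte[] {2});
        System.out.println(processEvents(events, deserializer)); // prints [event-1, event-2]
    }
}
```

In this sketch the second event fails, a warning is logged, and the events
before and after it are still delivered.
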
> Durable client is continuously re-registering CQs on all servers when event
> de-serialization fails causing resource exhaustion on servers
> ------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-8687
> URL: https://issues.apache.org/jira/browse/GEODE-8687
> Project: Geode
> Issue Type: Bug
> Components: client/server
> Affects Versions: 1.13.0
> Reporter: Jakov Varenina
> Assignee: Jakov Varenina
> Priority: Major
> Labels: pull-request-available
> Attachments: deserialzationFault.log
>
>
> When ReflectionBasedAutoSerializer is misconfigured or not set, the client
> throws a serialization exception on reception of CQ events. The exception
> isn't logged, which is misleading and makes it hard to find out that
> ReflectionBasedAutoSerializer isn't set correctly. The only log that can be
> seen is that the client/server subscription connections are closed due to
> EOF. This is because the client destroys the subscription connections
> intentionally, but doesn't log the reason (PdxSerializationException) that
> led to this. It would be good if serialization exceptions were logged as
> error or warn.
> The client destroys the subscription connection and performs server
> fail-over whenever a serialization issue occurs. Additionally, when the
> subscription connection to a particular server fails multiple times, that
> server is put on the deny list for 10 seconds (this is configurable with
> {{ping-interval}}). After the 10s expire, the server is removed from the
> list and becomes available for a subscription connection, which will be
> destroyed again due to the serialization issue. This goes on indefinitely,
> and approx. every 10s in this case the client subscribes to each server at
> least once. Due to the serialization issue, events aren't delivered to the
> client and remain in the subscription queues.
> Whenever the connection fails due to a serialization issue and the client is
> not durable, the subscription queue is closed and events are lost.
> The biggest problem arises when the client is durable. This is because the
> subscription queue remains on the server for a configurable period of time
> (e.g. 300s) waiting for the client to reconnect. When the client performs
> fail-over to another server, it creates a new subscription queue using the
> initial image from the old queue, which is currently paused. This means that
> all events from the old queue are transferred to the new subscription queue
> hosted by the current primary server. This happens on all servers, and all
> of them will have a copy of the queue even if subscription redundancy isn't
> configured. The problem here is that the client periodically (every 10s in
> this case) establishes a connection to each server, so the configured
> timeout (e.g. 300s) never expires; it is renewed each time the client
> registers. This could cause a lot of problems, since memory and disk usage
> (if overflow on the queue is configured) will increase on all servers.
> The attached logs cover the problematic case with a durable client:
> vm0 -> locator
> vm1, vm2 -> servers
> vm3 -> durable client with subscription enabled, handling CQ events
> vm4 -> client generating traffic that should trigger the registered CQ
>
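
The timing argument in the description (a reconnect roughly every 10s against
a durable timeout of e.g. 300s) can be checked with a small simulation. This
is only a sketch under simplifying assumptions: time advances in whole
seconds, every reconnect renews the timeout, and the class and method names
are hypothetical; nothing here calls Geode.

```java
class DurableQueueTimeoutSketch {
    /**
     * Returns the second at which the server would discard the durable queue,
     * or -1 if the queue survives the whole simulated window.
     *
     * reconnectIntervalSec <= 0 means the client never reconnects.
     */
    static long secondsUntilQueueExpires(long reconnectIntervalSec,
                                         long durableTimeoutSec,
                                         long simulatedSec) {
        long lastSeen = 0; // the client was last connected at t = 0
        for (long t = 1; t <= simulatedSec; t++) {
            if (reconnectIntervalSec > 0 && t % reconnectIntervalSec == 0) {
                lastSeen = t; // a reconnect renews the durable timeout
            }
            if (t - lastSeen >= durableTimeoutSec) {
                return t; // timeout elapsed without a reconnect
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Reconnect every 10s, timeout 300s, one simulated hour: never expires.
        System.out.println(secondsUntilQueueExpires(10, 300, 3600)); // prints -1
        // No reconnects at all: the queue is discarded after 300s.
        System.out.println(secondsUntilQueueExpires(0, 300, 3600)); // prints 300
    }
}
```

As long as the reconnect interval stays below the durable timeout, the queue
in this model is never discarded, which matches the unbounded growth
described above.
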