[jira] [Updated] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers

Jakov Varenina (Jira) Wed, 04 Nov 2020 21:55:19 -0800


     [ 
https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jakov Varenina updated GEODE-8687:
----------------------------------
    Attachment: deserialzationFault.log

> Durable client is continuously re-registering CQs on all servers when event 
> de-serialization fails causing resource exhaustion on servers 
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8687
>                 URL: https://issues.apache.org/jira/browse/GEODE-8687
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>    Affects Versions: 1.13.0
>            Reporter: Jakov Varenina
>            Priority: Major
>         Attachments: deserialzationFault.log
>
>
> When ReflectionBasedAutoSerializer is misconfigured or not set, the client
> throws a serialization exception when it receives CQ events. The exception
> is not logged, which is misleading and makes it hard to discover that
> ReflectionBasedAutoSerializer is the actual cause. The only visible log
> entries report that the client/server subscription connections were closed
> due to EOF. This happens because the client destroys the subscription
> connections intentionally but does not log the reason
> (PdxSerializationException) that led to it. Serialization exceptions should
> be logged at error or warn level.
> Another problem arises because the client destroys the subscription
> connection and performs server fail-over whenever a serialization issue
> occurs. Additionally, when the subscription connection to a particular
> server fails multiple times, that server is put on the deny list for 10
> seconds (this is configurable with {{ping-interval}}). After the 10s expire,
> the server is removed from the list and becomes available for a subscription
> connection, which fails again. This continues indefinitely (if there are
> many events that cannot be de-serialized), and approximately every 10s in
> this case the client subscribes to each server at least once. Due to the
> serialization issue, events are not delivered to the client and remain in
> the subscription queues.
> Whenever a connection fails due to a serialization issue and the client is
> not durable, the subscription queue is closed and the events are lost.
> The biggest problem arises when the client is durable. In that case the
> subscription queue remains on the server for a configurable period of time
> (e.g. 300s), waiting for the client to reconnect. When the client fails over
> to another server, it creates a new subscription queue using an initial
> image from the old queue, which is currently paused. This means that all
> events from the old queue are transferred to the new subscription queue
> hosted by the current primary server. This happens on all servers, so each
> of them ends up with a copy of the queue. The problem is that the client
> periodically (every 10s in this case) establishes a connection to each
> server, so the configured timeout (e.g. 300s) never expires; it is renewed
> each time the client registers. This can cause many problems, since memory
> and disk usage (if overflow is configured on the queue) will increase on all
> servers.
> The attached logs cover the problematic case with a durable client:
> vm0      -> locator
> vm1, vm2 -> servers
> vm3      -> durable client with subscription enabled, handling CQ events
> vm4      -> client generating traffic that should trigger the registered CQ
>  
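
The lease-renewal effect described above can be illustrated with a small toy model. This is only a sketch and not Geode code: the class and method names are hypothetical, and it assumes a single server that drops a paused durable queue once the configured timeout has passed without a new client registration. The 300s timeout and 10s reconnect period are the example values from the report.

```java
/**
 * Toy model of one server holding a paused durable subscription queue.
 * All names are hypothetical; nothing here is Geode API.
 */
final class DurableQueueLeaseSketch {

    /**
     * Returns the second at which the server would drop the paused queue,
     * or -1 if the queue is still alive when the simulation ends.
     *
     * @param durableTimeoutSec  how long a paused queue is kept (e.g. 300)
     * @param reconnectPeriodSec how often the client re-registers (e.g. 10);
     *                           0 means the client never comes back
     * @param horizonSec         how many seconds to simulate
     */
    static long queueExpiryTime(long durableTimeoutSec,
                                long reconnectPeriodSec,
                                long horizonSec) {
        // Assumption: the client registered at t=0 and the connection then failed.
        long lastRegistration = 0;
        for (long t = 1; t <= horizonSec; t++) {
            if (reconnectPeriodSec > 0 && t % reconnectPeriodSec == 0) {
                // Each re-registration renews the timeout.
                lastRegistration = t;
            }
            if (t - lastRegistration >= durableTimeoutSec) {
                // Timeout elapsed without a reconnect: the queue is dropped.
                return t;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Client retries every 10s: the 300s timeout never elapses.
        System.out.println(queueExpiryTime(300, 10, 3600)); // prints -1
        // Client never returns: the queue is dropped after 300s.
        System.out.println(queueExpiryTime(300, 0, 3600));  // prints 300
    }
}
```

In this model the queue is released only when the reconnect period is at least as long as the durable timeout, which matches the observation that a 10s retry cycle keeps the queues alive on every server.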



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
