I would have expected unsent events to be stored in a queue that is backed by a persistent region or something on disk. If that's not currently true, then it seems like a good direction might be to make tmpDroppedEvents use a durable queue of some sort that overflows to disk.
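
As a rough illustration of the overflow idea, here is a minimal sketch and not Geode code. It assumes events can be reduced to plain strings, and the names OverflowQueueSketch and OverflowQueue are hypothetical. It keeps a bounded number of events on the heap and appends the rest to a temporary file, preserving FIFO order. It is not thread-safe and the temp file does not survive a restart, so it shows only the overflow half of a "durable" queue.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;

public class OverflowQueueSketch {

    /** Hypothetical FIFO queue: bounded in-memory head, file-backed tail. */
    static final class OverflowQueue implements AutoCloseable {
        private final ArrayDeque<String> memory = new ArrayDeque<>();
        private final int memoryCapacity;
        private final File file;
        private final RandomAccessFile spill;
        private long readPos = 0;   // offset of the next record to read
        private long writePos = 0;  // offset where the next record is appended
        private long onDisk = 0;    // records currently held in the file

        OverflowQueue(int memoryCapacity) {
            this.memoryCapacity = memoryCapacity;
            try {
                this.file = File.createTempFile("dropped-events", ".bin");
                this.file.deleteOnExit();
                this.spill = new RandomAccessFile(file, "rw");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        void offer(String event) {
            // Use the heap only while nothing older is waiting on disk;
            // otherwise a new event would overtake the spilled ones.
            if (onDisk == 0 && memory.size() < memoryCapacity) {
                memory.addLast(event);
                return;
            }
            try {
                spill.seek(writePos);
                spill.writeUTF(event); // writeUTF caps a record at 65535 encoded bytes
                writePos = spill.getFilePointer();
                onDisk++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Returns the oldest event, or null if the queue is empty. */
        String poll() {
            if (!memory.isEmpty()) {
                return memory.pollFirst();
            }
            if (onDisk == 0) {
                return null;
            }
            try {
                spill.seek(readPos);
                String event = spill.readUTF();
                readPos = spill.getFilePointer();
                if (--onDisk == 0) {
                    // File fully drained: truncate it so it does not grow forever.
                    spill.setLength(0);
                    readPos = 0;
                    writePos = 0;
                }
                return event;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        int inMemory() { return memory.size(); }

        long onDisk() { return onDisk; }

        @Override
        public void close() {
            try {
                spill.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                file.delete();
            }
        }
    }

    public static void main(String[] args) {
        try (OverflowQueue queue = new OverflowQueue(2)) {
            for (int i = 0; i < 5; i++) {
                queue.offer("event-" + i);
            }
            System.out.println("in memory: " + queue.inMemory() + ", on disk: " + queue.onDisk());
            String event;
            while ((event = queue.poll()) != null) {
                System.out.println(event);
            }
        }
    }
}
```

With a capacity of 2 and five events offered, two stay on the heap and three go to the file, so heap use is bounded by the capacity rather than by how long the sender stays stopped. A real implementation would need to serialize the actual event objects and decide how the file is recovered after a restart.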

On Thu, Jul 2, 2020 at 10:33 AM Alberto Gomez <alberto.go...@est.tech> wrote:

> Hi,
>
> We have observed that when a gateway sender is stopped in a site, all the
> events received while it is stopped are stored in the
> 'AbstractGatewaySender.tmpDroppedEvents' queue of the primary sender. The
> elements of this queue are not removed from this queue until the sender is
> started back again.
>
> This behavior implies that if the gateway sender is stopped for a long
> time, there is a risk of heap exhaustion in the members hosting primary
> senders.
>
> Under split brain situations, if lasting long enough, there could be heap
> exhaustion problems in servers due to the memory used by the gateway sender
> queues, even if overflow to disk is used, given that part of the event is
> always stored in memory.
> For those situations we had thought about stopping gateway senders when
> the memory used by the gateway sender queues reached a certain memory
> threshold. But according to the above, stopping the gateway senders would
> only make things worse.
>
> Would it make sense for the gateway sender not to store the received
> events in tmpDroppedEvents while it is stopped?
>
> Any suggestion on how to approach the problem of heap exhaustion due to
> the growth of gateway sender queues in long lasting split brain situations?
>
> Thanks in advance,
>
> Alberto G.