Hi Eric,

I agree that the only case in which the memory issue may occur is when all gateway sender instances are stopped. That is the case the solution proposed in the RFC is targeted at, and it is also why the stop gateway sender command is intended to be updated to fix the issue.

Note that while all the gateway sender instances are being stopped, there may be events stored in the secondary senders that will be dropped by the primary sender. Those dropped events need to be queued while the secondaries are still up so that, when the sender is started again, the secondaries' queues are drained accordingly.

If we go for the option of setting a limit on the dropped events and the limit is set too low, there could be dropped events that should have been queued but were not because the limit had been reached, and which would therefore never be sent to the secondaries to drain their queues completely. (This is the case in which I meant that a notification must be sent to the operator of the system, so that they know a possible issue is present: queues with events that would stay there forever.) On the other hand, if the limit is too high, the memory consumed by the queued dropped events could cause memory exhaustion.

I think the right balance is to stop queueing dropped events when all the gateway sender instances are stopped.

BR,
Alberto

________________________________
From: Eric Shu <e...@vmware.com>
Sent: Wednesday, July 8, 2020 9:25 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

I think the only case in which the memory issue occurs is when all gateway senders are stopped in the WAN site. Otherwise another member would take over as the primary queue, and no more events would be enqueued in tmpDroppedEvents on the member with the original primary queue. (For a parallel WAN queue, I do not think stopping one gateway queue is a valid case to support.)

For the case in which all gateway senders are stopped, there is no need to notify any other members in the WAN site if the limit is reached. The tmpDroppedEvents queue is only used to remove events from the secondary queue.
If no events are enqueued in the secondary queue, there is no need to add anything to tmpDroppedEvents at all. To me, it should only be used to queue a limited number of events.

Regards,
Eric

________________________________
From: Alberto Gomez <alberto.go...@est.tech>
Sent: Wednesday, July 8, 2020 12:02 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

Thanks for your comments, Eric.

Limiting the size of the queue would be a simple solution, but I think it would pose several problems for whoever configures and operates Geode:

* How big should the queue be? Probably not easy to dimension. Should the limit be on the memory occupied by the elements or on the number of elements in the queue (in which case, depending on the size of the elements, the memory used could vary a lot)?
* What to do when the limit has been reached? How do we notify that it was reached, what do we do afterwards, how would we know which dropped events did not make it to the queue but should have been removed from the secondary's queue...

I think the solution proposed in the RFC is simple enough and also addresses a possible confusion with the semantics of the gateway sender stop command. Stopping a gateway sender currently causes all events received while the sender is stopped to be dropped; but at the same time, unlimited memory may be consumed by the dropped events. We could put a limit on the amount of memory used by the queued dropped events, but what would be the point of storing them in the first place if those events will not be sent to the remote site anyway?

I would expect that after stopping a gateway sender no resources (or at least only a minimal amount) would be consumed by it. Otherwise we may as well not stop it, or use the pause command, depending on what we want to achieve.

From what I have seen, queuing dropped events has its place while the gateway sender is starting and while it is stopping, but if it is done in a sender to be started manually or in a manually stopped server, it could provoke an unexpected memory exhaustion.

I really think the solution proposed makes the behavior of the gateway sender command more logical.

Best regards,
Alberto

________________________________
From: Eric Shu <e...@vmware.com>
Sent: Wednesday, July 8, 2020 7:32 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

It seems that I am not able to comment on the RFC in the wiki yet.

I am just trying to find out whether we have a simple solution for the issue you raised: can we have an upper limit for the tmpDroppedEvents queue in question, and always check the limit before adding to the queue, so that the tmp queue is not unbounded?

Regards,
Eric

________________________________
From: Alberto Gomez <alberto.go...@est.tech>
Sent: Monday, July 6, 2020 8:24 AM
To: geode <dev@geode.apache.org>
Subject: [DISCUSS] RFC - Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped

Hi,

I have published a new RFC in the Apache Geode wiki with the following title: "Avoid the queueing of dropped events by the primary gateway sender when the gateway sender is stopped".

https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FAvoid%2Bthe%2Bqueuing%2Bof%2Bdropped%2Bevents%2Bby%2Bthe%2Bprimary%2Bgateway%2Bsender%2Bwhen%2Bthe%2Bgateway%2Bsender%2Bis%2Bstopped&data=02%7C01%7Ceshu%40vmware.com%7C82aeb2f0bd30435131bd08d8237173c3%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637298317468898044&sdata=ihK%2BeTvnhiA0XXcw22fv5VjjgzjYL2EQwL5%2Fe0KK%2F08%3D&reserved=0

Could you please give comments by Thursday, July 9th, 2020?

Thanks in advance,
Alberto G.
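
For illustration only, the two options weighed in this thread (capping the tmpDroppedEvents queue versus not queueing at all once every gateway sender instance is stopped) can be contrasted in a minimal Java sketch. Apart from the name tmpDroppedEvents, everything here is hypothetical: the class, its methods, the flag, and the use of String event IDs are assumptions made for the example, and this is not the Geode implementation or the change described in the RFC.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for the primary sender's dropped-event bookkeeping; not Geode code. */
final class DroppedEventTracker {
  // Event IDs the secondaries must later remove from their own queues.
  private final Queue<String> tmpDroppedEvents = new ArrayDeque<>();
  // Cap on queued entries; Integer.MAX_VALUE effectively means "no cap".
  private final int limit;
  // Assumed to be set by the stop command once every sender instance is stopped.
  private boolean allSendersStopped;
  // Dropped events that could not be recorded because the cap was reached.
  private long overflowCount;

  DroppedEventTracker(int limit) {
    this.limit = limit;
  }

  void setAllSendersStopped(boolean stopped) {
    this.allSendersStopped = stopped;
  }

  /** Returns true if the dropped event was recorded for later removal from the secondaries. */
  boolean onEventDropped(String eventId) {
    if (allSendersStopped) {
      // RFC-style option: no secondary is queueing, so nothing needs draining later.
      return false;
    }
    if (tmpDroppedEvents.size() >= limit) {
      // Cap option: memory stays bounded, but this event may linger in a secondary queue.
      overflowCount++;
      return false;
    }
    tmpDroppedEvents.add(eventId);
    return true;
  }

  int queuedCount() {
    return tmpDroppedEvents.size();
  }

  long overflowCount() {
    return overflowCount;
  }
}
```

With a cap of 2, a third dropped event is refused and counted as an overflow, which is the situation that would require notifying the operator. With the stopped flag set, nothing is queued and nothing overflows, so memory use stays flat regardless of how many events are dropped.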