To better understand this request: is this an existing issue?
Why is it critical enough to squeeze into an existing (almost released) version of Apache Geode?
What guarantees do we have that this fix makes the application more stable, rather than adding another hidden issue that we will discover a few weeks from now?
--Udo

On 8/26/19 3:10 PM, Ryan McMahon wrote:
Hi all,

I would like to propose cherry-picking GEODE-7088 and GEODE-7089 to the 1.10.0 release branch. The two JIRAs are related to the same root problem, which I would classify as critical.

We discovered a case where a failed client registration could lead to a memory leak in a server, eventually causing the server to crash due to lack of memory. The issue is triggered by a ConcurrentModificationException caused by iterating over a non-thread-safe collection while it is being mutated (GEODE-7088). This exception occurs when the client's queue image is being copied from one server to the next during client registration, and it causes the client's registration to fail. The client would likely succeed if it retried registration with that same server, but if it registers with a different server, we end up leaking events to the client's registration queue on the original server (GEODE-7089).

The fix for GEODE-7088 is to use thread-safe collections for interested clients in client update messages. The fix for GEODE-7089 is to always drain and remove the registration queue regardless of success or failure. Together, these fixes prevent the failed registrations and the memory leak.

The SHAs for the fixes and tests in develop are:

GEODE-7088
- 174af1d23fb7e09eb2bc2fa55479df854850fadb
- 5bb753a8f4ff2886acd8e62d6f51fea58e37881d

GEODE-7089
- 5d0153ad4adb1612a1083673f98b1982819a6589

This proposal is to cherry-pick these commits to the 1.10.0 release branch.

Thanks,
Ryan McMahon
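
For readers who have not seen the two failure modes described above, here is a minimal, self-contained Java sketch of the general pattern. It is not the Geode code: the class, method, and field names (RegistrationSketch, throwsWhenMutatedDuringIteration, register, REGISTRATION_QUEUES) are hypothetical and exist only for illustration. The first method shows how mutating a plain HashSet during iteration raises a ConcurrentModificationException while a concurrent set does not; the second shows a try/finally that drains and removes a per-client queue whether or not registration succeeds.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class RegistrationSketch {

  // Hypothetical stand-in for the per-client registration queues on a server.
  static final Map<String, Queue<String>> REGISTRATION_QUEUES = new ConcurrentHashMap<>();

  // Returns true if adding to `clients` while iterating over it throws
  // ConcurrentModificationException. The set must hold at least two elements
  // so the iterator is advanced again after the mutation.
  static boolean throwsWhenMutatedDuringIteration(Set<String> clients) {
    try {
      for (String ignored : clients) {
        clients.add("late-client"); // mutation while an iterator is open
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Simulates a registration that may fail part-way through. The finally
  // block drains and removes the queue regardless of success or failure.
  static void register(String clientId, boolean failCopy) {
    Queue<String> queue = new ConcurrentLinkedQueue<>();
    REGISTRATION_QUEUES.put(clientId, queue);
    try {
      queue.add("event received during registration");
      if (failCopy) {
        throw new IllegalStateException("queue image copy failed");
      }
    } finally {
      Queue<String> removed = REGISTRATION_QUEUES.remove(clientId);
      if (removed != null) {
        removed.clear(); // drain so no events stay referenced
      }
    }
  }

  public static void main(String[] args) {
    Set<String> plain = new HashSet<>(Arrays.asList("a", "b", "c"));
    Set<String> concurrent = ConcurrentHashMap.newKeySet();
    concurrent.addAll(Arrays.asList("a", "b", "c"));

    System.out.println("HashSet throws: " + throwsWhenMutatedDuringIteration(plain));
    System.out.println("Concurrent set throws: " + throwsWhenMutatedDuringIteration(concurrent));

    try {
      register("client-1", true);
    } catch (IllegalStateException expected) {
      // The registration failed, but the queue must already be gone.
    }
    System.out.println("Queues left after failure: " + REGISTRATION_QUEUES.size());
  }
}
```

The sketch triggers the exception from a single thread to keep it deterministic; in the scenario described above the mutation would come from another thread, which makes the failure intermittent. Without the finally block, a failed registration would leave the queue in the map, and events would keep accumulating in it.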