[
https://issues.apache.org/jira/browse/GEODE-10395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated GEODE-10395:
-----------------------------------
Labels: pull-request-available (was: )
> TXLockIdImpl memory leak after CommitConflictException from another node
> ------------------------------------------------------------------------
>
> Key: GEODE-10395
> URL: https://issues.apache.org/jira/browse/GEODE-10395
> Project: Geode
> Issue Type: Bug
> Affects Versions: 1.14.0, 1.15.0
> Reporter: Eugene Nedzvetsky
> Assignee: Mario Kevo
> Priority: Major
> Labels: pull-request-available
>
> org.apache.geode.internal.cache.locks.TXLockServiceImpl#txLock:120 adds
> TXLockIdImpl objects to TXLockServiceImpl#txLockIdList.
> {code:java}
> synchronized (txLockIdList) {
> txLockId = new TXLockIdImpl(dlock.getDistributionManager().getId());
> txLockIdList.add(txLockId);
> }
> {code}
> These objects will not be removed from this list if dlock.acquireTryLocks
> returned false.
> {code:java}
> gotLocks = dlock.acquireTryLocks(batch, TIMEOUT_MILLIS, LEASE_MILLIS,
> keyIfFail);
> if (gotLocks) { // ...otherwise race can occur between tryLocks and
> readLock
> acquireRecoveryReadLock();
> } else if (keyIfFail[0] != null) {
> throw new CommitConflictException(
> String.format("Concurrent transaction commit detected %s",
> keyIfFail[0]));
> } else {
> throw new CommitConflictException(
> String.format("Failed to request try locks from grantor: %s",
> dlock.getLockGrantorId()));
> }
> {code}
> It throws CommitConflictException and after that system doesn't have a
> txLockId reference and this txLockId will be never removed from this list.
> It produces critical performance degradation. txLockIdList has a few hundred
> thousand txLocks after a few weeks and TXLockServiceImpl#release iterates
> this list 2 times on every tx commit and the same time "synchronized
> (txLockIdList)" locks other threads.
> TXLockIdImpl#equals works really slow because it checks bunch of variables in
> memberId.equals(that.memberId).
> {code:java}
> public void release(TXLockId txLockId) {
> synchronized (txLockIdList) {
> if (!txLockIdList.contains(txLockId)) {
> // TXLockService.destroyServices can be invoked in cache.close().
> // Other P2P threads could process message such as TXCommitMessage
> afterwards,
> // and invoke TXLockService.createDTLS(). It could create a new
> TXLockService
> // which will have a new empty list (txLockIdList) and it will not
> // contain the originally added txLockId
> throw new IllegalArgumentException(
> String.format("Invalid txLockId not found: %s",
> txLockId));
> }
> dlock.releaseTryLocks(txLockId, () -> {
> return recovering;
> });
> txLockIdList.remove(txLockId);
> releaseRecoveryReadLock();
> }
> }
> {code}
> I think TXLockServiceImpl#txLock should remove this txLockId from
> TXLockServiceImpl#txLockIdList in case of CommitConflictException:
> {code:java}
> if (gotLocks) { // ...otherwise race can occur between tryLocks and readLock
> acquireRecoveryReadLock();
> } else if (keyIfFail[0] != null) {
> synchronized (this.txLockIdList) {
> this.txLockIdList.remove(txLockId);
> }
> throw new CommitConflictException(
> String.format("Concurrent transaction commit detected
> %s",
> keyIfFail[0]));
> } else {
> synchronized (this.txLockIdList) {
> this.txLockIdList.remove(txLockId);
> }
> throw new CommitConflictException(
> String.format("Failed to request try locks from
> grantor: %s",
> this.dlock.getLockGrantorId()));
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)