[ https://issues.apache.org/jira/browse/GEODE-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904880#comment-17904880 ]
Leon Finker commented on GEODE-10453:
-------------------------------------

It's a bug specific to the compact index logic. When switching to an asynchronous (non-compact) index, this issue doesn't happen. It also happens on server cache startup and on index creation over existing data that is received as an initial snapshot from another peer. And it's not really possible to work around when using overflow-to-disk regions, because those do not support non-compact indexes.

{noformat}
[warn <ThreadsMonitor> tid=55] Thread <77> (0x4d) that was executed at <07 Dec 2024 12:47:28 EST> has been stuck for <994.887 seconds> and number of thread monitor iteration <17>
Thread Name <Pooled High Priority Message Processor 3> state <RUNNABLE>
Executor Group <PooledExecutorWithDMStats>
Monitored metric <ResourceManagerStats.numThreadsStuck>
Thread stack for "Pooled High Priority Message Processor 3" (0x4d):
java.lang.ThreadState: RUNNABLE
  at java.base@17.0.6/java.lang.Throwable.fillInStackTrace(Native Method)
  at java.base@17.0.6/java.lang.Throwable.fillInStackTrace(Throwable.java:798)
  at java.base@17.0.6/java.lang.Throwable.<init>(Throwable.java:271)
  at java.base@17.0.6/java.lang.Exception.<init>(Exception.java:67)
  at java.base@17.0.6/java.lang.RuntimeException.<init>(RuntimeException.java:63)
  at java.base@17.0.6/java.lang.ClassCastException.<init>(ClassCastException.java:57)
  at java.base@17.0.6/java.lang.String.compareTo(String.java:140)
  at app//org.apache.geode.cache.query.internal.types.TypeUtils$ComparisonStrategy$4.execute(TypeUtils.java:90)
  at app//org.apache.geode.cache.query.internal.types.TypeUtils.compare(TypeUtils.java:499)
  at app//org.apache.geode.cache.query.internal.index.MemoryIndexStore.getOldKey(MemoryIndexStore.java:275)
  at app//org.apache.geode.cache.query.internal.index.MemoryIndexStore.basicRemoveMapping(MemoryIndexStore.java:399)
  at app//org.apache.geode.cache.query.internal.index.MemoryIndexStore.removeMapping(MemoryIndexStore.java:298)
  at app//org.apache.geode.cache.query.internal.index.CompactRangeIndex.removeMapping(CompactRangeIndex.java:173)
  at app//org.apache.geode.cache.query.internal.index.AbstractIndex.removeIndexMapping(AbstractIndex.java:508)
  at app//org.apache.geode.cache.query.internal.index.IndexManager.removeIndexMapping(IndexManager.java:1156)
  at app//org.apache.geode.cache.query.internal.index.IndexManager.processAction(IndexManager.java:1121)
  at app//org.apache.geode.cache.query.internal.index.IndexManager.updateIndexes(IndexManager.java:982)
  at app//org.apache.geode.cache.query.internal.index.IndexManager.updateIndexes(IndexManager.java:956)
  at app//org.apache.geode.internal.cache.AbstractRegionMap.initialImagePut(AbstractRegionMap.java:836)
  at app//org.apache.geode.internal.cache.InitialImageOperation.processChunk(InitialImageOperation.java:980)
  at app//org.apache.geode.internal.cache.InitialImageOperation$ImageProcessor.process(InitialImageOperation.java:1306)
  at app//org.apache.geode.distributed.internal.ReplyMessage.process(ReplyMessage.java:215)
  at app//org.apache.geode.internal.cache.InitialImageOperation$ImageReplyMessage.process(InitialImageOperation.java:2829)
  at app//org.apache.geode.distributed.internal.ReplyMessage.dmProcess(ReplyMessage.java:198)
  at app//org.apache.geode.distributed.internal.ReplyMessage.process(ReplyMessage.java:191)
  at app//org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:380)
  at app//org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:445)
  at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  at app//org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:449)
  at app//org.apache.geode.distributed.internal.ClusterOperationExecutors.doHighPriorityThread(ClusterOperationExecutors.java:407)
  at app//org.apache.geode.distributed.internal.ClusterOperationExecutors$$Lambda$312/0x00000008018c1e50.invoke(Unknown Source)
  at app//org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:120)
  at app//org.apache.geode.logging.internal.executors.LoggingThreadFactory$$Lambda$310/0x00000008018c1780.run(Unknown Source)
  at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)
Locked ownable synchronizers:
- None
{noformat}

> Infinite/slow indexing on reconnect and register interest replay
> ----------------------------------------------------------------
>
> Key: GEODE-10453
> URL: https://issues.apache.org/jira/browse/GEODE-10453
> Project: Geode
> Issue Type: Bug
> Affects Versions: 1.15.1
> Reporter: Leon Finker
> Priority: Major
>
> Cache server was restarted. Client side upon reconnect went into
> infinite/slow indexing loop. This has not recovered even after multiple days.
> The thread stack for thread taking 100% CPU was:
> {code}
> Thread Name <poolTimer-Server-21659> state <BLOCKED>
> Waiting on <org.apache.geode.cache.client.internal.ConnectionImpl@293d172>
> Owned By <queueTimer-Server1> with ID <140>
> Executor Group <ScheduledThreadPoolExecutorWithKeepAlive>
> Monitored metric <ResourceManagerStats.numThreadsStuck>
> Thread stack for "poolTimer-Server-21659" (0x128275):
> java.lang.ThreadState: BLOCKED
>   at app//org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:283)
>   at app//org.apache.geode.cache.client.internal.QueueConnectionImpl.execute(QueueConnectionImpl.java:191)
>   at app//org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:760)
>   at app//org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:343)
>   at app//org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:312)
>   at app//org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:848)
>   at app//org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:40)
>   at app//org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:128)
>   at app//org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1340)
>   at java.base@17.0.6/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
>   at java.base@17.0.6/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>   at app//org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:285)
>   at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>   at java.base@17.0.6/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>   at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)
> Locked ownable synchronizers:
> - None
> Lock owner thread stack for "queueTimer-Server1" (0x6a):
> java.lang.ThreadState: RUNNABLE
>   at app//org.apache.geode.cache.query.internal.types.TypeUtils$ComparisonStrategy$4.execute(TypeUtils.java:90)
>   at app//org.apache.geode.cache.query.internal.types.TypeUtils.compare(TypeUtils.java:499)
>   at app//org.apache.geode.cache.query.internal.index.MemoryIndexStore.getOldKey(MemoryIndexStore.java:275)
>   at app//org.apache.geode.cache.query.internal.index.MemoryIndexStore.updateMapping(MemoryIndexStore.java:122)
>   at app//org.apache.geode.cache.query.internal.index.CompactRangeIndex$IMQEvaluator.applyProjection(CompactRangeIndex.java:1563)
>   at app//org.apache.geode.cache.query.internal.index.CompactRangeIndex$IMQEvaluator.doNestedIterations(CompactRangeIndex.java:1519)
>   at app//org.apache.geode.cache.query.internal.index.CompactRangeIndex$IMQEvaluator.evaluate(CompactRangeIndex.java:1372)
>   at app//org.apache.geode.cache.query.internal.index.CompactRangeIndex.addMapping(CompactRangeIndex.java:143)
>   at app//org.apache.geode.cache.query.internal.index.AbstractIndex.addIndexMapping(AbstractIndex.java:488)
>   at app//org.apache.geode.cache.query.internal.index.IndexManager.addIndexMapping(IndexManager.java:1143)
>   at app//org.apache.geode.cache.query.internal.index.IndexManager.processAction(IndexManager.java:1089)
>   at app//org.apache.geode.cache.query.internal.index.IndexManager.updateIndexes(IndexManager.java:982)
>   at app//org.apache.geode.cache.query.internal.index.IndexManager.updateIndexes(IndexManager.java:956)
>   at app//org.apache.geode.internal.cache.AbstractRegionMap.initialImagePut(AbstractRegionMap.java:839)
>   at app//org.apache.geode.internal.cache.LocalRegion.refreshEntriesFromServerKeys(LocalRegion.java:4348)
>   at app//org.apache.geode.cache.client.internal.RegisterInterestOp$RegisterInterestOpImpl.processResponse(RegisterInterestOp.java:217)
>   at app//org.apache.geode.cache.client.internal.RegisterInterestOp$RegisterInterestOpImpl.processResponse(RegisterInterestOp.java:121)
>   at app//org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:209)
>   at app//org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:394)
>   at app//org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:284)
>   at app//org.apache.geode.cache.client.internal.QueueConnectionImpl.execute(QueueConnectionImpl.java:191)
>   at app//org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:760)
>   at app//org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:475)
>   at app//org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:488)
>   at app//org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:861)
>   at app//org.apache.geode.cache.client.internal.RegisterInterestOp.executeOn(RegisterInterestOp.java:113)
>   at app//org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterestOn(ServerRegionProxy.java:506)
>   at app//org.apache.geode.cache.client.internal.QueueManagerImpl.recoverSingleKey(QueueManagerImpl.java:1236)
>   at app//org.apache.geode.cache.client.internal.QueueManagerImpl.recoverSingleRegion(QueueManagerImpl.java:1183)
>   at app//org.apache.geode.cache.client.internal.QueueManagerImpl.recoverSingleList(QueueManagerImpl.java:1129)
>   at app//org.apache.geode.cache.client.internal.QueueManagerImpl.recoverInterestList(QueueManagerImpl.java:1250)
>   at app//org.apache.geode.cache.client.internal.QueueManagerImpl.recoverAllInterestTypes(QueueManagerImpl.java:1264)
>   at app//org.apache.geode.cache.client.internal.QueueManagerImpl.recoverInterest(QueueManagerImpl.java:1094)
>   at app//org.apache.geode.cache.client.internal.QueueManagerImpl.recoverPrimary(QueueManagerImpl.java:938)
>   at app//org.apache.geode.cache.client.internal.QueueManagerImpl$RedundancySatisfierTask.run2(QueueManagerImpl.java:1475)
>   at app//org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1340)
>   at java.base@17.0.6/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
>   at java.base@17.0.6/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base@17.0.6/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>   at app//org.apache.geode.cache.query.internal.index.CompactRangeIndex$IMQEvaluator.doNestedIterations(CompactRangeIndex.java:1509)
> {code}
> After client stop attempt and cache close, the following stack trace was logged:
> {code}
> The index is corrupted and marked as invalid.
> org.apache.geode.cache.CacheClosedException: The cache is closed.
>   at org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:5207) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.internal.cache.LocalRegion.checkRegionDestroyed(LocalRegion.java:7382) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.internal.cache.LocalRegion.checkReadiness(LocalRegion.java:2788) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.internal.cache.LocalRegion.values(LocalRegion.java:1970) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.query.internal.QRegion.<init>(QRegion.java:81) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.query.internal.index.DummyQRegion.<init>(DummyQRegion.java:52) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.query.internal.index.CompactRangeIndex$IMQEvaluator.evaluate(CompactRangeIndex.java:1342) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.query.internal.index.CompactRangeIndex.addMapping(CompactRangeIndex.java:143) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.query.internal.index.AbstractIndex.addIndexMapping(AbstractIndex.java:488) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.query.internal.index.IndexManager.addIndexMapping(IndexManager.java:1143) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.query.internal.index.IndexManager.processAction(IndexManager.java:1089) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.query.internal.index.IndexManager.updateIndexes(IndexManager.java:982) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.query.internal.index.IndexManager.updateIndexes(IndexManager.java:956) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.internal.cache.AbstractRegionMap.initialImagePut(AbstractRegionMap.java:839) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.internal.cache.LocalRegion.refreshEntriesFromServerKeys(LocalRegion.java:4348) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.RegisterInterestOp$RegisterInterestOpImpl.processResponse(RegisterInterestOp.java:217) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.RegisterInterestOp$RegisterInterestOpImpl.processResponse(RegisterInterestOp.java:121) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:209) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:394) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:284) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.QueueConnectionImpl.execute(QueueConnectionImpl.java:191) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:760) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:475) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:488) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:861) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.RegisterInterestOp.executeOn(RegisterInterestOp.java:113) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.ServerRegionProxy.registerInterestOn(ServerRegionProxy.java:506) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.QueueManagerImpl.recoverSingleKey(QueueManagerImpl.java:1236) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.QueueManagerImpl.recoverSingleRegion(QueueManagerImpl.java:1183) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.QueueManagerImpl.recoverSingleList(QueueManagerImpl.java:1129) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.QueueManagerImpl.recoverInterestList(QueueManagerImpl.java:1250) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.QueueManagerImpl.recoverAllInterestTypes(QueueManagerImpl.java:1264) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.QueueManagerImpl.recoverInterest(QueueManagerImpl.java:1094) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.QueueManagerImpl.recoverPrimary(QueueManagerImpl.java:938) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.QueueManagerImpl$RedundancySatisfierTask.run2(QueueManagerImpl.java:1475) ~[geode-core-1.15.1.jar:?]
>   at org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1340) ~[geode-core-1.15.1.jar:?]
>   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)