Hi all,

We are running Solr 8.2 in SolrCloud mode, with a single TLOG replica per shard and multiple PULL replicas per shard. Recently we have noticed that some of the PULL replicas stop replicating from the masters. They log a replication attempt that outputs:
    o.a.s.h.IndexFetcher Number of files in latest index in master:

and then nothing else from IndexFetcher after that.

I went onto a few instances and took thread dumps. The index fetcher thread appears to be stuck waiting to acquire the index writer's write lock: it is parked in WriteLock.tryLock, called from DefaultSolrCoreState.closeIndexWriter. I don't see anything else in the thread dump indicating a deadlock. Any ideas here?

    "indexFetcher-19-thread-1" #468 prio=5 os_prio=0 cpu=285847.01ms elapsed=62993.13s tid=0x00007fa8fc004800 nid=0x254 waiting on condition [0x00007ef584ede000]
       java.lang.Thread.State: TIMED_WAITING (parking)
            at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
            - parking to wait for <0x00000003aa5e4ad8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
            at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:980)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:1288)
            at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(java.base@11.0.6/ReentrantReadWriteLock.java:1131)
            at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)
            at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:240)
            at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:569)
            at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)
            at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424)
            at org.apache.solr.handler.ReplicationHandler.lambda$setupPolling$13(ReplicationHandler.java:1193)
            at org.apache.solr.handler.ReplicationHandler$$Lambda$668/0x0000000800d0f440.run(Unknown Source)
            at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.6/Executors.java:515)
            at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.6/FutureTask.java:305)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.6/ScheduledThreadPoolExecutor.java:305)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1128)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
            at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
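One possible reason a hang like this shows no deadlock, offered as an assumption rather than a diagnosis: ReentrantReadWriteLock read locks are shared and are not recorded as "owned" by any thread, so if some code path takes the read lock and never releases it, a writer retrying a timed tryLock will keep failing while neither the thread dump nor the JVM's deadlock detector names a culprit. The self-contained sketch below reproduces that pattern with plain java.util.concurrent. The class and method names (LeakedReadLockDemo, leakReadLock, tryWriteLock) and the 100 ms timeout are hypothetical and are not taken from Solr's code.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo: a leaked read hold starves a writer without any visible deadlock.
// This is NOT Solr's code; the lock merely stands in for an index-writer lock.
public class LeakedReadLockDemo {

    /** Starts a thread that takes the read lock and exits without releasing it. */
    static void leakReadLock(ReentrantReadWriteLock lock) throws InterruptedException {
        Thread leaker = new Thread(() -> lock.readLock().lock(), "leaky-reader");
        leaker.start();
        leaker.join(); // the thread is gone, but its read hold is still counted
    }

    /** Retries a timed write-lock acquisition, similar in shape to the tryLock frame in the dump. */
    static boolean tryWriteLock(ReentrantReadWriteLock lock, int attempts, long timeoutMs)
            throws InterruptedException {
        for (int i = 1; i <= attempts; i++) {
            if (lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            System.out.println("attempt " + i + ": write lock not acquired, readLockCount="
                    + lock.getReadLockCount());
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();
        leakReadLock(iwLock);
        boolean acquired = tryWriteLock(iwLock, 3, 100);

        // findDeadlockedThreads looks for cycles of threads waiting on each other;
        // a leaked read hold forms no cycle, so this returns null.
        long[] deadlocked = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        System.out.println("acquired=" + acquired + ", deadlocked threads: "
                + (deadlocked == null ? "none" : deadlocked.length));
    }
}
```

If the same pattern were behind the stuck fetcher, getReadLockCount() on the lock would stay above zero even when no live thread is reading, which is the kind of evidence a heap dump could confirm or rule out.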