[ https://issues.apache.org/jira/browse/SOLR-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris M. Hostetter updated SOLR-14431:
--------------------------------------
    Component/s: SolrCloud
                 replication (java)
    Description: 
A bug in the {{SegmentsInfoRequestHandler}} (aka: {{/admin/segments}} - which is used under the covers when viewing the "Segments Info" panel of a core in the Admin UI) causes it to increment the internal "ref-count" of the IndexWriter by default, without ever decrementing that ref-count.

This can cause delayed problems in any situation where the IndexWriter needs to be updated, replaced, or locked:
 * Core {{RELOAD}} operations
 * Master/Slave replication (via IndexFetcher)
 * {{PULL}} Replica updates (via IndexFetcher)
 * {{TLOG}} Replica updates (via IndexFetcher)
 * {{NRT}} Recovery from Leader (via IndexFetcher)

...these manifest as operations that "stall": the threads attempting to execute them block forever, waiting for a {{ReentrantReadWriteLock}} in {{DefaultSolrCoreState}} that will never be released.

A config-only workaround exists for this problem: explicitly declare the {{/admin/segments}} handler in {{solrconfig.xml}} with an {{invariants}} param that requests additional info, forcing it down a code path where it _uses_ the IndexWriter, *and decrements the ref-count, releasing the lock*.
{code:java|title=solrconfig.xml workaround}
  <requestHandler name="/admin/segments" class="solr.SegmentsInfoRequestHandler">
    <!-- work around for https://issues.apache.org/jira/browse/SOLR-14431 -->
    <lst name="invariants">
      <bool name="coreInfo">true</bool>
    </lst>
  </requestHandler>
{code}
----
Example stack traces of what this can look like
{noformat:title=IndexFetcher example stalled thread}
  "thread",{
    "id":65,
    "name":"indexFetcher-19-thread-1",
    "state":"TIMED_WAITING",
    "lock":"java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@22a18ed",
    "cpuTime":"1454860.0285ms",
    "userTime":"622230.0000ms",
    "stackTrace":["java.base@11.0.7/jdk.internal.misc.Unsafe.park(Native Method)",
      "java.base@11.0.7/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)",
      "java.base@11.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:980)",
      "java.base@11.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1288)",
      "java.base@11.0.7/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1131)",
      "org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)",
      "org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:240)",
      "org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:569)",
      "org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)",
      "org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424)",
      "org.apache.solr.handler.ReplicationHandler.lambda$setupPolling$13(ReplicationHandler.java:1210)",
      "org.apache.solr.handler.ReplicationHandler$$Lambda$513/0x00000008006bf440.run(Unknown Source)",
      "java.base@11.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)",
      "java.base@11.0.7/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)",
      "java.base@11.0.7/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)",
      "java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)",
      "java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)",
      "java.base@11.0.7/java.lang.Thread.run(Thread.java:834)"]},
{noformat}
{noformat:title=Core RELOAD example stalled thread}
  "thread",{
    "id":16,
    "name":"qtp1558079303-16",
    "state":"WAITING",
    "lock":"java.lang.Object@70c81fe1",
    "cpuTime":"73.4453ms",
    "userTime":"60.0000ms",
    "stackTrace":["java.base@11.0.4/java.lang.Object.wait(Native Method)",
      "java.base@11.0.4/java.lang.Object.wait(Object.java:328)",
      "org.apache.solr.core.SolrCores.waitAddPendingCoreOps(SolrCores.java:394)",
      "org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1545)",
      "org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:132)",
      "org.apache.solr.handler.admin.CoreAdminOperation$$Lambda$266/0x0000000100431040.execute(Unknown Source)",
      "org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:362)",
      "org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)",
      "org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)",
      "org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)",
      "org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:839)",
      "org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:805)",
      "org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:558)",
      "org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)",
      "org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)",
      "org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)",
...
{noformat}

{panel:title="Original Jira Description"}
If withCoreInfo is false iwRef.decref() will not be called to release the reader lock, preventing any further writer locks.

[https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144]

Line 130 should be moved inside the if statement L144.

[~ab] FYI
{panel}

  was:
If withCoreInfo is false iwRef.decref() will not be called to release the reader lock, preventing any further writer locks.

https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144

Line 130 should be moved inside the if statement L144.

[~ab] FYI

       Priority: Critical  (was: Minor)
        Summary: Using "Segments Info" UI screen can cause future stalls in replication/recovery/core-reload (/admin/segments)  (was: SegmentsInfoRequestHandler.java does not release IndexWriter)

Updated the Jira summary, description, and metadata to make it clearer how serious this issue is, and to point out a config-only workaround.

> Using "Segments Info" UI screen can cause future stalls in replication/recovery/core-reload (/admin/segments)
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14431
>                 URL: https://issues.apache.org/jira/browse/SOLR-14431
>             Project: Solr
>          Issue Type: Bug
>          Components: Admin UI, replication (java), SolrCloud
>    Affects Versions: 8.1.1, 8.5.1
>            Reporter: Tiziano Degaetano
>            Assignee: Andrzej Bialecki
>            Priority: Critical
>             Fix For: 8.6
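
The leak described in the original report can be modelled in a few lines of plain Java. The sketch below is only an illustration under stated assumptions, not Solr's code: {{CoreState}}, {{handleBuggy}}, {{handleFixed}} and {{writerCanLock}} are hypothetical names, and the only real API used is {{java.util.concurrent.locks.ReentrantReadWriteLock}}. It assumes that taking a reference acquires the read lock and that releasing the reference unlocks it, which is what the report implies. The "fixed" variant follows the reporter's suggestion of acquiring the reference only inside the {{withCoreInfo}} branch; the change actually committed for this issue may differ.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RefCountLeakSketch {

    /** Hypothetical stand-in for the state object that guards the IndexWriter. */
    static final class CoreState {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        /** Taking a reference acquires the read lock (assumption of this sketch). */
        void incref() {
            lock.readLock().lock();
        }

        /** Releasing the reference gives the read lock back. */
        void decref() {
            lock.readLock().unlock();
        }

        /** What a reload or index fetch needs: the write lock, within a timeout. */
        boolean tryExclusive(long millis) {
            try {
                if (lock.writeLock().tryLock(millis, TimeUnit.MILLISECONDS)) {
                    lock.writeLock().unlock();
                    return true;
                }
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Buggy shape: the reference is always taken but released on one branch only. */
    static void handleBuggy(CoreState state, boolean withCoreInfo) {
        state.incref();
        if (withCoreInfo) {
            // ... details would be read from the writer here ...
            state.decref();
        }
        // withCoreInfo == false: the read lock is never released
    }

    /** Suggested shape: take the reference only when needed, release in finally. */
    static void handleFixed(CoreState state, boolean withCoreInfo) {
        if (withCoreInfo) {
            state.incref();
            try {
                // ... details would be read from the writer here ...
            } finally {
                state.decref();
            }
        }
    }

    /** Runs one handler call, then reports whether a writer can still lock. */
    static boolean writerCanLock(boolean buggy, boolean withCoreInfo) {
        CoreState state = new CoreState();
        if (buggy) {
            handleBuggy(state, withCoreInfo);
        } else {
            handleFixed(state, withCoreInfo);
        }
        return state.tryExclusive(100);
    }

    public static void main(String[] args) {
        System.out.println("buggy, coreInfo=false: " + writerCanLock(true, false));
        System.out.println("buggy, coreInfo=true:  " + writerCanLock(true, true));
        System.out.println("fixed, coreInfo=false: " + writerCanLock(false, false));
        System.out.println("fixed, coreInfo=true:  " + writerCanLock(false, true));
    }
}
```

In this model only the buggy handler called with {{coreInfo=false}} leaves the writer unable to lock, which mirrors why forcing {{coreInfo=true}} through {{invariants}} works around the problem.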