[ https://issues.apache.org/jira/browse/SOLR-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris M. Hostetter updated SOLR-14431:
--------------------------------------
    Component/s: SolrCloud
                 replication (java)
    Description: 
A bug in the {{SegmentsInfoRequestHandler}} (aka: {{/admin/segments}}, which is used under the covers when viewing the "Segments Info" panel of a core in the Admin UI) causes it to increment the internal "ref-count" of the IndexWriter by default, without ever decrementing that ref-count.
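
To make the leak concrete, here is a minimal, self-contained Java model of the pattern described above. It is a sketch under stated assumptions, not Solr's code: {{RefCountedWriter}} and {{leakyRequest}} are hypothetical stand-ins for the ref-counted IndexWriter holder and the handler, and {{withCoreInfo}} mirrors the flag named in the original Jira description.
{code:java|title=Hypothetical ref-count leak model}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the bug, NOT Solr's RefCounted or
// SegmentsInfoRequestHandler code.
class RefCountLeakSketch {
    // Stand-in for a ref-counted IndexWriter holder.
    static final class RefCountedWriter {
        private final AtomicInteger refCount = new AtomicInteger(1); // owner's own reference

        RefCountedWriter incref() {
            refCount.incrementAndGet();
            return this;
        }

        void decref() {
            refCount.decrementAndGet();
        }

        int getRefcount() {
            return refCount.get();
        }
    }

    // Buggy shape: the reference is always taken, but only released on the
    // withCoreInfo branch.
    static void leakyRequest(RefCountedWriter writer, boolean withCoreInfo) {
        RefCountedWriter iwRef = writer.incref();
        if (withCoreInfo) {
            try {
                // ... read IndexWriter details for the core info section ...
            } finally {
                iwRef.decref();
            }
        }
        // withCoreInfo == false: iwRef is never released
    }

    public static void main(String[] args) {
        RefCountedWriter writer = new RefCountedWriter();
        leakyRequest(writer, true);
        System.out.println(writer.getRefcount()); // prints 1: balanced
        leakyRequest(writer, false);
        leakyRequest(writer, false);
        System.out.println(writer.getRefcount()); // prints 3: two leaked references
    }
}
{code}
Each request made without core info leaves one extra reference outstanding, so the count never returns to its baseline.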

This can cause delayed problems in any situation where the IndexWriter needs to be updated, replaced, or locked:
 * Core {{RELOAD}} operations
 * Master/Slave replication (via IndexFetcher)
 * {{PULL}} Replica updates (via IndexFetcher)
 * {{TLOG}} Replica updates (via IndexFetcher)
 * {{NRT}} Recovery from Leader (via IndexFetcher)

...these manifest as operations that "stall" because the threads attempting to execute them block forever waiting for a {{ReentrantReadWriteLock}} in {{DefaultSolrCoreState}} that will never be released.
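
The stall itself can be reproduced with nothing but {{java.util.concurrent}}. This sketch assumes (consistent with the original description's "reader lock" and with the {{WriteLock.tryLock}} frame in the IndexFetcher stack trace in this issue) that an outstanding IndexWriter reference holds the read side of a {{ReentrantReadWriteLock}} until it is released. {{iwLock}} and {{tryWriteLockFromOtherThread}} are hypothetical names, not Solr's.
{code:java|title=Hypothetical leaked read lock model}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the stall, NOT Solr's DefaultSolrCoreState: a read lock
// that is taken with the IndexWriter reference and never unlocked prevents any
// other thread from acquiring the write lock.
class WriteLockStallSketch {
    // Mimics a closer thread (e.g. the IndexFetcher) trying to take the write
    // lock with a timeout; returns whether it succeeded.
    static boolean tryWriteLockFromOtherThread(ReentrantReadWriteLock lock, long timeoutMs) {
        AtomicBoolean acquired = new AtomicBoolean(false);
        Thread closer = new Thread(() -> {
            try {
                if (lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    acquired.set(true);
                    lock.writeLock().unlock(); // release immediately; we only probe
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        closer.start();
        try {
            closer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        // Default constructor is non-fair, matching the NonfairSync in the trace.
        ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();
        System.out.println(tryWriteLockFromOtherThread(iwLock, 100)); // prints true: no readers

        iwLock.readLock().lock(); // the "leaked" reference: never unlocked
        System.out.println(tryWriteLockFromOtherThread(iwLock, 100)); // prints false: times out
    }
}
{code}
Because nothing ever releases the leaked read lock in this model, no timeout or retry on the writer side can succeed.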

A config-only workaround exists for this problem: explicitly declare the {{/admin/segments}} handler in {{solrconfig.xml}} with an {{invariants}} param that requests additional info, forcing it down a code path where it _uses_ the IndexWriter, *and decrements the ref-count, releasing the lock*.
{code:xml|title=solrconfig.xml workaround}
  <requestHandler name="/admin/segments" class="solr.SegmentsInfoRequestHandler">
    <!-- work around for https://issues.apache.org/jira/browse/SOLR-14431 -->
    <lst name="invariants">
      <bool name="coreInfo">true</bool>
    </lst>
  </requestHandler>
{code}
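
Why the workaround holds regardless of what the Admin UI sends: in Solr, a request handler's {{invariants}} take precedence over request params, while {{defaults}} only apply when the request omits a param. The simplified model below is a plain {{Map}} merge, not Solr's {{SolrParams}} API; {{effectiveParams}} and the {{coreInfo=false}} request are hypothetical.
{code:java|title=Hypothetical param precedence model}
import java.util.HashMap;
import java.util.Map;

// Simplified model of request handler param layering; NOT the SolrParams API.
class InvariantsPrecedenceSketch {
    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request,
                                               Map<String, String> invariants) {
        Map<String, String> merged = new HashMap<>(defaults); // lowest precedence
        merged.putAll(request);                               // request overrides defaults
        merged.putAll(invariants);                            // invariants override everything
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> invariants = Map.of("coreInfo", "true"); // from the workaround config
        Map<String, String> request = Map.of("coreInfo", "false");   // hypothetical client request
        Map<String, String> params = effectiveParams(Map.of(), request, invariants);
        System.out.println(params.get("coreInfo")); // prints true
    }
}
{code}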
----
Example stack traces of what this can look like
{noformat:title=IndexFetcher example stalled thread}
      "thread",{
        "id":65,
        "name":"indexFetcher-19-thread-1",
        "state":"TIMED_WAITING",
        "lock":"java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@22a18ed",
        "cpuTime":"1454860.0285ms",
        "userTime":"622230.0000ms",
        "stackTrace":["java.base@11.0.7/jdk.internal.misc.Unsafe.park(Native Method)",
          "java.base@11.0.7/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)",
          "java.base@11.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:980)",
          "java.base@11.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1288)",
          "java.base@11.0.7/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1131)",
          "org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)",
          "org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:240)",
          "org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:569)",
          "org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)",
          "org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424)",
          "org.apache.solr.handler.ReplicationHandler.lambda$setupPolling$13(ReplicationHandler.java:1210)",
          "org.apache.solr.handler.ReplicationHandler$$Lambda$513/0x00000008006bf440.run(Unknown Source)",
          "java.base@11.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)",
          "java.base@11.0.7/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)",
          "java.base@11.0.7/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)",
          "java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)",
          "java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)",
          "java.base@11.0.7/java.lang.Thread.run(Thread.java:834)"]},
{noformat}
{noformat:title=Core RELOAD example stalled thread}
      "thread",{
        "id":16,
        "name":"qtp1558079303-16",
        "state":"WAITING",
        "lock":"java.lang.Object@70c81fe1",
        "cpuTime":"73.4453ms",
        "userTime":"60.0000ms",
        "stackTrace":["java.base@11.0.4/java.lang.Object.wait(Native Method)",
          "java.base@11.0.4/java.lang.Object.wait(Object.java:328)",
          "org.apache.solr.core.SolrCores.waitAddPendingCoreOps(SolrCores.java:394)",
          "org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1545)",
          "org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:132)",
          "org.apache.solr.handler.admin.CoreAdminOperation$$Lambda$266/0x0000000100431040.execute(Unknown Source)",
          "org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:362)",
          "org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)",
          "org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)",
          "org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)",
          "org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:839)",
          "org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:805)",
          "org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:558)",
          "org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)",
          "org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)",
          "org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)",
...
{noformat}
{panel:title="Original Jira Description"}
If {{withCoreInfo}} is false, {{iwRef.decref()}} will not be called to release the reader lock, preventing any further writer locks.

[https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144]

Line 130 should be moved inside the if statement at L144.

[~ab] FYI
{panel}
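
The fix proposed in the original description (moving the acquisition at line 130 inside the if statement at L144) amounts to scoping the reference to the branch that uses it. A minimal sketch, assuming the ref-count can be modeled as an {{AtomicInteger}}; {{handleRequest}} and the response strings are hypothetical, and this is not the committed Solr patch. Releasing in a {{finally}} block is an extra defensive step beyond what the description asks for.
{code:java|title=Hypothetical scoped-reference fix}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed fix, NOT the committed Solr patch:
// the reference is taken only on the branch that uses it.
class ScopedRefFixSketch {
    // refCount stands in for the IndexWriter's ref-count.
    static String handleRequest(AtomicInteger refCount, boolean withCoreInfo) {
        StringBuilder response = new StringBuilder("segmentInfos");
        if (withCoreInfo) {
            refCount.incrementAndGet(); // acquire only when the writer is needed
            try {
                response.append(",coreInfo");
            } finally {
                refCount.decrementAndGet(); // released even if this branch throws
            }
        }
        return response.toString();
    }

    public static void main(String[] args) {
        AtomicInteger refCount = new AtomicInteger(1);
        handleRequest(refCount, false);
        handleRequest(refCount, true);
        System.out.println(refCount.get()); // prints 1: no reference leaked on either path
    }
}
{code}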

  was:
If withCoreInfo is false iwRef.decref() will not
be called to release the reader lock, preventing any further writer locks.
https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144

Line 130 should be moved inside the if statement L144.

[~ab] FYI

       Priority: Critical  (was: Minor)
        Summary: Using "Segments Info" UI screen can cause future stalls in replication/recovery/core-reload (/admin/segments)  (was: SegmentsInfoRequestHandler.java does not release IndexWriter)

Updated the Jira summary, description, and metadata to make it clearer how serious this issue is, and to point out a config-only workaround.

> Using "Segments Info" UI screen can cause future stalls in replication/recovery/core-reload (/admin/segments)
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14431
>                 URL: https://issues.apache.org/jira/browse/SOLR-14431
>             Project: Solr
>          Issue Type: Bug
>          Components: Admin UI, replication (java), SolrCloud
>    Affects Versions: 8.1.1, 8.5.1
>            Reporter: Tiziano Degaetano
>            Assignee: Andrzej Bialecki
>            Priority: Critical
>             Fix For: 8.6



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
