[ 
https://issues.apache.org/jira/browse/SOLR-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Endika Posadas updated SOLR-14458:
----------------------------------
    Description: 
In a Solr cluster, a Solr instance containing two shards lost its connection 
to ZooKeeper. Upon reconnecting, it checked its status with the leader 
and started a recovery. However, it is stuck in the recovering state without making 
further progress (it has been like that for days now).

 

Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is 
trying to acquire the lock to create a new IndexWriter: `at 
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` 
(after `lock(iwLock.writeLock());`). However, the 
ReentrantLock it is waiting for is never released. Moreover, no thread can be 
found holding the lock, leaving a Solr restart as the only solution.
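
As an illustration only (this is not Solr code and not a diagnosis of this issue): the `writeLock()` call suggests a read/write lock. Assuming it is a `java.util.concurrent.locks.ReentrantReadWriteLock`, one generic way a writer can wait with no visible owner is a read hold that was never released. The class and method names below are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it only exercises java.util.concurrent, not Solr.
public class LeakedReadLockDemo {

    // Tries to take the write lock for up to timeoutMs.
    // Returns true if it was acquired (and releases it again).
    static boolean tryWrite(ReentrantReadWriteLock lock, long timeoutMs)
            throws InterruptedException {
        boolean acquired = lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    // Simulates a code path that takes the read lock and never unlocks it.
    // The thread exits, but the read hold stays counted on the lock.
    static void leakReadLock(ReentrantReadWriteLock lock) throws InterruptedException {
        Thread t = new Thread(() -> lock.readLock().lock(), "leaky-reader");
        t.start();
        t.join();
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock(true);
        leakReadLock(iwLock);

        // The writer times out even though no thread owns the write lock.
        System.out.println("write acquired: " + tryWrite(iwLock, 200));
        System.out.println("isWriteLocked:  " + iwLock.isWriteLocked());
        System.out.println("readLockCount:  " + iwLock.getReadLockCount());
    }
}
```

In this sketch the write attempt fails while `isWriteLocked()` is false and `getReadLockCount()` is 1. Checking those two values on the live lock (for example from a debugger or heap dump) would be one way to tell a leaked read hold apart from a held write lock.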

There are no errors in the logs that help with the issue. I have attached 
solr.log and a grep of the node 7 lines, as well as a thread dump.

 

There is also no other recovery currently running. In Solr metrics, 4 
recoveries have started, 3 have completed and 1 is running (forever).

 

My hypothesis is that 
`org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, boolean)` 
was called once but for some reason `openIndexWriter` was skipped.
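
A minimal sketch of what this hypothesis would look like in general, assuming (this is an assumption about the design, not a quotation of Solr source) that the close step acquires a write lock and leaves the matching open step to release it. `SplitLockDemo`, `closeWriter`, and `openWriter` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class: a lock acquired in one method and released in another.
public class SplitLockDemo {
    private final ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock(true);

    // Acquires the write lock and deliberately does not release it.
    public void closeWriter() {
        iwLock.writeLock().lock();
    }

    // Releases the write lock taken by closeWriter(); must run on the same thread.
    public void openWriter() {
        iwLock.writeLock().unlock();
    }

    // Checks from a separate thread whether the write lock can be taken in time.
    public boolean otherThreadCanWrite(long timeoutMs) throws InterruptedException {
        boolean[] result = new boolean[1];
        Thread t = new Thread(() -> {
            try {
                if (iwLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    result[0] = true;
                    iwLock.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "recovery-like-thread");
        t.start();
        t.join(); // join() makes the write to result[0] visible here
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        SplitLockDemo demo = new SplitLockDemo();
        demo.closeWriter();
        // If the open step is skipped at this point, every other writer waits.
        System.out.println("before open: " + demo.otherThreadCanWrite(200));
        demo.openWriter();
        System.out.println("after open:  " + demo.otherThreadCanWrite(200));
    }
}
```

Under that assumption, any early return or exception between the two calls leaves the lock held. A common guard is to place the release in a `finally` block that runs no matter how the work between the two steps ends.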

  was:
In a Solr cluster, a Solr instance containing two shards lost its connection 
to ZooKeeper. Upon reconnecting, it checked its status with the leader 
and started a recovery. However, it is stuck in the recovering state without making 
further progress (it has been like that for days now).

 

Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is 
trying to acquire the lock to create a new IndexWriter: `at 
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` 
(after `lock(iwLock.writeLock());`). However, the 
ReentrantLock it is waiting for is never released. Moreover, no thread can be 
found holding the lock, leaving a Solr restart as the only solution.

There are no errors in the logs that help with the issue. I have attached 
solr.log and a grep of the node 7 lines, as well as a thread dump.

 

My hypothesis is that 
org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore,
 boolean) was called once but for some reason openIndexWriter was skipped.


> Solr Replica locked in recovering state after a Zookeeper disconnection
> -----------------------------------------------------------------------
>
>                 Key: SOLR-14458
>                 URL: https://issues.apache.org/jira/browse/SOLR-14458
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 8.4.1
>         Environment: A Solr cluster with 2 replicas, each with 2 shards, 
> split across 2 Windows VMs.
> They use a 3-replica ZooKeeper across 3 VMs.
>            Reporter: Endika Posadas
>            Priority: Major
>         Attachments: replica7.log, solr-thread-dump.log, solr.log
>
>
> In a Solr cluster, a Solr instance containing two shards lost its connection 
> to ZooKeeper. Upon reconnecting, it checked its status with the leader 
> and started a recovery. However, it is stuck in the recovering state without making 
> further progress (it has been like that for days now).
>  
> Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is 
> trying to acquire the lock to create a new IndexWriter: `at 
> org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` 
> (after `lock(iwLock.writeLock());`). However, the 
> ReentrantLock it is waiting for is never released. Moreover, no thread can be 
> found holding the lock, leaving a Solr restart as the only solution.
> There are no errors in the logs that help with the issue. I have attached 
> solr.log and a grep of the node 7 lines, as well as a thread dump.
>  
> There is also no other recovery currently running. In Solr metrics, 4 
> recoveries have started, 3 have completed and 1 is running (forever).
>  
> My hypothesis is that 
> `org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, boolean)` 
> was called once but for some reason `openIndexWriter` was skipped.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
