[ https://issues.apache.org/jira/browse/SOLR-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Endika Posadas updated SOLR-14458:
----------------------------------
    Attachment: image-2020-05-05-09-47-27-854.png

> Solr Replica locked in recovering state after a Zookeeper disconnection
> -----------------------------------------------------------------------
>
>                 Key: SOLR-14458
>                 URL: https://issues.apache.org/jira/browse/SOLR-14458
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 8.4.1
>         Environment: A Solr cluster with 2 replicas that each have 2 shards, split across 2 Windows VMs.
> They use a 3-node ZooKeeper ensemble across 3 VMs.
>            Reporter: Endika Posadas
>            Priority: Major
>         Attachments: image-2020-05-05-09-47-27-854.png, replica7.log, solr-thread-dump.log, solr.log, solrrecovering.png
>
>
> In a Solr cluster, a Solr instance containing two shards lost its connection to ZooKeeper. Upon reconnecting, it checked its status with the leader and started a recovery. However, it is stuck in the recovering state without making further progress (it has been like that for days now).
>
> A thread dump shows that `recoveryExecutor-7-thread-3-processing-n` is trying to acquire the lock needed to create a new IndexWriter: `at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` (called from `lock(iwLock.writeLock());`). However, the ReentrantLock it is waiting for is never released. Moreover, no thread can be found holding the lock, which leaves restarting Solr as the only solution.
> There is no error in the logs that helps explain the issue. I have attached solr.log, a grep of the node 7 lines, and a thread dump.
>
> There is also no other recovery currently running. In the Solr metrics, 4 recoveries have started, 3 have completed and 1 is running (forever).
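To make the described symptom concrete (a recovery thread waiting on a write lock that nobody appears to hold), here is a minimal, self-contained Java sketch. It is a simplified model, not Solr's code: the class `CoreStateModel` and its methods `closeWriter`, `openWriter` and `tryAcquireForRecovery` are hypothetical, and the sketch assumes the write lock is taken in one step and released only by a later, separate step. It uses the standard `java.util.concurrent.locks.ReentrantReadWriteLock`; the "recovery" side uses a timed `tryLock` instead of a blocking `lock()` so the demo terminates.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model (not Solr's actual code): the write lock is
// acquired in closeWriter() and released only by a later openWriter() call.
public class SkippedUnlockDemo {

    static final class CoreStateModel {
        private final ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();

        // Takes the write lock and deliberately keeps it held after returning.
        void closeWriter() {
            iwLock.writeLock().lock();
        }

        // Releases the write lock taken by closeWriter(). Must run on the thread
        // that acquired it, otherwise unlock() throws IllegalMonitorStateException.
        void openWriter() {
            iwLock.writeLock().unlock();
        }

        // Stand-in for a recovery thread that needs the write lock to build a new
        // writer: waits up to timeoutMs, then gives up instead of blocking forever.
        boolean tryAcquireForRecovery(long timeoutMs) throws InterruptedException {
            if (iwLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                iwLock.writeLock().unlock();
                return true;
            }
            return false;
        }
    }

    // Runs closeWriter() (and openWriter() unless skipOpen) on a worker thread
    // that then exits, and reports whether a later "recovery" can get the lock.
    static boolean recoveryCanProceed(boolean skipOpen) throws InterruptedException {
        CoreStateModel state = new CoreStateModel();
        Thread worker = new Thread(() -> {
            state.closeWriter();
            if (!skipOpen) {
                state.openWriter();
            }
        });
        worker.start();
        worker.join(); // the worker has terminated; it can no longer release anything
        return state.tryAcquireForRecovery(200);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("open called:  recovery proceeds = " + recoveryCanProceed(false));
        System.out.println("open skipped: recovery proceeds = " + recoveryCanProceed(true));
    }
}
```

Running `main` prints `open called:  recovery proceeds = true` followed by `open skipped: recovery proceeds = false`. In the second case the write lock stays held even though the thread that acquired it has already exited, so a blocking `lock()` call would wait forever, and because that thread is gone it would not show up as the holder in a thread dump. This is only one possible explanation consistent with the symptoms above, not a confirmed diagnosis.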
>
> My hypothesis is that `org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, boolean)` was called once but, for some reason, `openIndexWriter` was skipped.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)