[ https://issues.apache.org/jira/browse/SOLR-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Henrik resolved SOLR-15139.
---------------------------
    Resolution: Workaround

It seems my timeout settings were too aggressive. Things started working when I restarted the nodes with {{-Dsolr.jetty.http.idleTimeout=60000 -DsocketTimeout=60000 -DconnTimeout=60000}}. The previous values were {{-Dsolr.jetty.http.idleTimeout=10000 -DsocketTimeout=5000 -DconnTimeout=5000}}.

> Recovering forever after upgrade to 8.8.0: Timeout waiting for collection state
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-15139
>                 URL: https://issues.apache.org/jira/browse/SOLR-15139
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: replication (java), SolrCloud, SolrJ
>    Affects Versions: 8.8
>         Environment: Linux solr3577 4.9.0-0.bpo.6-amd64 #1 SMP Debian 4.9.88-1+deb9u1~bpo8+1 (2018-05-13) x86_64 GNU/Linux
>            Reporter: Henrik
>            Priority: Major
>
> After upgrading our SolrCloud collections from 8.7.0 to 8.8.0, I am struggling to get a consistent state. We have 8 servers hosting 3 collections, with shards/replicas spread over all the servers.
>
> All replicas on solr3577 are in the "Recovering" state and repeat the following error every five minutes: "RemoteSolrException: Error from server at http://solr3579.foo.bar:12621/solr: Timeout waiting for collection state", as you can see here:
> {code}
> ERROR [20210205T090741,988] recoveryExecutor-11-thread-8-processing-n:solr3579.foo.bar:12621_solr x:foo_bar_shard22_replica_n86 c:foo_bar s:shard22 r:core_node89 org.apache.solr.cloud.RecoveryStrategy - Recovery failed - trying again... (12)
> ERROR [20210205T090741,995] recoveryExecutor-11-thread-9-processing-n:solr3579.foo.bar:12621_solr x:foo_bar_shard2_replica_n6 c:foo_bar s:shard2 r:core_node9 org.apache.solr.cloud.RecoveryStrategy - Error while trying to recover.
> core=foo_bar_shard2_replica_n6:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://solr3579.foo.bar:12621/solr: Timeout waiting for collection state.
>     at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>     at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:876)
>     at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:614)
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333)
>     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:316)
>     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>     at java.base/java.lang.Thread.run(Thread.java:832)
> Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://solr3579.foo.bar:12621/solr: Timeout waiting for collection state.
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681)
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.lambda$httpUriRequest$0(HttpSolrClient.java:310)
>     ... 5 more
> ERROR [20210205T090741,995] recoveryExecutor-11-thread-9-processing-n:solr3579.foo.bar:12621_solr x:foo_bar_shard2_replica_n6 c:foo_bar s:shard2 r:core_node9 org.apache.solr.cloud.RecoveryStrategy - Recovery failed - trying again... (12)
> {code}
> At the same time, solr3579 keeps repeating "NotInClusterStateException: Timeout waiting for collection state", as seen here:
> {code}
> ERROR [20210205T090741,994] qtp313082880-176670 org.apache.solr.servlet.HttpSolrCall - org.apache.solr.cloud.ZkController$NotInClusterStateException: Timeout waiting for collection state.
>     at org.apache.solr.handler.admin.PrepRecoveryOp.execute(PrepRecoveryOp.java:163)
>     at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
>     at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
>     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
>     at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:836)
>     at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:800)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:545)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
>     at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
>     at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1612)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1582)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>     at org.eclipse.jetty.server.Server.handle(Server.java:516)
>     at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
>     at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
>     at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
>     at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
>     at java.base/java.lang.Thread.run(Thread.java:832)
> {code}
> How do I remedy this? I have restarted solr3577 and don't know if I dare to restart solr3579 (which is active and the leader).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
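For context on the workaround in the resolution comment: assuming the stock {{solr.xml}} shipped with Solr 8.x, {{-DsocketTimeout}} and {{-DconnTimeout}} are not built-in JVM options. They are system properties substituted into placeholders in the {{shardHandlerFactory}} section, namely {{<int name="socketTimeout">${socketTimeout:600000}</int>}} and {{<int name="connTimeout">${connTimeout:60000}</int>}}. Under that assumption, the original flags cut both defaults to 5 s, and the workaround raises them to 60 s.

The Java sketch below is a simplified, hypothetical re-implementation of that {{${name:default}}} substitution. The {{PlaceholderDemo}} class and its {{resolve}} method are illustrative names, not part of Solr's API. The sketch only shows how a system property overrides a default; it does not reproduce Solr's own property-resolution code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified "${name:default}" placeholder substitution, as assumed
 * for the stock solr.xml timeout entries. This is NOT Solr's own implementation.
 */
public class PlaceholderDemo {

    // Matches ${name} or ${name:default}; group 1 is the name, group 3 the default.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(:([^}]*))?\\}");

    // Assumed stock solr.xml entries (shardHandlerFactory section, Solr 8.x).
    private static final String SOCKET = "<int name=\"socketTimeout\">${socketTimeout:600000}</int>";
    private static final String CONN = "<int name=\"connTimeout\">${connTimeout:60000}</int>";

    /** Replaces each placeholder with the property value if set, else with its default. */
    public static String resolve(String template, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                if (m.group(3) == null) {
                    throw new IllegalArgumentException("No value or default for " + m.group(1));
                }
                value = m.group(3);
            }
            // quoteReplacement stops '$' or '\' inside a value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static void show(String label, Map<String, String> props) {
        System.out.println(label + ": " + resolve(SOCKET, props) + " " + resolve(CONN, props));
    }

    public static void main(String[] args) {
        // Stand-ins for System.getProperties() under each set of -D flags from the comment.
        Map<String, String> noFlags = new HashMap<>();

        Map<String, String> before = new HashMap<>();
        before.put("socketTimeout", "5000");
        before.put("connTimeout", "5000");

        Map<String, String> workaround = new HashMap<>();
        workaround.put("socketTimeout", "60000");
        workaround.put("connTimeout", "60000");

        show("defaults", noFlags);
        show("before", before);
        show("workaround", workaround);
    }
}
```

Running {{main}} prints the resolved elements three times: once with no flags, once with the original flags and once with the workaround flags. Failing fast when a placeholder has neither a value nor a default is a choice made for this sketch only.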