I'm running SolrCloud 6.1.0 with a Java client that uses SolrJ 5.4.1. Every once in a while, during a query, CloudSolrClient logs a pair of messages in the client: an error saying a request failed, then a warning saying it is retrying after a stale state error.
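My working model of what "stale state" means here is an assumption; I haven't verified it against the SolrJ source. The client caches the collection's cluster state and sends its cached version with each request. A node whose version is newer rejects the request (with code 510, as in the app log below), and the client refreshes its cache and retries. The sketch below models only that pattern. `FakeNode`, `FakeCloudClient` and `StaleStateException` are hypothetical classes, not SolrJ APIs, and the retry limit is arbitrary.

```java
// Hypothetical model of a "retry on stale cluster state" pattern.
// None of these classes exist in SolrJ; they only illustrate the idea.

/** Thrown by the fake node when the client's cached state version is out of date. */
class StaleStateException extends RuntimeException {
    final int serverVersion;

    StaleStateException(int serverVersion) {
        super("510: stale state, node is at version " + serverVersion);
        this.serverVersion = serverVersion;
    }
}

/** A fake node that knows the current cluster-state version of one collection. */
class FakeNode {
    int stateVersion;

    FakeNode(int stateVersion) {
        this.stateVersion = stateVersion;
    }

    /** Rejects any request made with an older cached version. */
    String select(int clientVersion) {
        if (clientVersion < stateVersion) {
            throw new StaleStateException(stateVersion);
        }
        return "results";
    }
}

/** A fake client that caches the state version and retries after learning it is stale. */
class FakeCloudClient {
    final int maxStaleRetries = 5; // arbitrary limit chosen for this sketch
    int cachedVersion;
    int staleRetries = 0; // how many times a stale-state error forced a retry

    FakeCloudClient(int cachedVersion) {
        this.cachedVersion = cachedVersion;
    }

    String query(FakeNode node) {
        for (int attempt = 0; ; attempt++) {
            try {
                return node.select(cachedVersion);
            } catch (StaleStateException e) {
                if (attempt >= maxStaleRetries) {
                    throw e; // give up after too many stale-state errors
                }
                // In the real client this is where the ERROR ("failed due to (510)")
                // and WARN ("Re-trying request ... after stale state error") lines appear.
                staleRetries++;
                // A real client would re-read the state from ZooKeeper; here we
                // simply adopt the version the node reported.
                cachedVersion = e.serverVersion;
            }
        }
    }
}
```

In this model, a client with cached version 6 querying a node at version 7 fails once, adopts version 7, and succeeds on the retry. That produces one ERROR line followed by one WARN line, which is the pair I see.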
For this test, the collection (test_collection) has one shard with a replication factor of 2 (RF=2). There are two machines: 10.112.7.2 (replica) and 10.112.7.4 (leader). The client runs on 10.112.7.4. Note that the system clock on 10.112.7.4 is about 1 minute 5-6 seconds ahead of the other machine's.

-----------------------------------
Leader (10.112.7.4) Solr log:
-----------------------------------
19:27:16.583 ERROR [c:test_collection s:shard1 r:core_node2 x:test_collection_shard1_replica2] o.a.s.u.StreamingSolrClients error
org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
19:27:16.587 WARN [c:test_collection s:shard1 r:core_node2 x:test_collection_shard1_replica2] o.a.s.u.p.DistributedUpdateProcessor Error sending update to http://10.112.7.2:8983/solr
org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
19:27:16.587 ERROR [c:test_collection s:shard1 r:core_node2 x:test_collection_shard1_replica2] o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on replica http://10.112.7.2:8983/solr/test_collection_shard1_replica1/
org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
19:27:16.598 WARN [c:test_collection s:shard1 r:core_node2 x:test_collection_shard1_replica2] o.a.s.c.LeaderInitiatedRecoveryThread Leader is publishing core=test_collection_shard1_replica1 coreNodeName =core_node1 state=down on behalf of un-reachable replica http://10.112.7.2:8983/solr/test_collection_shard1_replica1/
-----------------------------------
Replica (10.112.7.2) Solr log:
-----------------------------------
19:26:11.316 WARN [c:test_collection s:shard1 r:core_node1 x:test_collection_shard1_replica1] o.a.s.c.RecoveryStrategy Stopping recovery for core=[test_collection_shard1_replica1] coreNodeName=[core_node1]
19:26:19.385 WARN [c:test_collection s:shard1 r:core_node1 x:test_collection_shard1_replica1] o.a.s.u.PeerSync PeerSync: core=test_collection_shard1_replica1 url=http://10.112.7.2:8983/solr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
19:26:20.115 WARN [c:test_collection s:shard1 r:core_node1 x:test_collection_shard1_replica1] o.a.s.u.UpdateLog Starting log replay tlog{file=/var/solr/data/test_collection_shard1_replica1/data/tlog/tlog.0000000000000000000 refcount=2} active=true starting pos=34809286
19:26:20.146 WARN [c:test_collection s:shard1 r:core_node1 x:test_collection_shard1_replica1] o.a.s.u.UpdateLog Log replay finished. recoveryInfo=RecoveryInfo{adds=133 deletes=0 deleteByQuery=0 errors=0 positionOfStart=34809286}
-----------------------------------
Leader (10.112.7.4) Java app log:
-----------------------------------
19:27:17,173 [ERROR][impl.CloudSolrClient][pool-6-thread-18][CloudSolrClient.java@904] - Request to collection test_collection failed due to (510) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.112.7.2:8983/solr/test_collection: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 510 {metadata={error-class=org.apache.solr.common.SolrException,root-error-class=org.apache.solr.common.SolrException},msg={"test_collection":7},code=510}</title>
</head>
<body>
HTTP ERROR 510
<p>Problem accessing /solr/test_collection/select. Reason:
<pre> {metadata={error-class=org.apache.solr.common.SolrException,root-error-class=org.apache.solr.common.SolrException},msg={"test_collection":7},code=510}</pre></p>
</body>
</html>
, retry? 0
19:27:17,174 [WARN ][impl.CloudSolrClient][pool-6-thread-18][CloudSolrClient.java@953] - Re-trying request to collection(s) test_collection after stale state error from server.
-----------------------------------

Does anyone know what could be causing this error? It's very infrequent (about 10 occurrences in 2 million reads over 3 hours), but I'd still like to avoid it if possible.
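To line up the three logs, the replica's timestamps need to be shifted by the clock skew mentioned above. Here is a small sketch of that arithmetic. It assumes the skew is somewhere between 65 and 66 seconds (my rough estimate, not a measured value). `LogClockAlign` and its methods are names I made up for illustration; only `java.time` is used.

```java
import java.time.Duration;
import java.time.LocalTime;

// Maps replica-side (10.112.7.2) log timestamps onto the leader/client clock
// (10.112.7.4), which runs roughly 65-66 s ahead. The skew bounds are
// assumptions based on an approximate observation.
class LogClockAlign {
    static final Duration MIN_SKEW = Duration.ofSeconds(65);
    static final Duration MAX_SKEW = Duration.ofSeconds(66);

    /** Converts a replica timestamp to the leader's clock for one skew value.
     *  LocalTime wraps at midnight, which doesn't matter for these logs. */
    static LocalTime toLeaderClock(LocalTime replicaTime, Duration skew) {
        return replicaTime.plus(skew);
    }

    /** True if a leader-side event falls inside the window a replica-side
     *  event maps to, given the assumed skew range. */
    static boolean couldCoincide(LocalTime leaderEvent, LocalTime replicaEvent) {
        LocalTime earliest = toLeaderClock(replicaEvent, MIN_SKEW);
        LocalTime latest = toLeaderClock(replicaEvent, MAX_SKEW);
        return !leaderEvent.isBefore(earliest) && !leaderEvent.isAfter(latest);
    }

    public static void main(String[] args) {
        // Replica log timestamps, copied from the log above
        String[] replicaEvents = {"19:26:11.316", "19:26:19.385", "19:26:20.115", "19:26:20.146"};
        for (String s : replicaEvents) {
            LocalTime t = LocalTime.parse(s);
            System.out.println(s + " (replica) ~ " + toLeaderClock(t, MIN_SKEW)
                    + " to " + toLeaderClock(t, MAX_SKEW) + " (leader clock)");
        }
    }
}
```

With that correction, the replica's "Stopping recovery" message (19:26:11.316 on its own clock) maps to 19:27:16.316 through 19:27:17.316 on the leader's clock. That window contains both the leader's NoHttpResponseException at 19:27:16.583 and the client's 510 at 19:27:17.173, while the PeerSync and log replay messages land several seconds later.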