I'm running Solr Cloud 6.1.0, with a Java client using SolrJ 5.4.1.

Every once in a while, during a query, I get a pair of messages logged in
the client from CloudSolrClient: an error about a request failing, then a
warning saying that it's retrying after a stale state error.

For this test, the collection (test_collection) has one shard, with RF=2.
There are two machines, 10.112.7.2 (replica) and 10.112.7.4 (leader). The
client runs on 10.112.7.4. Note that the system clock on 10.112.7.4 is about
1 minute and 5-6 seconds ahead of the clock on 10.112.7.2, so the timestamps
in the logs below are offset by that amount.
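
To make the logs below easier to compare, here is a minimal sketch that shifts a
timestamp from the 10.112.7.4 clock onto the 10.112.7.2 clock. The 65-second
offset is an assumption (I only know the skew to within a second or so), and
the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.LocalTime;

public class SkewAlign {
    // Assumed offset: 10.112.7.4 is roughly 1 min 5-6 s ahead; 65 s is illustrative.
    static final Duration ASSUMED_SKEW = Duration.ofSeconds(65);

    // Converts a time-of-day logged on 10.112.7.4 to the 10.112.7.2 clock.
    // The app log uses a comma before the milliseconds, so normalise it first.
    static LocalTime toReplicaClock(String leaderTimestamp, Duration skew) {
        String normalised = leaderTimestamp.replace(',', '.');
        return LocalTime.parse(normalised).minus(skew);
    }

    public static void main(String[] args) {
        // First ERROR in the leader's Solr log.
        System.out.println(toReplicaClock("19:27:16.583", ASSUMED_SKEW)); // 19:26:11.583
        // ERROR in the Java app log (comma-separated milliseconds).
        System.out.println(toReplicaClock("19:27:17,173", ASSUMED_SKEW)); // 19:26:12.173
    }
}
```

Under that assumed offset, the leader's first error maps to about 19:26:11.6 on
the replica's clock, which is close to the first replica log entry shown below.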

-----------------------------------
Leader (10.112.7.4) Solr log:
-----------------------------------
19:27:16.583 ERROR [c:test_collection s:shard1 r:core_node2
x:test_collection_shard1_replica2] o.a.s.u.StreamingSolrClients error
org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond
        at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
        at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
        at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
        at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
        at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
        at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
        at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
        at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
        at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
        at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
        at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
        at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
        at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

19:27:16.587 WARN  [c:test_collection s:shard1 r:core_node2
x:test_collection_shard1_replica2] o.a.s.u.p.DistributedUpdateProcessor
Error sending update to http://10.112.7.2:8983/solr
org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond
        at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
        at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
        at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
        at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
        at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
        at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
        at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
        at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
        at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
        at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
        at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
        at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
        at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

19:27:16.587 ERROR [c:test_collection s:shard1 r:core_node2
x:test_collection_shard1_replica2] o.a.s.u.p.DistributedUpdateProcessor
Setting up to try to start recovery on replica
http://10.112.7.2:8983/solr/test_collection_shard1_replica1/
org.apache.http.NoHttpResponseException: 10.112.7.2:8983 failed to respond
        at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
        at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
        at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
        at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
        at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
        at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
        at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
        at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
        at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
        at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
        at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
        at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
        at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:311)
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

19:27:16.598 WARN  [c:test_collection s:shard1 r:core_node2
x:test_collection_shard1_replica2] o.a.s.c.LeaderInitiatedRecoveryThread
Leader is publishing core=test_collection_shard1_replica1 coreNodeName
=core_node1 state=down on behalf of un-reachable replica
http://10.112.7.2:8983/solr/test_collection_shard1_replica1/

-----------------------------------
Replica (10.112.7.2) Solr log:
-----------------------------------
19:26:11.316 WARN  [c:test_collection s:shard1 r:core_node1
x:test_collection_shard1_replica1] o.a.s.c.RecoveryStrategy Stopping
recovery for core=[test_collection_shard1_replica1]
coreNodeName=[core_node1]
19:26:19.385 WARN  [c:test_collection s:shard1 r:core_node1
x:test_collection_shard1_replica1] o.a.s.u.PeerSync PeerSync:
core=test_collection_shard1_replica1 url=http://10.112.7.2:8983/solr too
many updates received since start - startingUpdates no longer overlaps with
our currentUpdates
19:26:20.115 WARN  [c:test_collection s:shard1 r:core_node1
x:test_collection_shard1_replica1] o.a.s.u.UpdateLog Starting log replay
tlog{file=/var/solr/data/test_collection_shard1_replica1/data/tlog/tlog.0000000000000000000
refcount=2} active=true starting pos=34809286
19:26:20.146 WARN  [c:test_collection s:shard1 r:core_node1
x:test_collection_shard1_replica1] o.a.s.u.UpdateLog Log replay finished.
recoveryInfo=RecoveryInfo{adds=133 deletes=0 deleteByQuery=0 errors=0
positionOfStart=34809286}

-----------------------------------
Client Java app log (running on the leader machine, 10.112.7.4):
-----------------------------------
19:27:17,173
[ERROR][impl.CloudSolrClient][pool-6-thread-18][CloudSolrClient.java@904] -
Request to collection test_collection failed due to (510)
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://10.112.7.2:8983/solr/test_collection: Expected
mime type application/octet-stream but got text/html.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 510
{metadata={error-class=org.apache.solr.common.SolrException,root-error-class=org.apache.solr.common.SolrException},msg={"test_collection":7},code=510}</title>
</head>
<body>
HTTP ERROR 510
<p>Problem accessing /solr/test_collection/select. Reason:
<pre>
 
{metadata={error-class=org.apache.solr.common.SolrException,root-error-class=org.apache.solr.common.SolrException},msg={"test_collection":7},code=510}</pre></p>
</body>
</html>
, retry? 0

19:27:17,174
[WARN ][impl.CloudSolrClient][pool-6-thread-18][CloudSolrClient.java@953] -
Re-trying request to  collection(s) test_collection after stale state error
from server.
-----------------------------------

Does anyone know what could be causing this error?

It's very infrequent (about 10 occurrences in 2 million reads over the
course of 3 hours), but I'd still like to avoid it if possible.
