Hi, I'm doing some benchmarking with Solr Cloud 4.9.0. I am trying to work out exactly how much throughput my cluster can handle.
Consistently in my tests I see a replica go into the recovering state and stay there forever, caused by what looks like a timeout during replication. I can understand the timeout and the failure (I am hitting it fairly hard), but what seems odd to me is that when I stop the heavy load the replica still does not recover the next time it tries. It seems broken forever until I manually go in, clear the index and let it do a full resync.

Is this normal? Am I misunderstanding something?

My cluster has 4 nodes (2 shards, 2 replicas) on AWS m3.2xlarge instances. I am indexing with ~800 concurrent connections and a 10 sec soft commit. I consistently get this problem at a throughput of around 1.5 million documents per hour.

Thanks all,
Darren

Stack Traces & Messages:

[qtp779330563-627] ERROR org.apache.solr.servlet.SolrDispatchFilter - null:org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
    at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226)
    at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:422)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Error while trying to recover. core=assets_shard2_replica1:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://xxx.xxx.15.171:8080/solr
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://xxx.xxx.15.171:8080/solr
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
    at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
    at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.SocketException: Socket closed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:452)
    ... 6 more

853915 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy - Recovery failed - trying again... (0) core=assets_shard2_replica1
853915 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy - Recovery failed - interrupted. core=assets_shard2_replica1
853915 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy - Recovery failed - I give up. core=assets_shard2_replica1
853918 [RecoveryThread] WARN org.apache.solr.cloud.RecoveryStrategy - Stopping recovery for zkNodeName=xxx.xxx.15.174:8080_solr_assets_shard2_replica1core=assets_shard2_replica1
853933 [Thread-382] WARN org.apache.solr.cloud.RecoveryStrategy - Stopping recovery for zkNodeName=xxx.xxx.15.174:8080_solr_assets_shard2_replica1core=assets_shard2_replica1
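
For scale, the load figures quoted in this post can be converted to per-second rates with the rough sketch below. It assumes the ~1.5 million documents per hour is the cluster-wide total and that documents are spread evenly across the ~800 connections and the 2 shards. Neither assumption has been measured, and the function name `rates` is only illustrative.

```python
# Back-of-the-envelope rates from the figures quoted in this post.
# Assumption: 1.5M docs/hour is the cluster-wide total, spread evenly
# over all connections and both shards (not verified by measurement).

DOCS_PER_HOUR = 1_500_000
CONNECTIONS = 800
SHARDS = 2
SOFT_COMMIT_SECONDS = 10


def rates(docs_per_hour, connections, shards, soft_commit_seconds):
    """Return approximate per-second indexing rates as a dict."""
    docs_per_second = docs_per_hour / 3600
    return {
        "docs_per_second": docs_per_second,
        "docs_per_second_per_connection": docs_per_second / connections,
        "docs_per_second_per_shard": docs_per_second / shards,
        "docs_per_soft_commit": docs_per_second * soft_commit_seconds,
    }


if __name__ == "__main__":
    result = rates(DOCS_PER_HOUR, CONNECTIONS, SHARDS, SOFT_COMMIT_SECONDS)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions this comes to roughly 417 documents per second overall, about half a document per second per connection, and around 4,200 documents arriving between soft commits.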