Solr version: 4.3.1
Number of shards: 10
Replicas: 1
Heap size: 15GB
Machine RAM: 30GB
ZooKeeper timeout: 45 seconds
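For context, each node is started along these lines (a sketch - the ZooKeeper hosts are placeholders, and zkClientTimeout is the 45-second timeout above expressed in milliseconds):

    java -Xms15g -Xmx15g \
         -DzkHost=zk1:2181,zk2:2181,zk3:2181 \
         -DzkClientTimeout=45000 \
         -jar start.jar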

We are continuing the fight to keep our Solr setup functioning.  As part
of that we have made significant changes to our schema to reduce the
amount of data we write.  I set up a new cluster to reindex our data.
Initially I ran the import with no replicas and achieved quite impressive
results: our peak was 60,000 new documents per minute, with no shard
losses and no outages due to garbage collection (an issue we do see in
production).  At the end of the load the index stood at 97,000,000
documents and 20GB per shard.  During the highest insertion rates querying
suffered, but that is not a concern right now.

I have now added one replica for each shard.  Indexing time has doubled -
not surprising, and since it was so good to start with, not a problem.  I
continue to write only to the leaders, and the issue is that the replicas
are continually going into recovery.
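For clarity, the write pattern is essentially the following - a minimal SolrJ sketch, not our actual code; the host, batch size and field names are placeholders:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class LeaderWriter {
      public static void main(String[] args) throws Exception {
        // One client per shard leader; no CloudSolrServer, so no
        // client-side routing through ZooKeeper.
        HttpSolrServer leader =
            new HttpSolrServer("http://leader-host:8983/solr/sessionfilterset");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "session-" + i);
          doc.addField("payload_s", "example");
          batch.add(doc);
        }
        // The leader indexes the batch locally and forwards each add to
        // its replica over HTTP; that forwarding call is what fails in
        // the trace below.
        leader.add(batch);
        leader.shutdown();
      }
    }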

The leaders show:

ERROR - 2014-02-14 11:47:45.757; org.apache.solr.common.SolrException; shard update error StdNode: http://10.35.133.176:8983/solr/sessionfilterset/:org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://10.35.133.176:8983/solr/sessionfilterset
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
        at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
        at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
        ... 11 more

The replica is not busy garbage collecting: the failure doesn't coincide
with a full GC, and its collection times are low.  The replica appears to
be accepting adds until milliseconds before this appears in its log:

INFO  - 2014-02-14 11:59:54.366; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover

As I understand it, when the leader fails to forward an update to a
replica it asks that replica to recover via the CoreAdmin handler, which
would explain why this message pairs up with the shard update errors above.

I have reduced the load to 5,000 documents per minute, and the replicas
still appear to stay up for only a couple of minutes.  I would like to be
confident that we can handle more than this during our peak times.

Initially I was getting connection reset errors on the leaders, but I
changed the Jetty connector to the NIO one (SelectChannelConnector) and
now the message above is what I receive instead.  I have also increased
the request and response header sizes.
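
For reference, the connector definition in etc/jetty.xml now looks something like this (the maxIdleTime and header sizes are illustrative values, not the exact ones we settled on):

    <Call name="addConnector">
      <Arg>
        <!-- SelectChannelConnector is Jetty 8's NIO connector -->
        <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
          <Set name="host"><SystemProperty name="jetty.host" /></Set>
          <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
          <Set name="maxIdleTime">50000</Set>
          <Set name="requestHeaderSize">16384</Set>
          <Set name="responseHeaderSize">16384</Set>
        </New>
      </Arg>
    </Call>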

Any ideas - other than not using replicas at all, as a colleague has
proposed?

Thanks very much in advance.


-- 

Annette Newton
Database Administrator
ServiceTick Ltd

T: +44 (0)1603 618326

Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ
www.servicetick.com
www.sessioncam.com

