Hey, I'll try and answer this tomorrow.

There is definitely an unreported bug in there that needs to be fixed for 
the case where all nodes are restarted.

Also, a 404 generally shows up when Jetty is starting or stopping - there 
are points where 404s can be returned. I'm not sure why else you'd see one. 
Generally we do retries when that happens.
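
To illustrate the retry idea, here is a minimal sketch. It is not Solr's actual retry code: the class name `Retry404`, the method `call`, and the attempt count and backoff parameters are all made up for the example, and the request is modelled as a plain `IntSupplier` that returns an HTTP status code.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of Solr: retries a request while it returns 404.
final class Retry404 {

    private Retry404() {}

    /**
     * Invokes the request until it returns a status other than 404 or until
     * maxAttempts calls have been made, sleeping a little longer between
     * attempts each time. Returns the last status code seen.
     */
    static int call(IntSupplier request, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == 404; attempt++) {
            try {
                // Linear backoff: wait longer after each failed attempt.
                Thread.sleep(backoffMillis * attempt);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up with the last status.
                Thread.currentThread().interrupt();
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }
}
```

A caller would wrap the real HTTP request in the supplier, for example `Retry404.call(() -> sendRequest(), 5, 200L)` where `sendRequest` is whatever returns the status code. A 404 that persists after the last attempt is returned as-is, so the caller can still log it.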

- Mark

On Dec 7, 2012, at 1:07 PM, Alain Rogister <alain.rogis...@gmail.com> wrote:

> I am reporting the results of my stress tests against Solr 4.x. As I was
> getting many error conditions with 4.0, I switched to the 4.1 trunk in the
> hope that some of the issues would be fixed already. Here is my setup :
> 
> - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
> this is not representative of a production environment but it's a fine way
> to find out what happens under resource-constrained conditions.
> - 3 Solr servers, 3 cores (2 of which are very small, the third one has 410
> MB of data)
> - single shard
> - 3 Zookeeper instances
> - HAProxy load balancing requests across Solr servers
> - JMeter or ApacheBench running the tests : 5 thread pools of 20 threads
> each, sending search requests continuously (no updates)
> 
> In nominal conditions, it all works fine, i.e. it can process a million
> requests, maxing out the CPUs at all times, without experiencing nasty
> failures. There are errors in the logs about replication failures though;
> they should be benign in this case as no updates are taking place but it's
> hard to tell what is going on exactly. Example :
> 
> Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
> WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
> exception talking to
> http://192.168.0.101:8985/solr/adressage/, failed
> org.apache.solr.common.SolrException: Server at
> http://192.168.0.101:8985/solr/adressage returned non ok status:404,
> message:Not Found
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> 
> Then I simulated various failure scenarios :
> 
> - 1 Solr server stop/start
> - 2 Solr servers stop/start
> - 3 Solr servers stop/start : it seems that in this case, the Solr servers
> *cannot* be restarted : more precisely, the restarted server will consider
> that it is number 1 out of 4 and wait for the other 3 to come up. The only
> way out is to stop it again, then stop all Zookeeper instances *and* clean
> up their zkdata directories, start them, then start the Solr servers.
> 
> I noticed that these zkdata directories had grown to 200 MB after a while.
> What exactly is in there besides the configuration data ? Does it stop
> growing ?
> 
> Then I tried this :
> 
> - kill 1 Zookeeper process
> - kill 2 Zookeeper processes
> - stop/start 1 Solr server
> 
> When doing this, I experienced (many times) situations where the Solr
> servers could not reconnect and threw scary exceptions. The only way out
> was to restart the whole cluster.
> 
> Q : when, if ever, is one supposed to clean up the zkdata directories ?
> 
> Here are the errors I found in the logs. It seems that some of them have
> been reported in JIRA but 4.1-trunk seems to experience basically the same
> issues as 4.0 in my test scenarios.
> 
> Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
> WARNING: PeerSync: core=cachede url=http://192.168.0.101:8983/solr
> couldn't connect to
> http://192.168.0.101:8984/solr/cachede/, counting as success
> Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
> SEVERE: Sync request error:
> org.apache.solr.client.solrj.SolrServerException: Server refused connection
> at: http://192.168.0.101:8984/solr/cachede
> Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
> SEVERE: http://192.168.0.101:8983/solr/cachede/: Could not tell a replica
> to recover:org.apache.solr.client.solrj.SolrServerException: Server refused
> connection at: http://192.168.0.101:8984/solr
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.http.conn.HttpHostConnectException: Connection to
> http://192.168.0.101:8984 refused
> at
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
> at
> org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
> at
> org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
> at
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
> at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> ... 5 more
> Caused by: java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> at java.net.Socket.connect(Socket.java:579)
> at
> org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
> at
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
> ... 13 more
> 
> Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
> WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr  got a
> 404 from http://192.168.0.101:8985/solr/adressage/, counting as success
> Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
> SEVERE: Sync request error: org.apache.solr.common.SolrException: Server at
> http://192.168.0.101:8985/solr/adressage returned non ok status:404,
> message:Not Found
> Dec 07, 2012 8:04:00 PM org.apache.solr.update.PeerSync handleResponse
> WARNING: PeerSync: core=formabanque url=http://192.168.0.101:8983/solr  got
> a 404 from http://192.168.0.101:8985/solr/formabanque/, counting as success
> Dec 07, 2012 8:04:00 PM org.apache.solr.common.SolrException log
> SEVERE: Sync request error: org.apache.solr.common.SolrException: Server at
> http://192.168.0.101:8985/solr/formabanque returned non ok status:404,
> message:Not Found
> 
> Dec 07, 2012 8:04:32 PM org.apache.solr.update.PeerSync sync
> WARNING: no frame of reference to tell of we've missed updates
> 
> Dec 07, 2012 8:03:58 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to
> recover:org.apache.solr.client.solrj.SolrServerException: Server refused
> connection at: http://192.168.0.101:8984/solr/adressage
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> at
> org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:182)
> at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:134)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
> Caused by: org.apache.http.conn.HttpHostConnectException: Connection to
> http://192.168.0.101:8984 refused
> at
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
> at
> org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
> at
> org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
> at
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
> at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> ... 6 more
> Caused by: java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> at java.net.Socket.connect(Socket.java:579)
> at
> org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
> at
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
> ... 14 more
> 
> Dec 07, 2012 8:03:58 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> SEVERE: Recovery failed - trying again... (0) core=adressage
> 
> SEVERE: Error getting leader from zk
> org.apache.solr.common.SolrException: Could not get leader props
> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:735)
> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:699)
> at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:664)
> at org.apache.solr.cloud.ZkController.register(ZkController.java:603)
> at org.apache.solr.cloud.ZkController.register(ZkController.java:558)
> at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:791)
> at org.apache.solr.core.CoreContainer.register(CoreContainer.java:775)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:567)
> at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /collections/adressage/leaders/shard1
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> at
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
> at
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
> at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
> at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:713)
> ... 16 more
> 
> Dec 07, 2012 4:39:23 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:159)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
