Great, thanks Mark! I'll test the fix and post my results.

Alain
On Saturday, December 8, 2012, Mark Miller wrote:

> After some more playing around on 5x I have duplicated the issue. I'll
> file a JIRA issue for you and fix it shortly.
>
> - Mark
>
> On Dec 8, 2012, at 8:43 AM, Mark Miller <markrmil...@gmail.com> wrote:
>
> > Hmm… I've tried to replicate what looked like a bug from your report
> > (3 Solr servers stop/start), but on 5x it works no problem for me. It
> > shouldn't be any different on 4x, but I'll try that next.
> >
> > In terms of starting up Solr without a working ZooKeeper ensemble: it
> > won't work currently. Cores won't be able to register with ZooKeeper
> > and will fail loading. It would probably be nicer to come up in
> > search-only mode and keep trying to reconnect to ZooKeeper. File a
> > JIRA issue if you are interested.
> >
> > On the zk data dir, see
> > http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
> >
> > - Mark
> >
> > On Dec 7, 2012, at 10:22 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >
> >> Hey, I'll try and answer this tomorrow.
> >>
> >> There is definitely an unreported bug in there that needs to be fixed
> >> for the case of restarting all the nodes.
> >>
> >> Also, a 404 is generally returned when Jetty is starting or stopping;
> >> there are points where 404s can be returned. I'm not sure why else
> >> you'd see one. Generally we do retries when that happens.
> >>
> >> - Mark
> >>
> >> On Dec 7, 2012, at 1:07 PM, Alain Rogister <alain.rogis...@gmail.com> wrote:
> >>
> >>> I am reporting the results of my stress tests against Solr 4.x. As I was
> >>> getting many error conditions with 4.0, I switched to the 4.1 trunk in the
> >>> hope that some of the issues would be fixed already. Here is my setup:
> >>>
> >>> - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
> >>>   this is not representative of a production environment but it's a fine way
> >>>   to find out what happens under resource-constrained conditions.
> >>> - 3 Solr servers, 3 cores (2 of which are very small, the third one has 410
> >>>   MB of data)
> >>> - single shard
> >>> - 3 ZooKeeper instances
> >>> - HAProxy load balancing requests across Solr servers
> >>> - JMeter or ApacheBench running the tests: 5 thread pools of 20 threads
> >>>   each, sending search requests continuously (no updates)
> >>>
> >>> In nominal conditions, it all works fine, i.e. it can process a million
> >>> requests, maxing out the CPUs at all times, without experiencing nasty
> >>> failures. There are errors in the logs about replication failures though;
> >>> they should be benign in this case as no updates are taking place, but it's
> >>> hard to tell what is going on exactly. Example:
> >>>
> >>> Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
> >>> WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
> >>> exception talking to http://192.168.0.101:8985/solr/adressage/, failed
> >>> org.apache.solr.common.SolrException: Server at
> >>> http://192.168.0.101:8985/solr/adressage returned non ok status:404,
> >>> message:Not Found
> >>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
> >>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
> >>> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
> >>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>> at java.util.concurrent.FutureTask.run(FutureTask.