Hi! I'm trying to set up SolrCloud with a replicated ZooKeeper ensemble, but I have a problem.
I'm using Jetty 8 (not embedded), ZooKeeper 3.3.6, SolrCloud 4.0 from branch, Ubuntu 12.04 LTS.

My configs are:

Four Jetty instances running on ports 8080, 8081, 8082 and 8083.

Jetty1.sh:

    JAVA_OPTIONS="$JAVA_OPTIONS -Djava.util.logging.config.file=$JETTY_HOME/etc/logging.properties -XX:+DisableExplicitGC \
    -XX:PermSize=96M -XX:MaxPermSize=96M -Xmx512M -Xms512M -XX:NewSize=96M -XX:MaxNewSize=96M \
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
    -XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9 -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 \
    -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:$JETTY_HOME/logs/gc.log -Dsolr.solr.home=/opt/search4/solr/1 \
    -Dbootstrap_confdir=/opt/search4/solr/1/collection1/conf -Dcollection.configName=sm -DnumShards=2 -DzkHost=10.112.1.2:2181,10.112.1.2:2182,10.112.1.2:2183"

Jetty2.sh (3 and 4 are the same except for the solr.home var):

    JAVA_OPTIONS="$JAVA_OPTIONS -Djava.util.logging.config.file=$JETTY_HOME/etc/logging.properties -XX:+DisableExplicitGC \
    -XX:PermSize=96M -XX:MaxPermSize=96M -Xmx512M -Xms512M -XX:NewSize=96M -XX:MaxNewSize=96M \
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
    -XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9 -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 \
    -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:$JETTY_HOME/logs/gc.log -Dsolr.solr.home=/opt/search4/solr/2 \
    -DzkHost=10.112.1.2:2181,10.112.1.2:2182,10.112.1.2:2183"

My solr.xml files:

solr.xml (8080 port)

    <?xml version="1.0" encoding="UTF-8" ?>
    <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="collection1" host="10.112.1.2" hostPort="8080"
             hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="collection1" instanceDir="collection1" shard="shard1" />
      </cores>
    </solr>

solr.xml (8081 port)

    <?xml version="1.0" encoding="UTF-8" ?>
    <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="collection1" host="10.112.1.2" hostPort="8081"
             hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="collection1" instanceDir="collection1" shard="shard2" />
      </cores>
    </solr>

solr.xml (8082 port)

    <?xml version="1.0" encoding="UTF-8" ?>
    <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="collection1" host="10.112.1.2" hostPort="8082"
             hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="collection1" instanceDir="collection1" shard="shard1" />
      </cores>
    </solr>

solr.xml (8083 port)

    <?xml version="1.0" encoding="UTF-8" ?>
    <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="collection1" host="10.112.1.2" hostPort="8083"
             hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="collection1" instanceDir="collection1" shard="shard2" />
      </cores>
    </solr>

My ZooKeeper configs (all the same, except dataDir and clientPort):

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/opt/search4/zookeeper/1/data
    clientPort=2181
    # zookeeper ensemble
    server.1=10.112.1.2:2888:3888
    server.2=10.112.1.2:2889:3889
    server.3=10.112.1.2:2890:3890

I put a myid file in the dataDir of each ZooKeeper instance and started them, and after that I started Jetty. Everything looks fine: SolrCloud runs normally and I have two leaders, on ports 8080 (shard1) and 8081 (shard2). But when I shut down the first JVM (port 8080), Solr on the third JVM doesn't become leader, and I see errors in the logs (3rd JVM on port 8082):

    Nov 08, 2012 11:00:40 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
    INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=118104
    Nov 08, 2012 11:00:41 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
    INFO: Starting Replication Recovery. core=collection1
    Nov 08, 2012 11:00:41 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
    INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
    Nov 08, 2012 11:00:41 AM org.apache.solr.common.SolrException log
    SEVERE: Error while trying to recover. core=collection1:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://10.112.1.2:8080/solr
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
        at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
    Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://10.112.1.2:8080 refused
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
        at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
        at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
        ... 4 more
    Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
        ... 12 more
    Nov 08, 2012 11:00:41 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
    SEVERE: Recovery failed - trying again... core=collection1
    Nov 08, 2012 11:00:41 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
    INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=117601
    Nov 08, 2012 11:00:41 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
    INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=117098

But when I run a non-replicated, embedded ZooKeeper, no such errors appear in the logs.

When I turn off the second JVM app (8082) (see attachment <http://lucene.472066.n3.nabble.com/file/n4018984/Untitled.png>), I have empty segments on all shards and numFound=0.

Please advise what I'm doing wrong.

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/Replicated-zookeeper-tp4018984.html
Sent from the Solr - User mailing list archive at Nabble.com.
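
For reference, the myid step and a quick ensemble check can be sketched roughly as below. This is only a sketch under assumptions, not the exact commands that were run: the helper names write_myids and check_ensemble are hypothetical, and the data directories of instances 2 and 3 are assumed to follow the same /opt/search4/zookeeper/<n>/data pattern as the dataDir shown above. The check relies on ZooKeeper's "stat" four-letter command and on nc (netcat) being installed.

```shell
#!/bin/sh
# Hypothetical helper: write a myid file for each ZooKeeper instance.
# Assumes data dirs follow <base>/<n>/data (only instance 1 is shown in the post).
write_myids() {
    base="$1"
    count="$2"
    n=1
    while [ "$n" -le "$count" ]; do
        mkdir -p "$base/$n/data"
        # myid holds only the number N from the matching server.N line
        echo "$n" > "$base/$n/data/myid"
        n=$((n + 1))
    done
}

# Hypothetical helper: ask each instance for its role via the "stat" command.
# A healthy ensemble member answers with a line starting with "Mode:".
check_ensemble() {
    host="$1"
    shift
    for port in "$@"; do
        printf '%s:%s ' "$host" "$port"
        echo stat | nc "$host" "$port" | grep Mode || echo "no answer"
    done
}
```

Usage would be something like `write_myids /opt/search4/zookeeper 3` before starting ZooKeeper, then `check_ensemble 10.112.1.2 2181 2182 2183` once all three instances are up. If any port reports "no answer", the ensemble itself would be worth checking before looking at Solr.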