Thanks for the explanation. It's clear now...

I expanded the setup to 4 hosts with 2 shards and 1 replica for each shard. When I shut down Tomcat on solr01-dcg, which is the leader of shard 1 for both collections, the replica (solr01-gs) does NOT seem to take over. See the logs below.

Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: Running the leader process.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: Checking if I should try and be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: My last published State was Active, it's okay to be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: I may be the new leader - try and sync
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.SyncStrategy sync
INFO: Sync replicas to http://solr01-gs:8983/solr/intradesk/
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr DONE. We have no versions. sync failed.
Dec 3, 2012 9:55:34 AM org.apache.solr.common.SolrException log
SEVERE: Sync Failed
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext rejoinLeaderElection
INFO: There is a better leader candidate than us - going back into recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.update.DefaultSolrCoreState doRecovery
INFO: Running recovery - first canceling any ongoing recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy run
INFO: Starting recovery process. core=intradesk recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Attempting to PeerSync from http://solr01-dcg:8983/solr/intradesk/ core=intradesk - recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync
WARNING: no frame of reference to tell of we've missed updates
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication. core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 3, 2012 9:55:35 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=intradesk:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://solr01-dcg:8983/solr
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://solr01-dcg:8983 refused
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
    ... 4 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    ... 12 more
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: Running the leader process.
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179999
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179497
Dec 3, 2012 9:55:36 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178995
Dec 3, 2012 9:55:36 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178493
Dec 3, 2012 9:55:37 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=intradesk
Dec 3, 2012 9:55:37 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 3, 2012 9:55:37 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=intradesk:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://solr01-dcg:8983/solr
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://solr01-dcg:8983 refused
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
    ... 4 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    ... 12 more
Dec 3, 2012 9:55:37 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... core=intradesk
...

Any idea why Solr stops responding?

On 11/30/2012 04:57 PM, Mark Miller wrote:
Thanks for all the detailed info!

Yes, that is confusing. One of the sore points we have while supporting both std Solr and SolrCloud mode.

In SolrCloud, every node is a Master when thinking about std Solr replication. However, as you see on the cloud page, only one of them is a *leader*. A leader is different from a master. Being a Master when it comes to the replication handler simply means you can replicate the index to other nodes - in SolrCloud we need every node to be capable of doing that.

Each shard only has one leader, but every node in your cluster will be a replication master.

- Mark

On Nov 30, 2012, at 10:32 AM, Arkadi Colson <ark...@smartbit.be> wrote:

This is my setup for SolrCloud 4.0 on Tomcat 7.0.33 and ZooKeeper 3.4.5.

hosts:
- solr01-dcg (first started)
- solr01-gs (second started, so it becomes the replica)

collections:
- smsc

shards:
- mydoc

zookeeper:
- on solr01-dcg
- on solr01-gs

SOLR_OPTS="-Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName=smsc -DzkClientTimeout=20000 -DzkHost=solr01-dcg:2181,solr01-gs:2181"

solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="20000" hostPort="8983">
    <core schema="schema.xml" shard="shard1" instanceDir="/solr/mydoc/" name="mydoc" config="solrconfig.xml" collection="mydoc"/>
  </cores>
</solr>

I upload the config to ZooKeeper:

java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost solr01-dcg:2181,solr01-gs:2181 -confdir /opt/solr/conf -confname smsc

Linking the config to the collection:

java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection mydoc -zkhost solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181 -confname smsc

cloud on both hosts:
<dcddagii.png>

solr01-dcg:
<hhfgdeab.png>

solr01-gs:
<daafhdef.png>

Any idea? Thanks!

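The leader/master distinction in Mark's reply above can be illustrated with a small sketch. This is only a toy model: the `Node` class and its field names are hypothetical and do not reflect Solr's actual cluster state format. It only encodes the two statements from the reply, namely that each shard has one leader and that every node acts as a replication master.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical, simplified view of one node in a shard.
    # These field names are illustrative, not Solr's real cluster state keys.
    name: str
    is_leader: bool                      # only one node per shard
    is_replication_master: bool = True   # in SolrCloud, every node


def leaders(nodes):
    """Names of the nodes that are shard leaders."""
    return [n.name for n in nodes if n.is_leader]


def replication_masters(nodes):
    """Names of the nodes that can replicate their index to others."""
    return [n.name for n in nodes if n.is_replication_master]


shard1 = [
    Node("solr01-dcg", is_leader=True),
    Node("solr01-gs", is_leader=False),
]

print("leaders:", leaders(shard1))
print("replication masters:", replication_masters(shard1))
```

Under this model the replication page showing both machines as master is expected, while the cloud page shows a single leader per shard.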
On 11/30/2012 03:15 PM, Mark Miller wrote:

On Nov 30, 2012, at 5:08 AM, Arkadi Colson <ark...@smartbit.be> wrote:

> Hi
>
> I've set up a simple 2-machine cloud with 1 shard, one replica and 2 collections. Everything went fine. However, when I look at the interface, http://localhost:8983/solr/#/coll1/replication is reporting that both machines are master. Did I do something wrong in my config, or is it a report for manual replication configuration? Can someone else check this?

How? You don't really give anything to look at :)

> Is it possible to link 2 collections to the same conf in zookeeper?

Yes, that is no problem.

- Mark

--
Kind regards

Arkadi Colson

Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
T +32 11 64 08 80 • F +32 11 64 08 81
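For anyone reading the log at the top of this thread, the repeated waitForReplicasToComeUp messages can be summarized with a short script. This is a rough sketch, not part of Solr: the `summarize_wait` helper is hypothetical, and reading `timeoutin` as milliseconds is an assumption based on the values dropping by about 500 between messages logged roughly half a second apart.

```python
import re

# Matches the counters in the "Waiting until we see more replicas up" messages.
WAIT_RE = re.compile(r"total=(\d+) found=(\d+) timeoutin=(\d+)")


def summarize_wait(log_text):
    """Return (total, found, first_timeout, last_timeout), or None if no match."""
    matches = [tuple(map(int, m.groups())) for m in WAIT_RE.finditer(log_text)]
    if not matches:
        return None
    total, found, last_timeout = matches[-1]
    first_timeout = matches[0][2]
    return total, found, first_timeout, last_timeout


# Message lines copied from the log quoted in this thread.
sample = """\
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179999
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179497
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178995
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178493
"""

total, found, first_ms, last_ms = summarize_wait(sample)
print(f"replicas seen: {found} of {total}")
# Assumes timeoutin is in milliseconds (see the note above).
print(f"waited about {first_ms - last_ms} ms, about {last_ms / 1000:.0f} s remaining")
```

If the millisecond reading is right, the remaining node is still inside a wait of roughly three minutes for the second replica, which may be why it does not appear to take over immediately.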