Thanks for the explanation. It's clear now.

I expanded the setup to:
4 hosts with 2 shards and 1 replica for each shard. When I shut down Tomcat on solr01-dcg, which is the leader of shard 1 for both collections, the replica (solr01-gs) does NOT seem to take over.
See logs below.

Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: Running the leader process.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: Checking if I should try and be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: My last published State was Active, it's okay to be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: I may be the new leader - try and sync
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.SyncStrategy sync
INFO: Sync replicas to http://solr01-gs:8983/solr/intradesk/
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr DONE.  We have no versions.  sync failed.
Dec 3, 2012 9:55:34 AM org.apache.solr.common.SolrException log
SEVERE: Sync Failed
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext rejoinLeaderElection
INFO: There is a better leader candidate than us - going back into recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.update.DefaultSolrCoreState doRecovery
INFO: Running recovery - first canceling any ongoing recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy run
INFO: Starting recovery process.  core=intradesk recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Attempting to PeerSync from http://solr01-dcg:8983/solr/intradesk/ core=intradesk - recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync
WARNING: no frame of reference to tell of we've missed updates
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication. core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 3, 2012 9:55:35 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=intradesk:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://solr01-dcg:8983/solr
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://solr01-dcg:8983 refused
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
    ... 4 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    ... 12 more

Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: Running the leader process.
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179999
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179497
Dec 3, 2012 9:55:36 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178995
Dec 3, 2012 9:55:36 AM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178493
Dec 3, 2012 9:55:37 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=intradesk
Dec 3, 2012 9:55:37 AM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 3, 2012 9:55:37 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=intradesk:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://solr01-dcg:8983/solr
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://solr01-dcg:8983 refused
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
    ... 4 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    ... 12 more

Dec 3, 2012 9:55:37 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... core=intradesk
...



Any idea why Solr stops responding?
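
To make a long log like the one above easier to read, here is a minimal Python sketch that pulls out the SEVERE messages and the leader-election wait counters. The sample lines are copied from the log (timestamps dropped); the function name `summarize` and the returned dictionary layout are made up for illustration. The `timeoutin` value appears to be in milliseconds, since it drops by about 500 between lines logged roughly half a second apart, but that is an inference from this log only.

```python
import re

# A few lines copied from the log above, without their timestamp lines.
SAMPLE_LOG = """\
SEVERE: Sync Failed
INFO: There is a better leader candidate than us - going back into recovery
SEVERE: Error while trying to recover. core=intradesk:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://solr01-dcg:8983/solr
SEVERE: Recovery failed - trying again... core=intradesk
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179999
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179497
"""

# Matches the counters printed by waitForReplicasToComeUp.
WAIT_RE = re.compile(r"total=(\d+) found=(\d+) timeoutin=(\d+)")


def summarize(log_text):
    """Collect SEVERE messages and (total, found, timeoutin) tuples."""
    severe = []
    waits = []
    for line in log_text.splitlines():
        if line.startswith("SEVERE:"):
            severe.append(line[len("SEVERE:"):].strip())
        match = WAIT_RE.search(line)
        if match:
            waits.append(tuple(int(group) for group in match.groups()))
    return {"severe": severe, "waits": waits}


if __name__ == "__main__":
    summary = summarize(SAMPLE_LOG)
    for message in summary["severe"]:
        print("SEVERE:", message)
    for total, found, timeout in summary["waits"]:
        print("replicas seen: %d of %d, timeoutin=%d" % (found, total, timeout))
```

Read this way, the log shows solr01-gs failing to sync, trying to recover from solr01-dcg (which refuses connections because Tomcat is down there), and then waiting for the second replica to come back.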


On 11/30/2012 04:57 PM, Mark Miller wrote:
Thanks for all the detailed info!

Yes, that is confusing. One of the sore points we have while supporting both std Solr and SolrCloud mode.

In SolrCloud, every node is a Master when thinking about std Solr replication. However, as you see on the cloud page, only one of them is a *leader*. A leader is different than a master.

Being a Master when it comes to the replication handler simply means you can replicate the index to other nodes - in SolrCloud we need every node to be capable of doing that. Each shard only has one leader, but every node in your cluster will be a replication master.

- Mark
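
The distinction above can be illustrated with a small Python sketch. The data layout and the field names (`node`, `shard`, `leader`) are hypothetical and only mirror the two hosts from this thread; this is not Solr's actual cluster state format.

```python
# Hypothetical, simplified view of one shard with two cores.
# Field names are illustrative only, not Solr's real cluster state.
CORES = [
    {"node": "solr01-dcg", "shard": "shard1", "leader": True},
    {"node": "solr01-gs", "shard": "shard1", "leader": False},
]


def replication_masters(cores):
    """Every node can replicate its index to others, so every node is a master."""
    return [core["node"] for core in cores]


def shard_leaders(cores):
    """Return {shard: node}; each shard must have at most one leader."""
    leaders = {}
    for core in cores:
        if core["leader"]:
            if core["shard"] in leaders:
                raise ValueError("more than one leader for " + core["shard"])
            leaders[core["shard"]] = core["node"]
    return leaders


if __name__ == "__main__":
    print("replication masters:", replication_masters(CORES))
    print("shard leaders:", shard_leaders(CORES))
```

Both hosts show up as replication masters, while only one of them is the leader of shard1, which matches what the replication page and the cloud page report.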


On Nov 30, 2012, at 10:32 AM, Arkadi Colson <ark...@smartbit.be> wrote:

This is my setup for SolrCloud 4.0 on Tomcat 7.0.33 and ZooKeeper 3.4.5

hosts:
- solr01-dcg (first started)
- solr01-gs (started second, so it becomes the replica)

collections:
- smsc

shards:
- mydoc

zookeeper:
- on solr01-dcg
- on solr01-gs

SOLR_OPTS="-Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName=smsc -DzkClientTimeout=20000 -DzkHost=solr01-dcg:2181,solr01-gs:2181"

solr.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="20000" hostPort="8983">
    <core schema="schema.xml" shard="shard1" instanceDir="/solr/mydoc/" name="mydoc" config="solrconfig.xml" collection="mydoc"/>
  </cores>
</solr>

I upload the config to ZooKeeper:
java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost solr01-dcg:2181,solr01-gs:2181 -confdir /opt/solr/conf -confname smsc

Linking the config to the collection:
java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection mydoc -zkhost solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181 -confname smsc
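
As a rough sketch of how the two ZkCLI calls above could be scripted when several collections share one config, the following Python builds the same command lines without running them. It uses only the flags shown above; the helper names and the second collection name `mydoc2` are hypothetical.

```python
import shlex

# Values taken from the commands above.
CLASSPATH = ".:/usr/local/tomcat/webapps/solr/WEB-INF/lib/*"
ZK_HOSTS = "solr01-dcg:2181,solr01-gs:2181"


def zkcli(*args):
    """Build the argument list for one ZkCLI invocation."""
    return ["java", "-classpath", CLASSPATH, "org.apache.solr.cloud.ZkCLI", *args]


def upconfig(confdir, confname):
    """Command that uploads a config directory under the given name."""
    return zkcli("-cmd", "upconfig", "-zkhost", ZK_HOSTS,
                 "-confdir", confdir, "-confname", confname)


def linkconfig(collection, confname):
    """Command that links one collection to an uploaded config name."""
    return zkcli("-cmd", "linkconfig", "-collection", collection,
                 "-zkhost", ZK_HOSTS, "-confname", confname)


if __name__ == "__main__":
    commands = [upconfig("/opt/solr/conf", "smsc")]
    # "mydoc2" is a hypothetical second collection sharing the same config.
    for collection in ("mydoc", "mydoc2"):
        commands.append(linkconfig(collection, "smsc"))
    for command in commands:
        # Print shell-quoted commands for review instead of executing them.
        print(" ".join(shlex.quote(part) for part in command))
```

The lists can be passed to `subprocess.run` once the printed commands look right; printing first keeps the sketch free of side effects.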

cloud on both hosts:

<dcddagii.png>

solr01-dcg

<hhfgdeab.png>

solr01-gs:

<daafhdef.png>
Any idea?

Thanks!

On 11/30/2012 03:15 PM, Mark Miller wrote:
On Nov 30, 2012, at 5:08 AM, Arkadi Colson <ark...@smartbit.be> wrote:


Hi

I've set up a simple 2-machine cloud with 1 shard, one replica and 2 collections. Everything went fine. However, when I look at the interface, http://localhost:8983/solr/#/coll1/replication reports that both machines are master. Did I do something wrong in my config, or is it a report for manual replication configuration? Can someone else check this?

How? You don't really give anything to look at :)


Is it possible to link 2 collections to the same conf in ZooKeeper?


Yes, that is no problem.

- Mark







-- 
Kind regards

Arkadi Colson

Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
T +32 11 64 08 80 • F +32 11 64 08 81
