And to pile on Shalin's comments, there is absolutely no reason
to try to pre-configure the replica on the new node, and quite
a bit of downside as you are finding. Just add the new node
without any cores and use the ADDREPLICA command to
create the replicas.
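
For reference, a minimal sketch of what that ADDREPLICA call might look
like, sent to any live node's Collections API. The host names, port, and
node_name below are assumptions — substitute your own cluster's values
(the collection name "domain" is taken from the logs further down):

```shell
# All addresses here are hypothetical placeholders, not values from this thread.
SOLR_HOST="10.36.9.70:11000"   # host:port of any node already in the cluster
COLLECTION="domain"            # collection to grow (name taken from the logs below)
NEW_NODE="newhost:11000_solr"  # node_name of the freshly started, core-less node

# Build the ADDREPLICA request URL.
URL="http://${SOLR_HOST}/solr/admin/collections?action=ADDREPLICA&collection=${COLLECTION}&shard=shard1&node=${NEW_NODE}"
echo "$URL"
# curl "$URL"   # uncomment to issue the request against a running cluster
```

The new replica will then recover its index from the current leader
automatically, so there is nothing to pre-stage on disk.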

Best,
Erick

On Thu, Jun 4, 2015 at 8:31 PM, Shalin Shekhar Mangar
<shalinman...@gmail.com> wrote:
> Why do you stop the cluster while adding a node? This is the reason why
> this is happening. When the first node of a solr cluster starts up, it
> waits for some time to see other nodes but if it finds none then it goes
> ahead and becomes the leader. If the other nodes are up and running, peer
> sync and replication recovery will make sure that the node with data
> becomes the leader. So just keep the cluster running while adding a new
> node.
>
> Also, stop relying on core discovery for setting up a node. At some point
> we will stop supporting this feature. Use the collection API to add new
> replicas.
>
> On Fri, Jun 5, 2015 at 5:01 AM, Michael Roberts <mrobe...@tableau.com>
> wrote:
>
>> Hi,
>>
>> I am seeing some unexpected behavior when adding a new machine to my
>> cluster. I am running 4.10.3.
>>
>> My setup has multiple collections, each collection has a single shard. I
>> am using core auto discovery on the hosts (my deployment mechanism ensures
>> that the directory structure is created and the core.properties file is in
>> the right place).
>>
>> To add a new machine I have to stop the cluster.
>>
>> If I add a new machine and then start the cluster, and the new machine is
>> elected leader for the shard, peer recovery fails. So now I have a leader
>> with no content and replicas with content. Depending on where the read
>> request is sent, I may or may not get the response I am expecting.
>>
>> 2015-06-04 14:26:09.595 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - Running the leader
>> process for shard shard1
>> 2015-06-04 14:26:09.607 -0700 (,,,) coreZkRegister-1-thread-9 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - Waiting until we see
>> more replicas up for shard shard1: total=2 found=1 timeoutin=1.14707356E15ms
>> 2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to
>> continue.
>> 2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader
>> - try and sync
>> 2015-06-04 14:26:10.115 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.update.PeerSync - PeerSync: core=domain url=
>> http://10.36.9.70:11000/solr START replicas=[
>> http://mlim:11000/solr/domain/] nUpdates=100
>> 2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.update.PeerSync - PeerSync: core=domain url=
>> http://10.36.9.70:11000/solr DONE.  We have no versions.  sync failed.
>> 2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we
>> have no versions - we can't sync in that case - we were active before, so
>> become leader anyway
>> 2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader:
>> http://10.36.9.70:11000/solr/domain/ shard1
>> 2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ZkController - No LogReplay needed for core=domain
>> baseURL=http://10.36.9.70:11000/solr
>> 2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ZkController - I am the leader, no recovery necessary
>>
>> This seems like a fairly common scenario. So I suspect, either I am doing
>> something incorrectly, or I have an incorrect assumption about how this is
>> supposed to work.
>>
>> Does anyone have any suggestions?
>>
>> Thanks
>>
>> Mike.
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
