Hello All,

I'm working on an interesting problem. When I pull a server out of the cloud (say, for maintenance) and then bring it back up, it doesn't automatically sync with ZooKeeper and become a leader or replica for any collections that I created while it was offline, even though I specified a number of shards or replicas higher than the number of servers registered with ZooKeeper.

Here is my setup:
External ZooKeeper (v3.3.5) Ensemble (zk1, zk2, zk3)
SolrCloud (4.0.0-BETA) with 2 shards and 2 replicas (shard1, shard2, shard1a, shard2a)

Here is the detailed scenario:
I create a new collection named 'collection2' using the Collections API and specify 2 shards and 2 replicas. (curl 'http://shard1:8983/solr/admin/collections?action="">') The call creates (as I would expect) 2 shards and 2 replicas.

I then push some docs into 'collection2' and I see the documents are distributed between shard1 and shard2 and are replicated to 1a and 2a. So far so good.

Now, to simulate a node failure, I take down shard1a while pushing some more docs into 'collection2'. While shard1a is down, I also create a new collection named 'collection3' using the Collections API and specify 2 shards and 2 replicas. The call creates (as I would expect) 2 shards and only 1 replica: since shard1a is down, there are not enough servers to create all of the replicas.

Before bringing shard1a back up, I push some documents into 'collection3' and see that the docs are distributed between shard1 and shard2, with shard2a replicating shard2. Thus far everything looks great and works as expected.

When I bring shard1a back online, however, here is what I would expect to happen:
1. Shard1a registers with ZooKeeper, and ZooKeeper assigns it as a replica of shard1 for 'collection2' (it knows about collection2 because it's stored in solr.xml).
2. Shard1a asks ZooKeeper whether any collections have missing replicas or not enough shards.
3. ZooKeeper responds that 'collection3' on shard1 doesn't have a replica (remember, I created the collection with 2 replicas but only one is present).
4. Shard1a creates a new core and becomes a replica for 'collection3' on shard1.
5. Shard1a synchronizes with shard1 and replicates the missing documents for 'collection2' and 'collection3'.
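
To make steps 2 to 4 concrete, here is a rough Python sketch of the check I have in mind. It is only an illustration of the expected behavior: the cluster-state layout, the `find_missing_replicas` name, and the "wanted cores per shard" numbers are all made up for this example and are not Solr's or ZooKeeper's actual data model or API.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot of the cluster (NOT Solr's real clusterstate format):
# collection name -> shard name -> servers currently hosting a core for it.
ClusterState = Dict[str, Dict[str, List[str]]]


def find_missing_replicas(
    state: ClusterState, wanted_per_shard: Dict[str, int]
) -> List[Tuple[str, str, int]]:
    """Return (collection, shard, missing_count) for each under-filled shard."""
    gaps = []
    for collection, shards in sorted(state.items()):
        for shard, servers in sorted(shards.items()):
            missing = wanted_per_shard[collection] - len(servers)
            if missing > 0:
                gaps.append((collection, shard, missing))
    return gaps


# State at the moment shard1a comes back and has rejoined 'collection2'.
state = {
    "collection2": {"shard1": ["shard1", "shard1a"], "shard2": ["shard2", "shard2a"]},
    "collection3": {"shard1": ["shard1"], "shard2": ["shard2", "shard2a"]},
}
# Assumption: each shard should have a leader plus one replica (2 cores).
wanted = {"collection2": 2, "collection3": 2}

gaps = find_missing_replicas(state, wanted)
print(gaps)  # [('collection3', 'shard1', 1)]
```

In this sketch the only gap reported is shard1 of 'collection3', which is the core I would expect shard1a to create and then sync in steps 4 and 5.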

However here is what really happens:
1. Shard1a registers with ZooKeeper and is assigned as a replica of shard1 for 'collection2'.
2. Shard1a synchronizes with shard1 and replicates the missing documents for 'collection2'.
Nothing else happens; no replica is created for 'collection3'.

How can I make shard1a automatically become a replica or a leader for missing cores within a collection when it comes online?

--

Jed Glazner
Sr. Software Engineer
Adobe Social

385.221.1072 (tel)
801.360.0181 (cell)
jglaz...@adobe.com

550 East Timpanogus Circle
Orem, UT 84097-6215, USA
www.adobe.com

 
