I have just tried this with Solr 7.4 and am getting the same symptoms as I describe below.
The symptoms I am seeing are quite different from what I gather Shawn Heisey understood them to be, so I will describe them again.

Let us assume that we start with a SolrCloud of two nodes: one at hostname1:9999 and the other at hostname2:9999. Let us also assume that we have a one-shard collection with two replicas: one replica is on the node at hostname1:9999 (with the core col_shard1_replica_n1) and the other is on the node at hostname2:9999 (with the core col_shard1_replica_n3).

Then I run SPLITSHARD. I end up with four cores instead of two, as expected. The problem is that three of the four cores (col_shard1_0_replica_n5, col_shard1_0_replica0 and col_shard1_1_replica_n6) are *all on hostname1*. Only col_shard1_1_replica0 was placed on hostname2.

Prior to the SPLITSHARD, if hostname1 becomes temporarily unavailable, the SolrCloud can still be used: hostname2 has all the data. After the SPLITSHARD, if hostname1 becomes temporarily unavailable, the SolrCloud has no access at all to the data in shard1_0.

Granted, I could add a replica of shard1_0 onto hostname2, and I could then drop one of the extraneous shard1_0 replicas on hostname1; but I don't see the logic in requiring such additional steps every time.

My question is: how can I tell Solr "avoid putting two replicas of the same shard on the same node"? (Sketches of both the manual workaround and a possible placement rule follow the quoted message below.)

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Tuesday, June 19, 2018 2:20 PM
To: solr-user@lucene.apache.org
Subject: Re: sharding and placement of replicas

On 6/15/2018 11:08 AM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> If I start with a collection X on two nodes with one shard and two replicas
> (for redundancy, in case a node goes down): a node on host1 has
> X_shard1_replica1 and a node on host2 has X_shard1_replica2: when I try
> SPLITSHARD, I generally get X_shard1_0_replica1, X_shard1_1_replica1 and
> X_shard1_0_replica0 all on the node on host1 with X_shard1_1_replica0 sitting
> alone on the node on host2. If host1 were to go down at this point, shard1_0
> would be unavailable.

https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-splitshard

That documentation says "The new shards will have as many replicas as the original shard."

That tells me that what you're seeing is not matching the *intent* of the SPLITSHARD feature. The fact that you get *one* of the new shards but not the other is suspicious. I'm wondering if maybe Solr tried to create it but had a problem doing so. Can you check for errors in the solr logfile on host2?

If there's nothing about your environment that would cause a failure to create the replica, then it might be a bug.

> Is there a way either of specifying placement or of giving hints that
> replicas ought to be separated?

It shouldn't be necessary to give Solr any parameters for that. All nodes where the shard exists should get copies of the new shards when you split it.

> I am currently running Solr6.6.0, if that is relevant.

If this is a provable and reproducible bug, and it's still a problem in the current stable branch (next release from that will be 7.4.0), then it will definitely be fixed. If it's only a problem in 6.x, then I can't guarantee that it will be fixed. That's because the 6.x line is in maintenance mode, which means that there's a very high bar for changes. In most cases, only changes that meet one of these criteria are made in maintenance mode:

 * Fixes a security bug.
 * Fixes a MAJOR bug with no workaround.
 * Fix is a very trivial code change and not likely to introduce new bugs.

Of those criteria, generally only the first two are likely to prompt an actual new software release. If enough changes of the third type accumulate, that might prompt a new release.

My personal opinion: If this is a general problem in 6.x, it should be fixed there. Because there is a workaround, it would not be cause for an immediate new release.

Thanks,
Shawn
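For completeness, the manual workaround I mentioned above (add a replica of shard1_0 on hostname2, then drop one of the extra shard1_0 replicas on hostname1) would look roughly like the following Collections API calls. This is an untested sketch: the node-name suffix "_solr" and the replica name "core_node5" are placeholders and would need to match whatever CLUSTERSTATUS actually reports for the collection.

  curl "http://hostname1:9999/solr/admin/collections?action=ADDREPLICA&collection=col&shard=shard1_0&node=hostname2:9999_solr"

  curl "http://hostname1:9999/solr/admin/collections?action=DELETEREPLICA&collection=col&shard=shard1_0&replica=core_node5"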
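As for the underlying question of telling Solr "avoid putting two replicas of the same shard on the same node": as far as I can tell, this is what the rule-based replica placement (6.x) and the autoscaling cluster policy (7.x) are intended for, although whether SPLITSHARD honors such a constraint is exactly what is in question here. If I am reading the 7.x autoscaling documentation correctly, a minimal sketch would be something like:

  curl -H 'Content-Type: application/json' \
    "http://hostname1:9999/solr/admin/autoscaling" \
    -d '{"set-cluster-policy": [{"replica": "<2", "shard": "#EACH", "node": "#ANY"}]}'

which is meant to say "no node may hold more than one replica of any given shard". On 6.6, the rough equivalent appears to be the rule parameter at collection-creation time, e.g. rule=shard:*,replica:<2,node:*. I have not yet verified that either form changes the placement chosen by SPLITSHARD.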