"Sometimes for one of the sub-shards, the new leader and one of the new followers end up on the same instance"
Actually, it seems to be the case that every single time in the entire history of SPLITSHARD, for one of the sub-shards both the new leader and one of the new followers end up on the exact same instance. I asked several months ago (see below under "ATTACHED MESSAGE") whether anyone anywhere had ever seen a case where this bug did not occur, and it seems that no one has been able to provide a counterexample: I think we have to conclude that this bug is universal.

-----Original Message-----
From: Chris Ulicny <culicny@iq.media>
Sent: Wednesday, January 30, 2019 1:46 PM
To: solr-user@lucene.apache.org
Subject: Re: SPLITSHARD not working as expected

I'm not sure what the expected behavior is. However, as of 7.4.0, it doesn't seem like there is any attempt to prevent both the new leader and follower replicas from being created on the same instance.

Sometimes for one of the sub-shards, the new leader and one of the new followers end up on the same instance. We just manually end up moving them since we don't split shards very often.

Best,
Chris

On Wed, Jan 30, 2019 at 12:46 PM Rahul Goswami <rahul196...@gmail.com> wrote:
> Hello,
> I have a followup question on SPLITSHARD behavior. I understand that after a split, the leader replicas of the sub-shards would reside on the same node as the leader of the parent. However, is there an expected behavior for the follower replicas of the sub-shards as to where they will be created post split?
>
> Regards,
> Rahul
>
> On Wed, Jan 30, 2019 at 1:18 AM Rahul Goswami <rahul196...@gmail.com> wrote:
> >
> > Thanks for the reply Jan. I have been referring to the documentation for SPLITSHARD on 7.2.1
> > <https://lucene.apache.org/solr/guide/7_2/collections-api.html#splitshard>,
> > which seems to be missing some important information present in 7.6
> > <https://lucene.apache.org/solr/guide/7_6/collections-api.html#splitshard>,
> > especially these two pieces of information:
> >
> > "When using splitMethod=rewrite (default) you must ensure that the node running the leader of the parent shard has enough free disk space i.e., more than twice the index size, for the split to succeed"
> >
> > "The first replicas of resulting sub-shards will always be placed on the shard leader node"
> >
> > The idea of having an entire shard (both of its replicas) present on the same node did come across as unexpected behavior at the beginning. Anyway, I guess I am going to have to take care of the rebalancing with MOVEREPLICA following a SPLITSHARD.
> >
> > Thanks for the clarification.
> >
> > On Mon, Jan 28, 2019 at 3:40 AM Jan Høydahl <jan....@cominvent.com> wrote:
> >
> >> This is normal. Please read
> >> https://lucene.apache.org/solr/guide/7_6/collections-api.html#splitshard
> >> PS: Images won't make it to the list, but I don't think you need a screenshot here; what you describe is the default behaviour.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >> > 28. jan. 2019 kl. 09:05 skrev Rahul Goswami <rahul196...@gmail.com>:
> >> >
> >> > Hello,
> >> > I am using Solr 7.2.1. I created a two node example collection on the same machine: two shards with two replicas each. I then called SPLITSHARD on shard2 and expected the split shards to have one replica on each node. However I see that for shard2_1, both replicas reside on the same node. Is this a valid behavior? Unless I am missing something, this could be potentially fatal.
> >> >
> >> > Here's the query and the cluster state post split:
> >> >
> >> > http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=gettingstarted&shard=shard2&waitForFinalState=true
> >> >
> >> > Thanks,
> >> > Rahul

==== ATTACHED MESSAGE ====

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov>
Sent: Wednesday, September 19, 2018 4:52 PM
To: solr-user@lucene.apache.org
Subject: RE: sharding and placement of replicas

I am still wondering whether anyone has ever seen any examples of this actually working (has anyone ever seen any example of SPLITSHARD on a two-node SolrCloud placing replicas of each shard on different hosts than other replicas of the same shard)? Anyone?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov>
Sent: Friday, August 10, 2018 12:54 PM
To: solr-user@lucene.apache.org
Subject: RE: sharding and placement of replicas

Note that I usually create collections with commands which contain (for example)
solr/admin/collections?action=CREATE&name=collectest&collection.configName=collectest&numShards=1&replicationFactor=1&createNodeSet=
I give one node in the createNodeSet and then ADDREPLICA to the other node.

In case this were related, I now tried it a different way, using a command which contains
solr/admin/collections?action=CREATE&name=collectest5&collection.configName=collectest&numShards=1&replicationFactor=2&createNodeSet=
I gave both nodes in the createNodeSet in this case. It created one replica on each node (each node being on a different host at the same port). This is what I would consider the expected behavior (refraining from putting two replicas of the same shard on the same node).

After this I ran a command including
solr/admin/collections?action=SPLITSHARD&collection=collectest5&shard=shard1&indent=on&async=test20180810h
The result was still the same: one of the four new replicas was on one node and the other three were all together on the node from which I issued this command (including two replicas of the same sub-shard on the same node).

I am wondering whether there are any examples of this actually working (any examples of SPLITSHARD ever placing replicas of each shard on different hosts than other replicas of the same shard).

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] [mailto:craig.oak...@nih.gov]
Sent: Thursday, August 09, 2018 5:08 PM
To: solr-user@lucene.apache.org
Subject: RE: sharding and placement of replicas

Okay, I've tried again with two nodes running Solr7.4 on different hosts.

Before SPLITSHARD, collectest2_shard1_replica_n1 was on the host nosqltest22, and collectest2_shard1_replica_n3 was on the host nosqltest11.

After running SPLITSHARD (on the nosqltest22 node), only collectest2_shard1_0_replica0 was added to nosqltest11; nosqltest22 became the location for collectest2_shard1_0_replica_n5, collectest2_shard1_1_replica_n6 and collectest2_shard1_1_replica0 (and so if nosqltest22 were to be down, shard1_1 would not be available).
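For a layout like the one just described, the manual fix Chris and Rahul mention above (moving one of the co-located sub-shard replicas) can be done with the CLUSTERSTATUS and MOVEREPLICA actions of the Collections API. The following is only a sketch: the 8983 ports and the core_node10 replica name are placeholders, and the real values must be read from the CLUSTERSTATUS output.

# Inspect where the sub-shard replicas landed after the split
curl -G "http://nosqltest22:8983/solr/admin/collections" \
  --data-urlencode "action=CLUSTERSTATUS" \
  --data-urlencode "collection=collectest2"

# Move one of the two shard1_1 replicas off nosqltest22
# ("replica" is the core_nodeN name reported by CLUSTERSTATUS, not the core name;
#  core_node10 and the 8983 ports are placeholders)
curl -G "http://nosqltest22:8983/solr/admin/collections" \
  --data-urlencode "action=MOVEREPLICA" \
  --data-urlencode "collection=collectest2" \
  --data-urlencode "replica=core_node10" \
  --data-urlencode "targetNode=nosqltest11:8983_solr"
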
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, July 31, 2018 5:16 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: sharding and placement of replicas

Right, two JVMs on the same physical host with different ports are "different Solrs" by default.

If you had two replicas per shard and both were on the same Solr instance (same port), that would be unexpected. The problem is that this would have been a bug clear back in the Solr 4x days, so the fact that you say you saw it on 6.6 would be unexpected.

Of course, if you have three replicas and two instances, I'd absolutely expect that two replicas would be on one of them for each shard.

Best,
Erick

On Tue, Jul 31, 2018 at 12:24 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov> wrote:
> In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 comment "If this is a provable and reproducible bug, and it's still a problem in the current stable branch"), I had only installed Solr7.4 on one host, and so I was testing with two nodes on the same host (different port numbers). I had previously had the same symptom when the two nodes were on different hosts, but that was with Solr6.6 -- I can try it again with Solr7.4 with two hosts and report back.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Tuesday, July 31, 2018 2:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: sharding and placement of replicas
>
> On 7/27/2018 8:26 PM, Erick Erickson wrote:
>> Yes, with some fiddling as far as "placement rules"; start here:
>> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>>
>> The idea (IIUC) is that you provide a "snitch" that identifies what "rack" the Solr instance is on, and you can define placement rules that say "don't put more than one thingy on the same rack". "Thingy" here is replica, shard, whatever, as defined by other placement rules.
>
> I'd like to see an improvement in Solr's behavior when nothing has been configured in auto-scaling or rule-based replica placement. Configuring those things is certainly an option, but I think we can do better even without that config.
>
> I believe that Solr already has some default intelligence that keeps multiple replicas from ending up on the same *node* when possible ... I would like this to also be aware of *hosts*.
>
> Craig hasn't yet indicated whether there is more than one node per host, so I don't know whether the behavior he's seeing should be considered a bug.
>
> If somebody gives one machine multiple names/addresses and uses different hostnames in their SolrCloud config for one actual host, then it wouldn't be able to do any better than it does now, but if there are matches in the hostname part of different entries in live_nodes, then I think the improvement might be relatively easy. Not saying that I know what to do, but somebody who is familiar with the Collections API code can probably do it.
>
> Thanks,
> Shawn
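For reference, the rule-based placement Erick points to above is attached at collection-creation time. The following is a sketch only, assuming the rule syntax from the 6.6 rule-based replica placement guide: the rule asks that, for any given shard, fewer than two replicas be placed on the same host (the host tag being provided by the implicit snitch), and the collection name and node names in createNodeSet are placeholders. Whether SPLITSHARD honors such a rule for the new sub-shard replicas is exactly the behavior questioned in this thread, so it should be verified on the target Solr version.

# Sketch: create a collection with a placement rule
# ("for each shard, fewer than 2 replicas per host");
# collection name and node names are placeholders
curl -G "http://localhost:8983/solr/admin/collections" \
  --data-urlencode "action=CREATE" \
  --data-urlencode "name=collectest6" \
  --data-urlencode "collection.configName=collectest" \
  --data-urlencode "numShards=1" \
  --data-urlencode "replicationFactor=2" \
  --data-urlencode "createNodeSet=host1:8983_solr,host2:8983_solr" \
  --data-urlencode "rule=shard:*,replica:<2,host:*"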