[ https://issues.apache.org/jira/browse/SOLR-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000271#comment-17000271 ]
Andrzej Bialecki commented on SOLR-14123:
-----------------------------------------

That's not what's happening in this code: it still registers all lost nodes, but at most 3 of the remaining live nodes will attempt to actually record this fact. This is specifically designed for large clusters, to avoid having every remaining node attempt to write the same information to ZK at nearly the same time.

> autoAddReplicas is not reliable when multiple nodes go down.
> ------------------------------------------------------------
>
>                 Key: SOLR-14123
>                 URL: https://issues.apache.org/jira/browse/SOLR-14123
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: AutoScaling
>    Affects Versions: 8.3
>            Reporter: David Hunt
>            Priority: Major
>              Labels: autoscale
>
> I started noticing problems in our production environment with indexing being
> blocked due to a minimum replication factor not being met. We have
> autoAddReplicas triggers in place to add replicas when nodes are lost, but they
> don't seem to add all of the replicas that were lost along with those nodes.
> I’ve been able to reproduce this behavior consistently in a development
> environment.
> Repro:
> # Set up a 10-node SolrCloud cluster.
> # Create an autoAddReplicas trigger on nodeLost with waitFor set to 10
> minutes.
> # Create 15 collections with 2 shards and 4 replicas.
> # Kill 3 Solr nodes.
> # 15 minutes later kill 1 more Solr node.
> Results:
> Monitor your shards/replicas. You’ll see some replicas added to make up for
> the lost replicas but not all. An hour later many shards are still missing
> replicas.
> Expected:
> All lost replicas should be added on the 6 remaining healthy nodes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
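
For readers following the comment above, here is a minimal sketch of the gating idea it describes: every live node computes the full set of lost nodes, but only a small, fixed number of them go on to write the markers. This is not Solr's actual code. The class name LostNodeGate, the methods lostNodes and shouldRecord, the constant MAX_RECORDERS, and the choice of "first three node names in sorted order" as the selection rule are all assumptions made for illustration, and the sketch does no ZK I/O at all.

```java
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class LostNodeGate {
    // Hypothetical cap, taken from the "up to 3" figure in the comment.
    static final int MAX_RECORDERS = 3;

    /** Every node computes the complete set of nodes that disappeared. */
    static SortedSet<String> lostNodes(Set<String> before, Set<String> after) {
        SortedSet<String> lost = new TreeSet<>(before);
        lost.removeAll(after);
        return lost;
    }

    /**
     * Returns true only if self is among the first MAX_RECORDERS live nodes
     * in sorted order (an assumed, deterministic selection rule).
     */
    static boolean shouldRecord(SortedSet<String> liveNodes, String self) {
        int position = 0;
        for (String node : liveNodes) {
            if (position >= MAX_RECORDERS) {
                return false; // self was not found among the first few nodes
            }
            if (node.equals(self)) {
                return true;
            }
            position++;
        }
        return false; // self is not in the live set at all
    }

    public static void main(String[] args) {
        Set<String> before = new TreeSet<>(List.of(
                "node01", "node02", "node03", "node04", "node05",
                "node06", "node07", "node08", "node09", "node10"));
        SortedSet<String> after = new TreeSet<>(List.of(
                "node05", "node06", "node07", "node08", "node09", "node10"));

        SortedSet<String> lost = lostNodes(before, after);
        for (String self : after) {
            // All nodes see the same lost set; only the gated ones would write it.
            System.out.println(self + " sees lost=" + lost
                    + " writes=" + shouldRecord(after, self));
        }
    }
}
```

Under these assumptions the number of writers stays constant no matter how large the cluster is, while the set of lost nodes each writer reports is still complete.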