David Hunt created SOLR-14123:
---------------------------------

             Summary: autoAddReplicas is not reliable when multiple nodes go down.
                 Key: SOLR-14123
                 URL: https://issues.apache.org/jira/browse/SOLR-14123
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: AutoScaling
    Affects Versions: 8.3
            Reporter: David Hunt


I started noticing problems in our production environment: indexing was being 
blocked because a minimum replication factor was not being met.  We have 
autoAddReplicas triggers in place to add replicas when nodes are lost, but they 
don't seem to add back all of the replicas that were lost. I’ve been able to 
reproduce this behavior consistently in a development environment.

Repro:
 # Set up a 10-node SolrCloud cluster.
 # Create an autoAddReplicas trigger on nodeLost with waitFor set to 10 minutes.
 # Create 15 collections with 2 shards and 4 replicas.
 # Kill 3 Solr nodes.
 # 15 minutes later, kill 1 more Solr node.
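
For anyone trying to reproduce this, steps 2 and 3 can be sketched roughly as 
below. This is only an illustration, not the exact setup we used: the base URL 
and the repro_NN collection names are placeholders, "4 replicas" is read here 
as replicationFactor=4 (4 replicas per shard), and the trigger body follows the 
default .auto_add_replicas trigger layout from the Solr 8 autoscaling 
documentation. The script only builds the requests and does not send them.

```python
import json
from urllib.parse import urlencode

# Assumption: placeholder address of any node in the test cluster.
SOLR_BASE = "http://localhost:8983/solr"


def auto_add_replicas_trigger(wait_for="600s"):
    """Body for the autoscaling write API: a nodeLost trigger with a 10 minute waitFor."""
    return {
        "set-trigger": {
            "name": ".auto_add_replicas",
            "event": "nodeLost",
            "waitFor": wait_for,  # 600 seconds = 10 minutes
            "enabled": True,
            "actions": [
                {"name": "auto_add_replicas_plan",
                 "class": "solr.AutoAddReplicasPlanAction"},
                {"name": "execute_plan",
                 "class": "solr.ExecutePlanAction"},
            ],
        }
    }


def create_collection_url(name, num_shards=2, replication_factor=4):
    """Collections API CREATE call with autoAddReplicas switched on."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,  # assumption: 4 replicas per shard
        "autoAddReplicas": "true",
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


# JSON to POST to the autoscaling endpoint (for example SOLR_BASE + "/admin/autoscaling").
trigger_body = json.dumps(auto_add_replicas_trigger())

# 15 collections with hypothetical names repro_01 .. repro_15.
collection_urls = [create_collection_url(f"repro_{i:02d}") for i in range(1, 16)]

print(trigger_body)
print(collection_urls[0])
```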

Results:

Monitor your shards/replicas.  You’ll see some replicas added to make up for 
the lost ones, but not all of them.  An hour later many shards are still 
missing replicas.
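
One possible way to do the monitoring is to parse the JSON returned by the 
Collections API CLUSTERSTATUS action and count, per shard, the replicas that 
are both in state "active" and hosted on a node listed in live_nodes. The 
helper below is a sketch under that assumption; the function name, the 
expected count of 4 and the sample data are made up for illustration and are 
not output from the affected cluster.

```python
def under_replicated_shards(cluster_status, expected_replicas=4):
    """Return {"collection/shard": missing_count} for shards short of replicas.

    A replica is counted as healthy only if its state is "active" and its
    node is still in live_nodes, since a replica on a dead node can keep a
    stale "active" state in the cluster state.
    """
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster["live_nodes"])
    missing = {}
    for coll_name, coll in cluster["collections"].items():
        for shard_name, shard in coll["shards"].items():
            healthy = sum(
                1
                for replica in shard["replicas"].values()
                if replica.get("state") == "active"
                and replica.get("node_name") in live_nodes
            )
            if healthy < expected_replicas:
                missing[f"{coll_name}/{shard_name}"] = expected_replicas - healthy
    return missing


# Made-up sample shaped like a CLUSTERSTATUS response (node n4 is down).
sample_status = {
    "cluster": {
        "live_nodes": ["n1", "n2", "n3"],
        "collections": {
            "c1": {
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"state": "active", "node_name": "n1"},
                        "core_node2": {"state": "active", "node_name": "n2"},
                        "core_node3": {"state": "active", "node_name": "n4"},
                        "core_node4": {"state": "active", "node_name": "n3"},
                    }},
                    "shard2": {"replicas": {
                        "core_node5": {"state": "active", "node_name": "n1"},
                        "core_node6": {"state": "active", "node_name": "n2"},
                        "core_node7": {"state": "active", "node_name": "n3"},
                        "core_node8": {"state": "active", "node_name": "n1"},
                    }},
                }
            }
        },
    }
}

short_shards = under_replicated_shards(sample_status)
print(short_shards)
```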

Expected:

All lost replicas should be added on the 6 remaining healthy nodes.
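
As a rough sanity check on what that means in numbers, the arithmetic is 
sketched below. It assumes "4 replicas" means 4 replicas per shard and that 
replicas were spread evenly over the 10 nodes before the failures; neither is 
guaranteed, so treat the figures as estimates.

```python
# Cluster shape taken from the repro steps.
nodes_total = 10
collections = 15
shards_per_collection = 2
replicas_per_shard = 4  # assumption: "4 replicas" is per shard

total_replicas = collections * shards_per_collection * replicas_per_shard

# 3 nodes killed first, then 1 more.
nodes_lost = 3 + 1
nodes_left = nodes_total - nodes_lost

# Assumption: replicas were evenly distributed before any node was killed.
replicas_per_node_before = total_replicas // nodes_total
replicas_lost_estimate = replicas_per_node_before * nodes_lost

# Load per node if every lost replica is re-created on the surviving nodes.
replicas_per_node_after = total_replicas // nodes_left

print(total_replicas, nodes_left, replicas_lost_estimate, replicas_per_node_after)
```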



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
