Shalin Shekhar Mangar created SOLR-14942:
--------------------------------------------

             Summary: Reduce leader election time on node shutdown
                 Key: SOLR-14942
                 URL: https://issues.apache.org/jira/browse/SOLR-14942
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 8.6.3, 7.7.3
            Reporter: Shalin Shekhar Mangar
            Assignee: Shalin Shekhar Mangar


The credit for this issue and investigation belongs to [~caomanhdat]. I am 
merely reporting the issue and creating PRs based on his work.

The shutdown process waits for all replicas/cores to be closed before removing 
the election node of the leader. This can take some time due to index flush or 
merge activities on the leader cores and delays new leaders from being elected.

This process happens at CoreContainer.shutdown():
# zkController.preClose(): removes the current node from live_nodes and changes 
the state of all cores on this node to DOWN. Assuming that the current node is 
hosting the leader of a shard, the shard becomes leaderless after this method is 
called, since the state of the leader is now DOWN. The leader election process 
is not triggered for the shard, since the election node is still held by the 
current node.
# Wait for all cores to finish loading (if any are still loading).
# SolrCores.close(): close all cores.
# zkController.close(): this is where all ephemeral nodes are removed from ZK, 
including the election nodes created by this node. Only from this point on can 
other replicas in the shard take part in the leader election.

Note that CoreContainer.shutdown() is invoked when a Jetty/Solr node receives a 
SIGTERM signal.

On receiving SIGTERM, Jetty also stops accepting new connections and new 
requests. This is a very important factor: even if the leader replica is 
ACTIVE and its node is in live_nodes, the shard is effectively leaderless 
if no one can index to it. Shards therefore become leaderless as soon 
as the node hosting the shard’s leader receives SIGTERM.

Therefore, the longer steps 1, 2 and 3 take to finish, the longer shards 
remain leaderless. The time needed for step 3 scales with the number of cores, 
so the more cores a node has, the worse it gets. This time is spent in 
IndexWriter.close(), where the system will:
# Flush all pending updates to disk
# Wait for all merges to finish (this is most likely the meaty part)

The proposal is to change the shutdown process to:
# Wait for all in-flight indexing requests and replication requests to complete
# Remove election nodes
# Close all replicas/cores

This ensures that index flush or merges do not block new leader elections 
anymore.
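
To make the effect of the reordering concrete, here is a minimal sketch that models both shutdown orders as lists of timed steps and measures how long after SIGTERM the election node is removed. It is not Solr code: the class ShutdownOrderSketch, the Step type, and all durations are hypothetical, and it assumes for simplicity that cores are closed one after another at a fixed cost per core.

```java
import java.util.ArrayList;
import java.util.List;

class ShutdownOrderSketch {

    /** One shutdown step with a made-up duration (hypothetical numbers). */
    static final class Step {
        final String name;
        final long millis;
        final boolean removesElectionNode;

        Step(String name, long millis, boolean removesElectionNode) {
            this.name = name;
            this.millis = millis;
            this.removesElectionNode = removesElectionNode;
        }
    }

    /** Time from SIGTERM until the step that removes the election node has finished. */
    static long leaderlessMillis(List<Step> steps) {
        long elapsed = 0;
        for (Step step : steps) {
            elapsed += step.millis;
            if (step.removesElectionNode) {
                return elapsed;
            }
        }
        throw new IllegalStateException("no step removes the election node");
    }

    /** Current order: the election node is only removed after all cores are closed. */
    static List<Step> currentOrder(int cores, long closePerCoreMs) {
        List<Step> steps = new ArrayList<>();
        steps.add(new Step("zkController.preClose()", 200, false));
        steps.add(new Step("wait for cores to finish loading", 0, false));
        // Simplifying assumption: sequential close, fixed cost per core.
        steps.add(new Step("SolrCores.close()", cores * closePerCoreMs, false));
        steps.add(new Step("zkController.close()", 100, true));
        return steps;
    }

    /** Proposed order: the election node is removed before the cores are closed. */
    static List<Step> proposedOrder(int cores, long closePerCoreMs) {
        List<Step> steps = new ArrayList<>();
        steps.add(new Step("wait for in-flight indexing/replication requests", 500, false));
        steps.add(new Step("remove election nodes", 100, true));
        steps.add(new Step("close all replicas/cores", cores * closePerCoreMs, false));
        return steps;
    }

    public static void main(String[] args) {
        int cores = 10;
        long closePerCoreMs = 3000; // hypothetical flush + merge time per core
        System.out.println("current order, leaderless ms:  "
                + leaderlessMillis(currentOrder(cores, closePerCoreMs)));
        System.out.println("proposed order, leaderless ms: "
                + leaderlessMillis(proposedOrder(cores, closePerCoreMs)));
    }
}
```

In this model the leaderless window under the current order grows with the number of cores, while under the proposed order it depends only on the in-flight request wait and the election node removal.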



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
