Houston Putman created SOLR-14210:
-------------------------------------

             Summary: Introduce Node-level status handler for replicas
                 Key: SOLR-14210
                 URL: https://issues.apache.org/jira/browse/SOLR-14210
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
    Affects Versions: master (9.0), 8.5
            Reporter: Houston Putman


h2. Background

As was brought up in SOLR-13055, in order to run Solr in a more cloud-native 
way, we need some additional features around node-level healthchecks.
{quote}Like in Kubernetes we need 'liveliness' and 'readiness' probe explained 
in 
[https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/]
 determine if a node is live and ready to serve live traffic.
{quote}
 

However, there are issues with Kubernetes managing its own rolling restarts. 
With the current healthcheck setup, it's easy to envision a scenario in which 
Solr reports itself as "healthy" when all of its replicas are actually 
recovering. Kubernetes, seeing a healthy pod, would then go ahead and 
restart the next Solr node. This can repeat until all replicas are "recovering" 
and none are healthy. (The last one restarted might be "down", but there would 
still be no "active" replicas.)

h2. Proposal

I propose we make an additional healthcheck handler that returns whether all 
replicas hosted by that Solr node are healthy and "active". That way we will be 
able to use the [default Kubernetes rolling restart 
logic|https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#update-strategies]
 with Solr.

To add on to [Jan's point 
here|https://issues.apache.org/jira/browse/SOLR-13055?focusedCommentId=16716559&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16716559],
 this handler should be friendlier to other Content-Types and should use 
better HTTP response statuses.
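
As a rough illustration of the check proposed above, here is a minimal sketch in plain Java. It is not Solr's handler API: the class {{ReplicaReadiness}}, the {{ReplicaState}} enum, and both method names are hypothetical, and the three states are simply the ones named in this issue. The choice of 200 versus 503, and treating a node with no replicas as ready, are assumptions for the sake of the example.

```java
import java.util.List;

/**
 * Hypothetical sketch of the proposed node-level check.
 * None of these names come from Solr; they are illustrative only.
 */
final class ReplicaReadiness {

    /** Replica states mentioned in this issue (assumed, not Solr's own enum). */
    enum ReplicaState { ACTIVE, RECOVERING, DOWN }

    static final int HTTP_OK = 200;
    static final int HTTP_SERVICE_UNAVAILABLE = 503;

    private ReplicaReadiness() {}

    /**
     * Returns true only if every replica hosted on this node is ACTIVE.
     * Assumption: a node hosting no replicas is considered ready.
     */
    static boolean allReplicasActive(List<ReplicaState> hostedReplicas) {
        return hostedReplicas.stream().allMatch(s -> s == ReplicaState.ACTIVE);
    }

    /**
     * Maps the check to an HTTP status code: 200 when all replicas are
     * active, 503 otherwise (an assumed choice of status codes).
     */
    static int httpStatus(List<ReplicaState> hostedReplicas) {
        return allReplicasActive(hostedReplicas) ? HTTP_OK : HTTP_SERVICE_UNAVAILABLE;
    }
}
```

A Kubernetes HTTP probe counts any status from 200 up to, but not including, 400 as success, so under these assumptions a node with a recovering replica would answer 503 and be reported as not ready, which would hold back the restart of the next node.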



