Looks like there's room for improvement. I too would want the desired
state to be reflected in ZK first before attempting to make it happen:
remove the node from live_nodes first, then mark each local replica as
state=DOWN, then close everything down.
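
To make that ordering concrete, here is a rough sketch. The two
interfaces (ClusterRegistry, LocalNode) and all method names are made-up
stand-ins for illustration, not actual Solr APIs; the only point is the
sequence of calls.

```java
import java.util.ArrayList;
import java.util.List;

class GracefulShutdownSketch {

    /** Hypothetical stand-in for the ZK-facing calls; not a real Solr interface. */
    interface ClusterRegistry {
        void removeLiveNode(String nodeName);
        void publishReplicaDown(String replicaName);
    }

    /** Hypothetical stand-in for the local node (cores, executors, Jetty). */
    interface LocalNode {
        List<String> localReplicas();
        void closeEverything();
    }

    /** Publish the desired state first, then actually shut things down. */
    static void shutdown(String nodeName, ClusterRegistry zk, LocalNode node) {
        // 1. Stop advertising the node as live.
        zk.removeLiveNode(nodeName);
        // 2. Mark every local replica as DOWN.
        for (String replica : node.localReplicas()) {
            zk.publishReplicaDown(replica);
        }
        // 3. Only now close cores, executors and the servlet container.
        node.closeEverything();
    }

    /** Runs the sequence against in-memory fakes and returns the recorded call order. */
    static List<String> runDemo() {
        List<String> events = new ArrayList<>();
        ClusterRegistry zk = new ClusterRegistry() {
            @Override public void removeLiveNode(String nodeName) {
                events.add("removeLiveNode " + nodeName);
            }
            @Override public void publishReplicaDown(String replicaName) {
                events.add("down " + replicaName);
            }
        };
        LocalNode node = new LocalNode() {
            @Override public List<String> localReplicas() {
                return List.of("core_a", "core_b");
            }
            @Override public void closeEverything() {
                events.add("closeEverything");
            }
        };
        shutdown("node1", zk, node);
        return events;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

The fakes only record the call order; in a real change the same order
would have to be wired into whatever code path ends up owning shutdown.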

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Mar 29, 2023 at 9:16 AM Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> Trying to prevent traffic being sent to a Solr node that is going to shut
> down, to avoid interruption of service as seen from various clients.
> First part of the puzzle is signaling to any (external) load balancer to
> stop sending requests to the node.
> The other part is having SolrJ understand that the node is being stopped,
> and not routing internal requests to cores on the node.
>
> Does anyone have a good command of the Shutdown logic in Solr?
> My understanding is a bit sparse, but here's what I can see in the code:
>
> 1. bin/solr stop sends a STOP command to Jetty's STOP_PORT with the
>    (not-so-secret) stop key.
> 2. Jetty starts the shutdown process, destroying all servlets and filters,
>    including Solr's dispatchFilter.
> 3. Solr is notified about the shutdown through a callback in
>    CoreContainerProvider.
> 4. CoreContainerProvider#close() is called, which calls CC#shutdown.
> 5. CC shuts down every core on the node and then calls
>    ZkController#preClose.
> 6. ZkController#preClose removes the ephemeral live_nodes/myNode and then
>    publishes the down state in state.json.
> 7. Wait for shutdown of executors etc. and let Jetty exit.
>
> I could have got it wrong though.
>
> I was hoping that a Solr node would first publish itself as "not ready" in
> ZK before rejecting requests, but it seems this is all reversed, since
> shutdown is initiated by Jetty?
> So could we instead register our own shutdown-port in Solr, and let our
> bin/solr script trigger that one? There we could orchestrate the shutdown
> as we want:
>
> 1. Remove the live_nodes znode in ZK.
> 2. Publish itself as not ready on the api/node/health handler (or a new
>    api/node/ready?).
> 3. Sleep for a few seconds (or longer with an optional &shutdownDelay
>    argument to our shutdown endpoint).
> 4. Trigger server.stop() to take down Jetty and kill the servlet.
>
> I filed https://issues.apache.org/jira/browse/SOLR-16722 to discuss a
> technical solution.
> The primary goal is to drain traffic right before shutting a node down,
> but it could also be designed as a generic Readiness Probe
> <https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes>
> modeled on Kubernetes?
> I'm also aware that any Solr client should be prepared to hit a dead node
> due to network/power events, and retry. But it won't hurt to be graceful
> whenever we can.
>
> Happy to hear your thoughts. Is this a made-up problem?
>
> Jan
