Re: [D] Intermittent SIGTERM running on K8S [airflow]

via GitHub Sat, 22 Feb 2025 04:05:48 -0800


GitHub user arkadiusz-bach added a comment to the discussion: Intermittent 
SIGTERM running on K8S


@b32n, @mschueler do you have `.Values.workers.safeToEvict` set to `false`? 
Is Cluster Autoscaler enabled in your k8s cluster?

Until fairly recently the default was `true`; about 8 months ago it was changed 
to `false` in this commit: 
[here](https://github.com/apache/airflow/commit/4c9f12d2bce5af0c629e9b9e9916cd525817931d)
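
A quick way to verify what a running worker pod actually got is to look at its 
annotations. The sketch below assumes the chart value ends up as the standard 
Cluster Autoscaler annotation `cluster-autoscaler.kubernetes.io/safe-to-evict` 
on the pod, and that you already have the pod manifest as a dict (for example 
parsed from `kubectl get pod <name> -o json`). The function name and the sample 
manifest are made up for illustration.

```python
# Annotation that Cluster Autoscaler reads to decide whether a pod may be evicted.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def opts_out_of_eviction(pod: dict) -> bool:
    """Return True if the pod manifest explicitly sets safe-to-evict to "false"."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return str(annotations.get(SAFE_TO_EVICT, "")).lower() == "false"


# Hypothetical worker pod manifest, trimmed to the relevant part.
worker = {"metadata": {"annotations": {SAFE_TO_EVICT: "false"}}}
print(opts_out_of_eviction(worker))
```

If this returns `False` for your workers, the autoscaler is free to evict them 
when it scales a node down.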

- The pod may receive SIGTERM when the Autoscaler is trying to remove the node 
because it is underutilized.
- It may behave differently on different clouds:
  - Azure - the Autoscaler has its own termination grace period 
(`max-graceful-termination-sec`, see this 
[link](max-graceful-termination-sec)). It ignores the terminationGracePeriod 
defined on the pod. By default it is set to 600 seconds. So when the autoscaler 
is trying to downscale a node, it sends SIGTERM to the pods on that node, and if 
they have not shut down within 600 seconds they are stopped, even if the pod 
itself has a terminationGracePeriod of 7200 seconds.
  - EKS - I couldn't find anything similar to `max-graceful-termination-sec` 
in the EKS Autoscaler 
[settings](https://docs.aws.amazon.com/eks/latest/best-practices/cas.html#_additional_parameters),
 so it probably respects the terminationGracePeriod defined on the pod.
  - GKE - I didn't check.
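
To make the Azure case concrete, here is a minimal sketch of how long a task 
gets after SIGTERM during a scale-down. It models the autoscaler limit as a hard 
cap on the pod's own grace period, which is my reading of the behaviour described 
above and not something taken from the autoscaler code. The function and 
parameter names are made up; passing `None` as the cap models an autoscaler 
without such a limit (which is what I guess EKS does).

```python
from typing import Optional


def seconds_before_forced_stop(pod_grace_s: int, autoscaler_cap_s: Optional[int]) -> int:
    """Time a pod has between SIGTERM and being stopped during a node scale-down.

    pod_grace_s: termination grace period configured on the pod.
    autoscaler_cap_s: autoscaler-side limit (e.g. max-graceful-termination-sec),
        or None if the autoscaler has no such limit (assumption).
    """
    if autoscaler_cap_s is None:
        return pod_grace_s
    # Assumption: the autoscaler limit wins whenever it is the shorter of the two.
    return min(pod_grace_s, autoscaler_cap_s)


# Pod asks for 7200 seconds, autoscaler default is 600 seconds.
print(seconds_before_forced_stop(7200, 600))   # 600
print(seconds_before_forced_stop(7200, None))  # 7200
```

So a task that needs more than 600 seconds to finish cleanly can still be killed 
on scale-down, no matter what grace period the pod asks for.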

GitHub link: 
https://github.com/apache/airflow/discussions/32543#discussioncomment-12285453

