Whojohn opened a new issue, #17840:
URL: https://github.com/apache/dolphinscheduler/issues/17840

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar feature requirement.
   
   
   ### Description
   
   # Purpose
   When the Linux OOM killer fires, it should first kill the worker's shell 
tasks and their subclass implementations (the parts most prone to OOM) rather 
than the DolphinScheduler components themselves. The kernel should also be 
made to kill DolphinScheduler services such as the worker and master as late 
as possible, to keep the cluster stable.
   
   # Version Discussion
   > DolphinScheduler 3.4 (dev)
   >
   > Reference (Linux OOM killer): 
https://www.kernel.org/doc/html/latest/filesystems/proc.html#proc-pid-oom-adj-adjust-the-oom-killer-score
   
   # Improvements
   1. **When users enable cgroup (task.resource.limit.state=true), modify the 
oom_score_adj score to 1000 in BaseLinuxShellInterceptorBuilder.**
   
   - BaseLinuxShellInterceptorBuilder#now
   
   ```
   sudo systemd-run -q --scope -p CPUQuota=1% -p MemoryLimit=512M --uid=root 
bash /tmp/xxx/971445_1934185.command
   ```
   
   - BaseLinuxShellInterceptorBuilder#after
   ```
   sudo systemd-run -q --scope -p CPUQuota=5% -p MemoryLimit=1500M --uid=root \
     bash -c 'echo 1000 > /proc/self/oom_score_adj && exec bash /tmp/xxx/971445_1934185.command'
   ```
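
   The quoting in the command above matters: without quotes, the `>` 
redirection and the `&&` are interpreted by the calling shell, not by the 
`bash -c` child, so the score would be written for the wrong process. Below is 
a minimal sketch of how the wrapped command string could be assembled. The 
function name `wrap_with_oom_score` is hypothetical and is not part of 
BaseLinuxShellInterceptorBuilder; it also assumes the script path contains no 
single quotes.

```shell
# Hypothetical helper (illustration only): build a command string in which
# the child shell raises its own oom_score_adj and then replaces itself
# with the task script via exec, so the score applies to the task.
# Assumption: the script path contains no single-quote characters.
wrap_with_oom_score() {
  score="$1"
  script="$2"
  printf "bash -c 'echo %s > /proc/self/oom_score_adj && exec bash %s'\n" \
    "$score" "$script"
}

# Example: print the wrapped command for a task script.
wrap_with_oom_score 1000 /tmp/xxx/971445_1934185.command
```

   Because `exec` replaces the intermediate shell instead of forking, the 
task script runs in the same process that wrote the score.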
   
   2. **Modify bin/dolphinscheduler-daemon.sh to set oom_score_adj to -1000 for 
the started service, so that it is killed as late as possible.**
   
   - dolphinscheduler-daemon#now
   
   ```
   nohup /bin/bash "$DOLPHINSCHEDULER_HOME/$command/bin/start.sh" > /dev/null 
2> $log &
   ```
   - dolphinscheduler-daemon#after
   
   ```
   nohup /bin/bash "$DOLPHINSCHEDULER_HOME/$command/bin/start.sh" > /dev/null 
2> $log &
   echo -1000 > /proc/$!/oom_score_adj
   ```
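
   Lowering `oom_score_adj` normally requires root or `CAP_SYS_RESOURCE`, and 
the file may not be writable in some container setups, so the write can fail. 
The following is a minimal sketch of a tolerant variant, assuming the daemon 
script should keep running when the write fails. The function name 
`set_oom_score_adj` and the `PROC_ROOT` variable are hypothetical; `PROC_ROOT` 
exists only so the function can be tried against a scratch directory and 
defaults to the real `/proc`.

```shell
# Hypothetical helper, not part of dolphinscheduler-daemon.sh.
# PROC_ROOT is an assumption for testing; it defaults to /proc.
set_oom_score_adj() {
  pid="$1"
  score="$2"
  target="${PROC_ROOT:-/proc}/$pid/oom_score_adj"
  # The write can fail when the process has exited, when the caller lacks
  # the privilege to lower the score, or when the file is read-only.
  # Report the failure instead of aborting the startup script.
  if [ -e "$target" ] && { printf '%s\n' "$score" > "$target"; } 2>/dev/null; then
    return 0
  fi
  echo "warning: could not set oom_score_adj=$score for pid $pid" >&2
  return 1
}
```

   In the daemon script this could be called right after the `nohup ... &` 
line as `set_oom_score_adj "$!" -1000 || true`.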
   
   # Why this design
   q1: Why require cgroup resource limits to be enabled? In container/k8s 
environments, writing to the proc directory is likely to hit permission 
problems. Docker mode has already been verified to fail to start with the 
default image.
   
   q2: What is the benefit? When tasks are launched by concatenating shell 
commands, excessive concurrency immediately triggers an OOM kill. If 
concurrency increases further, the DolphinScheduler services themselves may be 
killed.
   
   q3: Should the score for shell task types be linked to task priority? A 
uniform oom_score_adj of 1000 means tasks of every priority are killed as 
early as possible. The score could be linked to task priority, but choosing a 
value for each priority level and keeping OOM kills fair across them is hard 
to get right, so uniformly configuring 1000 is simpler and more feasible.
   
   
   
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.