The Problem
-----------

The problem is obviously system instability due to memory pressure, but
what can an admin do about it?

Some options exist to configure the priority of processes killed due to
memory pressure.

Non-snap processes can be configured via systemd using OOMScoreAdjust[1]
(kernel OOM[2]) and ManagedOOMMemoryPressureLimit[3][4] (userspace
OOM[5]).
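
As a rough illustration of the systemd side, the Python sketch below
renders a drop-in that sets both directives for a non-snap service. The
function name, the example values (500 and 40%) and the unit name
example.service are made up for illustration. ManagedOOMMemoryPressure=kill
is included on the assumption that the unit has to opt in to systemd-oomd
monitoring before the limit has any effect.

```python
def render_oom_dropin(oom_score_adjust, pressure_limit_percent):
    """Return the text of a systemd drop-in that sets both OOM knobs."""
    if not -1000 <= oom_score_adjust <= 1000:
        raise ValueError("OOMScoreAdjust must be between -1000 and 1000")
    if not 0 <= pressure_limit_percent <= 100:
        raise ValueError("ManagedOOMMemoryPressureLimit must be 0-100%")
    lines = [
        "[Service]",
        # Kernel OOM: higher values make the unit's processes more
        # likely to be picked by the kernel OOM killer.
        f"OOMScoreAdjust={oom_score_adjust}",
        # Userspace OOM: let systemd-oomd act on this unit's cgroup and
        # override the default pressure limit for it.
        "ManagedOOMMemoryPressure=kill",
        f"ManagedOOMMemoryPressureLimit={pressure_limit_percent}%",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: prefer killing this unit early.
    print(render_oom_dropin(500, 40), end="")
```

The output could be saved as a drop-in such as
/etc/systemd/system/example.service.d/oom.conf, followed by
`systemctl daemon-reload` and a restart of the unit.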

Snap has no support for configuring userspace OOM priority. It does have a
setting[6] for kernel OOM priority. While that setting may be logical within
the snap ecosystem, it has deficiencies when integrating with the rest of
Ubuntu:

1) This setting provides no ability to set specific kernel OOM scores, which is
required to set priority with respect to non-snap processes.
2) The current user interface does not allow a user to __increase__ the
likelihood of a cgroup being killed (for example, "please kill my browser
before a component of my desktop").
3) The default snap setting makes snap-based processes (and snapd itself) less
likely to be killed than the core system services on which snap processes
depend: a priority inversion.
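
The priority inversion in point 3 can be sketched with a simplified model
of the kernel's victim selection. This is an approximation, not the kernel
code: it assumes the score is roughly "pages in use plus oom_score_adj
scaled by total memory", and every process name, page count and adjustment
value below is hypothetical.

```python
def approx_badness(used_pages, oom_score_adj, total_pages):
    """Simplified OOM badness: memory in use plus a scaled adjustment."""
    if oom_score_adj == -1000:
        # -1000 exempts a process from the kernel OOM killer entirely.
        return float("-inf")
    return used_pages + oom_score_adj * total_pages / 1000


def pick_victim(processes, total_pages):
    """Return the name of the process with the highest approximate badness."""
    return max(
        processes,
        key=lambda name: approx_badness(*processes[name], total_pages),
    )


TOTAL_PAGES = 4_000_000  # hypothetical machine, roughly 16 GB in 4 KiB pages

# name -> (pages in use, oom_score_adj); all values are made up.
PROCESSES = {
    "snap-browser": (1_000_000, -800),  # large, but strongly protected
    "desktop-service": (50_000, 0),     # small, default adjustment
}

if __name__ == "__main__":
    print(pick_victim(PROCESSES, TOTAL_PAGES))
```

In this model the small desktop service is selected even though the
browser uses twenty times more memory, because the negative adjustment
outweighs the browser's actual memory use.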

Proposed solution
-----------------

Ubuntu ships with both DEBs and snaps, therefore a solution that allows
snaps to configure OOM priority with respect to the rest of the OS would
be best.

In the near term, providing snap users with the ability to set the
values of ManagedOOMMemoryPressureLimit and OOMScoreAdjust for snaps
would empower users to tune their system to achieve better stability
under memory pressure.

In the long term, to provide the best system behavior under memory
pressure, it would benefit Ubuntu to ship with more appropriate
defaults (a desktop / server personality, and snaps using a more
sensible default).

Definitions
-----------

[1] OOMScoreAdjust - a systemd directive which systemd uses to set the
kernel OOM score adjustment[7], defaults to 0
[2] Kernel OOM - action taken when the kernel runs out of memory to allocate
and memory reclaim hasn't returned enough memory - the kernel kills a process
based on a metric derived from the OOM score and the amount of memory used by
each process
[3][4] ManagedOOMMemoryPressureLimit - a systemd threshold used by
systemd-oomd, it represents the fraction of time in a 10-second window in which
all tasks in the control group were delayed - defaults to 60%
[5] systemd-oomd - a system service that uses cgroups-v2 and pressure stall 
information (PSI[8]) to monitor and take corrective action before an OOM occurs 
in the kernel space
[6] `snap set system resilience.vitality-hint=snapA,snapB,snapC`
[7] /proc/<pid>/oom_score_adj - a procfs file containing the OOM score
adjustment of a process
[8] PSI (Pressure Stall Information) - counters exported to user space which
indicate memory pressure - available since Linux version 4.20
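
To make definitions [3][4] and [8] more concrete, here is a minimal Python
sketch that parses text in the format of /proc/pressure/memory and compares
the 10-second "full" average against a limit. The sample text and the
helper names are invented for illustration, and the comparison only
approximates what systemd-oomd does (it ignores, for example, how long the
pressure has to be sustained).

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def exceeds_limit(psi, limit_percent=60.0):
    """True if the 10-second 'full' average is above the given limit."""
    return psi["full"]["avg10"] > limit_percent


# Made-up sample in the format of /proc/pressure/memory.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.05 total=65432\n"
)

if __name__ == "__main__":
    psi = parse_psi(SAMPLE)
    print(psi["full"]["avg10"], exceeds_limit(psi))
```

On a kernel with PSI enabled, the same parser could be fed the contents of
/proc/pressure/memory or of a cgroup's memory.pressure file.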

References
----------

[1] https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#OOMScoreAdjust=
[2] https://www.kernel.org/doc/gorman/html/understand/understand016.html
[3] https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html#ManagedOOMMemoryPressureLimit=
[4] https://www.freedesktop.org/software/systemd/man/latest/oomd.conf.html#
[5] https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
[6] https://snapcraft.io/docs/system-options#heading--resilience
[7] https://man7.org/linux/man-pages/man5/proc_pid_oom_score_adj.5.html

https://bugs.launchpad.net/bugs/2089800

Title:
  Ubuntu desktop is unstable under memory pressure due to undesireable
  OOMScoreAdjust values
