@johnwaynee: we plan to address every issue that it is technically
possible to address. Hopefully the more detailed breakdown below of the
issues around memory use, both actual leaks and things that could be
perceived as a leak, helps address your concern.


Here is what I have managed to track down so far. This isn't a single bug but
multiple issues. Some of them existed before 6.8 but have been exacerbated by
the apparmor patchset (e.g. see issue 2).

1. There is a ref count leak in the prompt listener. I have a patch for
this and it will be fixed.


2. There is a bug in the userspace policy compiler that makes policy
unnecessarily large. It shows up in slab monitoring mostly in the
kmalloc-rnd-*-8 slabs.
I have a kernel mitigation for this that can detect and fix it as part of
policy verification. A policy compiler fix is also in the works.

This bug predates 6.8; however, the Ubuntu apparmor patchset
advertises support for permstable32 v3, which triggers the compiler to
generate the erroneously large tables.
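
As a rough way to watch the slabs mentioned in issue 2, the sketch below
sums the memory held by the kmalloc-rnd-*-8 caches from text in the
/proc/slabinfo layout. The sample data and the function name are made up
for illustration, and it assumes the usual column order (name,
active_objs, num_objs, objsize, ...).

```python
import re

# Hypothetical sample in the /proc/slabinfo layout; the numbers are invented.
SAMPLE_SLABINFO = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-rnd-01-8    1024   1024      8  512    1 : tunables    0    0    0 : slabdata      2      2      0
kmalloc-rnd-02-8    2048   2560      8  512    1 : tunables    0    0    0 : slabdata      5      5      0
kmalloc-rnd-02-16    256    256     16  256    1 : tunables    0    0    0 : slabdata      1      1      0
skbuff_head_cache    320    320    256   16    1 : tunables    0    0    0 : slabdata     20     20      0
"""

# Matches only the 8-byte randomized kmalloc caches, e.g. kmalloc-rnd-07-8.
RND8 = re.compile(r"^kmalloc-rnd-\d+-8$")


def rnd8_bytes(slabinfo_text):
    """Return num_objs * objsize summed over all kmalloc-rnd-*-8 caches."""
    total = 0
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip the two header lines and any cache that is not of interest.
        if len(fields) < 4 or not RND8.match(fields[0]):
            continue
        num_objs, objsize = int(fields[2]), int(fields[3])
        total += num_objs * objsize
    return total


print(rnd8_bytes(SAMPLE_SLABINFO))
```

On a live system the same function could be fed the contents of
/proc/slabinfo (normally readable only by root) before and after loading
policy to see how much these slabs grow.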


3. There is a ref count leak in complain mode due to a reference cycle. The
cycle can be broken manually by explicitly removing the complain mode null-XXX
profiles, but these profiles currently won't be cleaned up automatically when
they are no longer used. You can use the aa-remove-unknown tool to clean them
up by hand, but that tool may also break snap and lxd policy.

A fix for this is a work in progress, but it is going to take time, as it
requires rearchitecting the profile/label and label proxy interaction.

This bug has existed for a long time.
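
To get a feel for how many of the null-XXX profiles from issue 3 have
accumulated, the sketch below picks them out of a profile listing. It
assumes the listing uses the "name (mode)" line layout of
/sys/kernel/security/apparmor/profiles and that these profiles show up
as children named "parent//null-...". The sample names are hypothetical.

```python
# Hypothetical listing in the assumed "name (mode)" layout.
SAMPLE_PROFILES = """\
/usr/sbin/cupsd (enforce)
/usr/bin/example (complain)
/usr/bin/example//null-/usr/bin/helper (complain)
snap.demo.app (enforce)
"""


def null_profiles(profiles_text):
    """Return (name, mode) pairs for profiles that look like null-XXX children."""
    found = []
    for line in profiles_text.splitlines():
        # Split on the last " (" so profile names containing spaces survive.
        name, sep, mode = line.rpartition(" (")
        if not sep:
            continue
        if "//null-" in name:
            found.append((name, mode.rstrip(")")))
    return found


for name, mode in null_profiles(SAMPLE_PROFILES):
    print(name, mode)
```

This only lists candidates. It does not remove anything, so it avoids
the risk to snap and lxd policy noted above.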


4. There is a ref count leak that is still being chased. It may be cyclic. It
will be fixed when the cause is understood.


Issues 5 and on are not technically leaks but are sources of increased memory
usage that may look like a leak. They are not the only such sources, but they
are increases in memory usage that can be addressed in some way.


5. The increased amount of policy shipped with Noble and later releases simply
causes more memory usage. Issue 2 exacerbates this but doesn't account for all
of it. So I will note the larger contributors that can be seen and addressed
in some way, that is, where the memory use isn't required (or at least not
fully) and can be reduced.

5.1 apparmorfs - the increased policy is causing a proportionally large
increase in pre-allocated inodes and dentries. It is currently a
permanent increase, and fixing it requires a rewrite of apparmorfs to be
more dynamic. This is planned but is not a priority for addressing the
issue seen here.

5.2 policy criu support compression - to support criu, apparmor stores
extra policy info in the kernel. It compresses this data, but the
compression is not optimal. There is work to allow this to be done in
userspace, which will allow for better sharing and compression, reducing
the overhead of criu support.

In addition there will be support for a runtime flag (currently it is
compile-time only) to disable criu support on machines where it isn't
needed.

5.3 Increased caching - the apparmor patchset has a cache to store audit
logging information. Items in the cache can live beyond profile
replacement and so appear to be a leak. At the moment, unless prompting
is in heavy use, this should not result in a large amount of data being
used. Regardless, it could be improved.


6. This is an issue that is going to need further investigation and design.
AppArmor labels all open fd objects. These fd objects contain references to
profiles, and for as long as an object lives those references pin old
profiles in memory even after they have been replaced. This can look like a
leak but isn't, and is part of what makes chasing issue 4 down so hard. With
the increase in policy we are seeing a lot more of this.

It is possible we could mitigate some of the memory use here by allowing
a replaced profile to be partially freed after being replaced, if it is
pinned in this way.
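
The pinning described in issue 6 can be illustrated with a toy model.
This is not kernel code: the Profile and OpenFile classes are
hypothetical stand-ins, and Python reference counting merely plays the
role of the kernel ref counts.

```python
import gc
import weakref


class Profile:
    """Stand-in for a loaded profile; revision distinguishes replacements."""

    def __init__(self, name, revision):
        self.name = name
        self.revision = revision


class OpenFile:
    """Stand-in for an open fd object labeled with the profile in effect."""

    def __init__(self, profile):
        self.label = profile  # this reference is what pins the old profile


def demo():
    policy = {"demo": Profile("demo", 1)}
    fd = OpenFile(policy["demo"])
    old = weakref.ref(policy["demo"])  # observe the old profile without pinning it

    # Replace the profile: the policy no longer refers to revision 1.
    policy["demo"] = Profile("demo", 2)
    gc.collect()
    pinned_after_replace = old() is not None  # still alive, held by the fd

    # Closing the fd drops the last reference to the replaced profile.
    del fd
    gc.collect()
    freed_after_close = old() is None

    return pinned_after_replace, freed_after_close


print(demo())
```

The replaced profile stays in memory only for as long as the fd object
lives, which is why this looks like a leak from the outside without
being one.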


7. There is some work in progress that will improve apparmor policy compression 
(this is different than issue 5.2 criu compression) which will result in less 
memory needed for policy in general.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2098730

Title:
  Kernel 6.8.0 memory leak

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Noble:
  Confirmed
Status in linux source package in Oracular:
  Confirmed
Status in linux source package in Plucky:
  Confirmed

Bug description:
  We discovered a kernel memory leak in the Ubuntu 24.04 distribution
  using the 6.8.0 series kernel while testing on our cluster.

  After performing a fresh installation of Ubuntu Server 24.04 from the
  official ISO:
  `https://ubuntu.com/download/server/thank-you?version=24.04.1&architecture=amd64&lts=true`,
  with the built-in kernel `linux-image-6.8.0-51-generic`, we observed
  that Slab consumes all available memory due to the slab caches
  `kmalloc-rnd-(*)-2k` and `skbuff_head_cache` when the audit subsystem
  is enabled. The value of `SUnreclaim` in `/proc/meminfo` nearly reaches
  the system's total memory.

  We validated this bug on both amd64 and aarch64 architectures; it
  affects bare metal and virtual machines alike across all platforms.
  Here is the stack information obtained using the eBPF tool `memleak`
  (https://github.com/iovisor/bcc/blob/master/tools/memleak.py) to
  detect the kernel memory leak:

    195362816 bytes in 5962 allocations from stack
            0xffffffff96652314      __alloc_pages+0x264 [kernel]
            0xffffffff96652314      __alloc_pages+0x264 [kernel]
            0xffffffff9665b0a8      allocate_slab+0xa8 [kernel]
            0xffffffff9665b3b8      new_slab+0x38 [kernel]
            0xffffffff9665e3d5      ___slab_alloc+0x435 [kernel]
            0xffffffff9665f62b      __kmalloc_node_track_caller+0x18b [kernel]
            0xffffffff970d8317      kmalloc_reserve+0x67 [kernel]
            0xffffffff970db4aa      __alloc_skb+0x8a [kernel]
            0xffffffff96457f98      audit_log_start+0x198 [kernel]
            0xffffffff96462103      audit_log_exit+0x433 [kernel]
            0xffffffff96462dbe      __audit_syscall_exit+0xee [kernel]
            0xffffffff963f111b      syscall_exit_work+0x12b [kernel]
            0xffffffff963f1189      syscall_exit_to_user_mode_prepare+0x39 [kernel]
            0xffffffff97430ae1      syscall_exit_to_user_mode+0x11 [kernel]
            0xffffffff97428fec      do_syscall_64+0x8c [kernel]
            0xffffffff97600130      entry_SYSCALL_64_after_hwframe+0x78 [kernel]

  To reproduce this memory leak issue, follow these steps:
  1. Install and start auditd.
  2. Add audit rules in `/etc/audit/rules.d/audit.rules`:
    ```
    -D
    -b 8192
    -f 1
    -r 100
    -a always,exit -F arch=b64 -S openat -S truncate -S ftruncate -F exit=-EACCES -F auid>=1000 -F auid!=4294967295 -k access
    ```
  3. Reboot or run `augenrules --load`, then execute `auditctl -l` to verify
     that the audit rule above has been loaded.
  4. Run the command `while :; do cat /proc/1/environ; done` as a normal user
     (uid >= 1000) to trigger the audit events.
  5. Monitor kernel memory allocation by running either: 
    ```
    watch -d -n 1 'cat /proc/meminfo | grep -i SUnreclaim'
    ```
    or simply use 
    ```
    slabtop -s c
    ```
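
For an unattended run, the monitoring in step 5 could be scripted. The
sketch below extracts `SUnreclaim` from text in the `/proc/meminfo`
layout and computes the growth between successive samples. The sample
text, the sample values and the function names are invented for
illustration.

```python
# Hypothetical excerpt in the /proc/meminfo layout; the values are invented.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
Slab:             900000 kB
SReclaimable:     100000 kB
SUnreclaim:       800000 kB
"""


def sunreclaim_kib(meminfo_text):
    """Return the SUnreclaim value in kB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key == "SUnreclaim":
            return int(rest.split()[0])
    raise KeyError("SUnreclaim not found")


def growth_kib(samples):
    """Return the difference between each pair of consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


print(sunreclaim_kib(SAMPLE_MEMINFO))
print(growth_kib([100, 150, 175]))
```

On a real machine the samples would come from reading `/proc/meminfo`
at a fixed interval while the loop from step 4 is running. Growth that
stays positive over a long run is the symptom described here.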

  We tested several kernels within the 6.8.0 series, and this bug was
  present in all of them, including when installing HWE Kernel 6.8.0 in
  Ubuntu 22.04 via `apt install linux-generic-hwe-22.04`. However, after
  installing mainline kernels v6.8.1 or higher from
  https://kernel.ubuntu.com/mainline/, this bug disappears, indicating it
  may have been fixed upstream.

  Therefore, it is essential to update your stable repository's kernel
  and refresh your ISO accordingly.
  --- 
  ProblemType: Bug
  ApportVersion: 2.28.1-0ubuntu3.3
  Architecture: arm64
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/timer', 
'/dev/snd/seq', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D0c', '/dev/snd/hwC0D0', 
'/dev/snd/controlC0', '/dev/snd/by-path'] failed with exit code 1:
  CRDA: N/A
  CasperMD5CheckResult: pass
  DistroRelease: Ubuntu 24.04
  InstallationDate: Installed on 2024-08-13 (191 days ago)
  InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Release arm64 
(20240423)
  IwConfig:
   lo        no wireless extensions.
   
   enp0s5    no wireless extensions.
  Lsusb:
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 003 Device 002: ID 203a:fffc PARALLELS Virtual Mouse
   Bus 003 Device 003: ID 203a:fffb PARALLELS Virtual Keyboard
  MachineType: Parallels International GmbH. Parallels ARM Virtual Machine
  NonfreeKernelModules: prl_fs_freeze prl_tg
  Package: linux (not installed)
  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=tmux-256color
  ProcFB: 0 virtio_gpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.8.0-53-generic 
root=UUID=0c00eea2-089b-4002-90e1-ca43db4a623e ro
  ProcVersionSignature: Ubuntu 6.8.0-53.55-generic 6.8.12
  RelatedPackageVersions:
   linux-restricted-modules-6.8.0-53-generic N/A
   linux-backports-modules-6.8.0-53-generic  N/A
   linux-firmware                            20240318.git3b128b60-0ubuntu2.9
  RfKill:
   
  Tags: noble
  Uname: Linux 6.8.0-53-generic aarch64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: False
  dmi.bios.date: Mon, 23 Dec 2024 15:54:53
  dmi.bios.release: 0.1
  dmi.bios.vendor: Parallels International GmbH.
  dmi.bios.version: 20.2.0 (55872)
  dmi.board.asset.tag: None
  dmi.board.name: Parallels ARM Virtual Platform
  dmi.board.vendor: Parallels ARM Virtual Machine
  dmi.board.version: 0.1
  dmi.chassis.type: 2
  dmi.chassis.vendor: Parallels International GmbH.
  dmi.modalias: 
dmi:bvnParallelsInternationalGmbH.:bvr20.2.0(55872):bdMon,23Dec2024155453:br0.1:svnParallelsInternationalGmbH.:pnParallelsARMVirtualMachine:pvr0.1:rvnParallelsARMVirtualMachine:rnParallelsARMVirtualPlatform:rvr0.1:cvnParallelsInternationalGmbH.:ct2:cvr:skuParallels_ARM_VM:
  dmi.product.family: Parallels VM
  dmi.product.name: Parallels ARM Virtual Machine
  dmi.product.sku: Parallels_ARM_VM
  dmi.product.version: 0.1
  dmi.sys.vendor: Parallels International GmbH.


