Let me share the findings from my investigation.
Two variables affect whether the bug reproduces.
The first is how snapd is installed: from a Debian package or as a snap.
Problem IS reproducible (snapd installed from the Debian package):
snapd-test:~# snap version
snap 2.71+ubuntu22.04
snapd 2.71+ubuntu22.04
series 16
ubuntu 22.04
kernel 6.14.11+
Problem is NOT reproducible (snapd installed as a snap):
snapd-test-ok:~# snap version
snap 2.71
snapd 2.71
series 16
ubuntu 22.04
kernel 6.14.11+
Together with Zygmunt Krynicki and Maciek Borzecki, we discovered that
when everything works correctly, we have:
============================
Oct 10 16:28:28 test snapd[2976]: apparmor.go:977: DEBUG: apparmor_parser --version
Oct 10 16:28:28 test snapd[2976]: AppArmor parser version 4.0.2
Oct 10 16:28:28 test snapd[2976]: Copyright (C) 1999-2008 Novell Inc.
root@test:~# snap debug execution apparmor
apparmor-parser: /snap/snapd/25202/usr/lib/snapd/apparmor_parser
apparmor-parser-command: /snap/snapd/25202/usr/lib/snapd/apparmor_parser --config-file /snap/snapd/25202/usr/lib/snapd/apparmor/parser.conf --base /snap/snapd/25202/usr/lib/snapd/apparmor.d --policy-features /snap/snapd/25202/usr/lib/snapd/apparmor.d/abi/4.0
internal: true
============================
When things start to fail, we have instead:
============================
Oct 10 16:30:29 test snapd[2419]: apparmor.go:977: DEBUG: apparmor_parser --version
Oct 10 16:30:29 test snapd[2419]: AppArmor parser version 3.0.4
Oct 10 16:30:29 test snapd[2419]: Copyright (C) 1999-2008 Novell Inc.
Oct 10 16:30:29 test snapd[2419]: Copyright 2009-2018 Canonical Ltd.
root@test:~# snap debug execution apparmor
apparmor-parser: /usr/sbin/apparmor_parser
apparmor-parser-command: /usr/sbin/apparmor_parser --policy-features /etc/apparmor.d/abi/3.0
internal: false
============================
The second variable is the kernel version.
I was able to reproduce the problem on the 6.14.0-33-generic kernel, while
everything works perfectly on 6.8.0-85-generic.
My first conclusion was that something changed between 6.8.0-85-generic and
6.14.0-33-generic, altering AppArmor's behavior and triggering the issue.
And I found what it was — the change in the __aa_path_perm function:
From git diff Ubuntu-6.8.0-85.85 Ubuntu-hwe-6.14-6.14.0-33.33_24.04.1
security/apparmor/file.c:
-int __aa_path_perm(const char *op, const struct cred *subj_cred,
+int __aa_path_perm(const char *op, const struct cred *subj_cred,
struct aa_profile *profile, const char *name,
u32 request, struct path_cond *cond, int flags,
struct aa_perms *perms, bool prompt)
{
- struct aa_ruleset *rules = list_first_entry(&profile->rules,
- typeof(*rules), list);
+ struct aa_ruleset *rules = profile->label.rules[0];
int e = 0;
if (profile_unconfined(profile) ||
- ((flags & PATH_SOCK_COND) && !RULE_MEDIATES_AF(rules, AF_UNIX))) // <<< THIS
+ ((flags & PATH_SOCK_COND) && !RULE_MEDIATES_UNIX(rules)))
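RULE_MEDIATES_AF(rules, AF_UNIX) decided whether the ruleset mediates the UNIX
socket family in two steps:
1. RULE_MEDIATES(rules, AA_CLASS_NET/AA_CLASS_NET_COMPAT)
2. aa_dfa_match_len(rules->policy->dfa, state, (char *) &be_af, 2)
RULE_MEDIATES_UNIX(), however, only performs (1) and skips (2).
A ruleset that passes (1) but would fail (2) is therefore treated as mediating
AF_UNIX on 6.14, so __aa_path_perm() no longer takes the early return and
goes on to apply the file rules to the socket path. This changes the logical
outcome in __aa_path_perm() and leads to the observed issue.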
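The effect of dropping step (2) can be illustrated with a small truth-table
model. This is a toy sketch in Python, not kernel code: the function names and
the boolean inputs (has_net_class, dfa_matches_af_unix) are stand-ins I made up
for the two checks described above, and the ruleset in the example is
hypothetical.

```python
def rule_mediates_af(has_net_class: bool, dfa_matches_af_unix: bool) -> bool:
    """Model of the old check: step (1) AND step (2)."""
    return has_net_class and dfa_matches_af_unix


def rule_mediates_unix(has_net_class: bool) -> bool:
    """Model of the new check: step (1) only."""
    return has_net_class


def path_perm_returns_early(unconfined: bool, sock_cond: bool, mediates: bool) -> bool:
    """Model of the early 'return 0' condition in __aa_path_perm()."""
    return unconfined or (sock_cond and not mediates)


if __name__ == "__main__":
    # Hypothetical ruleset: network class present, but no AF_UNIX match in the DFA.
    has_net_class, dfa_matches_af_unix = True, False
    old = path_perm_returns_early(
        False, True, rule_mediates_af(has_net_class, dfa_matches_af_unix))
    new = path_perm_returns_early(
        False, True, rule_mediates_unix(has_net_class))
    print("old check returns early:", old)
    print("new check returns early:", new)
```

For that input the old check takes the early return and the new one does not,
which is the only combination where the two models disagree.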
I tried a quick (and admittedly naive) patch to restore the previous
behavior:
diff --git a/security/apparmor/file.c b/security/apparmor/file.c
index 3e080bd3b470..6ea63ab48547 100644
--- a/security/apparmor/file.c
+++ b/security/apparmor/file.c
@@ -368,7 +368,7 @@ int __aa_path_perm(const char *op, const struct cred *subj_cred,
int e = 0;
if (profile_unconfined(profile) ||
- ((flags & PATH_SOCK_COND) && !RULE_MEDIATES_UNIX(rules)))
+ ((flags & PATH_SOCK_COND) && !RULE_MEDIATES_UNIX_MATCH_AF(rules)))
return 0;
aa_str_perms(rules->file, rules->file->start[AA_CLASS_FILE],
name, cond, perms);
diff --git a/security/apparmor/include/policy.h b/security/apparmor/include/policy.h
index 124dd434655e..08873172a272 100644
--- a/security/apparmor/include/policy.h
+++ b/security/apparmor/include/policy.h
@@ -359,6 +359,16 @@ static inline aa_state_t RULE_MEDIATES_UNIX(struct aa_ruleset *rules)
return state;
}
+static inline aa_state_t RULE_MEDIATES_UNIX_MATCH_AF(struct aa_ruleset *rules)
+{
+ __be16 be_af = cpu_to_be16(AF_UNIX);
+ aa_state_t state = RULE_MEDIATES_UNIX(rules);
+ if (!state) {
+ return DFA_NOMATCH;
+ }
+
+ return aa_dfa_match_len(rules->policy->dfa, state, (char *) &be_af, 2);
+}
void aa_compute_profile_mediates(struct aa_profile *profile);
static inline bool profile_mediates(struct aa_profile *profile,
--
2.43.0
With this patch applied, everything works again, as expected, since it
restores the behavior of the 6.8.0-85 kernel.
(I've already sent this patch privately to John Johansen, who will determine
the proper long-term fix.)
That was the first part, but the next finding was even more surprising.
As far as we know, the problem occurs only when snapd is installed from a
Debian package.
When snapd is installed as a snap, the issue does not occur.
Initially, I suspected this was due to differences in the userspace AppArmor
parser — that version 3.0.4 breaks while 4.0.2 works correctly.
But this turned out to be a wrong assumption.
Digging deeper into the kernel behavior, I found that with both AppArmor
parser versions there is a denial on /run/systemd/journal/stdout.
In the failing case, this leads to a failed write() syscall in the
daemon.activate script:
https://github.com/canonical/lxd-pkg-snap/blob/824fd43be2ebcadf3ebc9c156e2d9cdd8cc1e0fc/snapcraft/commands/daemon.activate#L30
strace output from the failing case:
10343 <... close resumed>) = 0
10343 wait4(-1, <unfinished ...>
10343 <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 7177
10343 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=7177, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
10343 rt_sigreturn({mask=[]}) = 7177
10343 wait4(-1, 0x7fff80eed20c, WNOHANG, NULL) = -1 ECHILD (No child processes)
10343 write(1, "=> Starting LXD activation\n", 27) = -1 EBADF (Bad file descriptor)
This clearly shows EBADF: file descriptor 1 (stdout =
"/run/systemd/journal/stdout") is closed by the time write() is called.
However, when tracing the same operation with snapd installed as a snap,
everything works fine:
5909 close(3) = 0
5909 wait4(-1, <unfinished ...>
5909 <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 1779
5909 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1779, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
5909 rt_sigreturn({mask=[]}) = 1779
5909 wait4(-1, 0x7ffe57f5247c, WNOHANG, NULL) = -1 ECHILD (No child processes)
5909 write(1, "=> Starting LXD activation\n", 27) = 27
Why?
In both cases, /run/systemd/journal/stdout is denied by AppArmor, and (flags &
PATH_SOCK_COND) && !RULE_MEDIATES_AF(rules, AF_UNIX) evaluates to false.
So what's the difference?
Upon further tracing, I found that for both AppArmor 3.0.4 and 4.0.2, file
descriptors 1 and 2 (both pointing to "/run/systemd/journal/stdout" socket)
are closed here:
https://github.com/torvalds/linux/blob/3a8660878839faadb4f1a6dd72c3179c1df56787/security/apparmor/file.c#L722
So in the working case, something must reopen fd 1 afterwards. That something
is the Go runtime:
https://github.com/golang/go/commit/2496653d0a5c6c26b879bb5bdd135e1f7504e051#diff-0d9da4234cd98598b90f30b39b24269c71c7617cf5c47dbdc6575d1b3328aee3R64
https://github.com/golang/go/commit/d4dd1de19fcef835fca14ad8cb590dbfcf8e9859#diff-ffc4efb9537f2db29b40dbe6f8b65626bc8874c33556e7356decfdcef544c478R17-R36
So the snapd installed from the Debian package and the snapd installed as a
snap differ not only in the AppArmor parser version, but also in the version
of the Go runtime they were built with:
root@snapd-test-ok:~# strings /snap/snapd/25202/usr/lib/snapd/snap-exec | grep 'go1\.'
go1.23.10
go1.23.10
root@snapd-test:~# strings /usr/lib/snapd/snap-exec | grep 'go1\.'
go1.18.1
go1.18.1
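The same check can be scripted. A minimal sketch, assuming Python 3 is
available on the machine: the helper name go_versions_in and the regular
expression are my own, and it only looks for the same 'go1.' marker that the
grep above matches, so unrelated strings in a binary could produce false
positives.

```python
import re
import sys


def go_versions_in(path: str) -> set:
    """Return the distinct 'go1.x[.y]' strings embedded in the file at path."""
    with open(path, "rb") as f:
        data = f.read()
    # Same marker as the grep above, extended to capture the version digits.
    return {m.decode("ascii") for m in re.findall(rb"go1\.\d+(?:\.\d+)?", data)}


if __name__ == "__main__":
    # Usage: python3 this_script.py /usr/lib/snapd/snap-exec ...
    for binary in sys.argv[1:]:
        print(binary, sorted(go_versions_in(binary)))
```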
This completes the puzzle:
Starting from Go 1.21, the runtime automatically reopens closed standard file
descriptors with /dev/null.
So, when snapd (built with Go 1.23) runs as a snap, this masks the underlying
issue — stdout is silently replaced with /dev/null.
(It's still wrong, of course, but it hides the symptom.)
In contrast, the .deb snapd build uses Go 1.18, which doesn't do this.
As a result, fd=1 remains closed, and write(1, ...) fails with EBADF.
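Both behaviours can be reproduced in userspace without AppArmor. The following
Python sketch is only an emulation under stated assumptions: it uses a scratch
descriptor obtained from os.pipe() instead of the real fd 1, and
reopen_if_closed() is a hypothetical helper that imitates the idea of the
runtime check (probe the descriptor and, if it is closed, put /dev/null in its
place). It is not the Go runtime's code.

```python
import errno
import fcntl
import os


def closed_scratch_fd() -> int:
    """Return a descriptor number that has just been closed."""
    r, w = os.pipe()
    os.close(r)
    os.close(w)
    return w


def try_write(fd: int, data: bytes) -> int:
    """Return the number of bytes written, or the negated errno on failure."""
    try:
        return os.write(fd, data)
    except OSError as exc:
        return -exc.errno


def reopen_if_closed(fd: int) -> bool:
    """If fd is closed, point it at /dev/null. Return True if it was reopened."""
    try:
        fcntl.fcntl(fd, fcntl.F_GETFD)
        return False  # fd is open, nothing to do
    except OSError as exc:
        if exc.errno != errno.EBADF:
            raise
    null_fd = os.open(os.devnull, os.O_RDWR)
    if null_fd != fd:
        os.dup2(null_fd, fd)
        os.close(null_fd)
    return True


if __name__ == "__main__":
    msg = b"=> Starting LXD activation\n"
    fd = closed_scratch_fd()
    # Closed descriptor: the write fails with EBADF (the Go 1.18 situation).
    print("closed fd:", try_write(fd, msg))
    reopen_if_closed(fd)
    # Descriptor now refers to /dev/null: the write succeeds, output is discarded.
    print("after reopen:", try_write(fd, msg))
    os.close(fd)
```

The second write reports success even though nothing reaches the journal,
which is why the snap build appears to work.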
We still need a proper kernel-side fix, but at least we now fully
understand what's going on. :)
--
https://bugs.launchpad.net/bugs/2127244
Title:
Nested LXD is broken with snapd 2.71+ubuntu22.04
Status in snapd:
Fix Committed
Status in apparmor package in Ubuntu:
New
Bug description:
The new snapd deb in -proposed for Ubuntu 22.04 breaks running LXD
nested in a LXD container resulting in
root@j0:~# sudo snap install --channel=5.21/stable lxd
error: cannot perform the following tasks:
- Start snap "lxd" (35624) services (systemctl command [start
snap.lxd.activate.service] failed with exit status 1: stderr:
Job for snap.lxd.activate.service failed because the control process exited
with error code.
See "systemctl status snap.lxd.activate.service" and "journalctl -xeu
snap.lxd.activate.service" for details.)
Can be reproduced with
$ multipass launch noble --name test -d 10G
test$ snap install --channel=5.21/stable lxd
test$ sudo lxd init --auto
test$ lxc launch ubuntu:j j0 -c security.nesting=true
test$ lxc shell j0
j0$ sudo snap remove --purge lxd
j0$ cat <<EOF >/etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed restricted main multiverse universe
EOF
j0$ apt update ; apt upgrade
j0$ snap install --channel=5.21/stable lxd
We only see this on noble with kernel 6.14 when running Ubuntu 22.04
containers. Running the host with jammy and older kernels does not
show the same problem.