Control: tags -1 + confirmed

On Sun, Apr 28, 2024 at 10:59:14PM +0200, Johannes Schauer Marin Rodrigues wrote:
> Quoting Aurelien Jarno (2024-04-28 15:57:29)
> > When running sbuild in unshare chroot mode, it is not possible to write to
> > /dev/stdout:
> >
> > | echo test > /dev/stdout
> > | sh: 1: cannot create /dev/stdout: Permission denied
> >
> > This is the reason of the FTBFS of at least clisp and supervisor when using
> > the unshare chroot mode of sbuild.

Jochen asked me to look into this. Let me write down what I have for the
benefit of the next person dumping brain cells into it.

I used bookworm's sbuild to reproduce it using the supervisor package
and that readily reproduced. I added an execute_before_dh_auto_test with
a few diagnostics:

| ls -la /dev/stdout
| lrwxrwxrwx 1 root root 15 Apr 30 07:36 /dev/stdout -> /proc/self/fd/1
| ls -la /proc/self/fd
| total 0
| dr-x------ 2 helmut sbuild  0 Apr 30 07:36 .
| dr-xr-xr-x 9 helmut sbuild  0 Apr 30 07:36 ..
| lrwx------ 1 helmut sbuild 64 Apr 30 07:36 0 -> /dev/null
| l-wx------ 1 helmut sbuild 64 Apr 30 07:36 1 -> pipe:[135566170]
| l-wx------ 1 helmut sbuild 64 Apr 30 07:36 2 -> pipe:[135566170]
| lr-x------ 1 helmut sbuild 64 Apr 30 07:36 3 -> /proc/123/fd
| echo hello > /proc/self/fd/1
| /bin/sh: 1: cannot create /proc/self/fd/1: Permission denied

I also added --anything-failed-commands=%SBUILD_SHELL and there things
look different.

| # ls -la /proc/self/fd/1
| l-wx------ 1 root root 64 Apr 30 07:44 /proc/self/fd/1 -> /dev/tty
| # runuser -u helmut bash
| $ ls -la /proc/self/fd/1
| l-wx------ 1 helmut sbuild 64 Apr 30 07:48 /proc/self/fd/1 -> /dev/tty

Running supervisor's test suite succeeds here.

Quite certainly, the cause is connected to that pipe. The pipe in
question is connecting the build log to a process that filters the build
log and replaces PKGBUILDDIR and stuff. As far as I understand it, the
crucial bit is that this process runs outside of the namespace.

To confirm this hypothesis, I tried the following override:

| override_dh_auto_test:
| 	dh_auto_test | cat

In essence, I am placing another process (cat) inside the namespace such
that the stdout pipe of the test resides fully inside the namespace and
cat is responsible for writing to the pipe outside without going via
/proc/self/fd. With this modification, the build works again.

> This works in podman.
> So somehow it's possible to connect /dev/stdout in a way
> which preserves its intended functionality. Probably it would be useful to find
> out how podman does this. For what its worth, mmdebstrap itself suffers from
> the same problem, so whatever fix is used in sbuild should probably also be
> added to mmdebstrap.

This does not work in podman
https://github.com/containers/podman/issues/16870 nor on docker
https://github.com/moby/moby/issues/31243. It sometimes works and that
sometimes is when you run it interactively and thus stdout points to a
tty device. As soon as it is that pipe thingy, it fails.

This is actually something I researched more deeply a while ago without
success. I was trying to open a regular file in the initial namespace,
inherit the open file across unshare into a user and mount namespace and
then open /proc/self/fd/N. Likewise, I get -EACCES there in the very
same way. Some part of permission management prevents this kind of
(intentional) leakage of file descriptors, but I cannot tell which or
why.

The lesson learned seems to be that when you run a container workload,
your stdout or stderr should either connect to a tty or to a process
that lives inside your namespace (not sure which of them).

It also seems possible to change permission of those pipes
https://github.com/containers/conmon/pull/112 but I do not understand
what it means to do so and whether that technically is a good idea.

If you chmod(0666, *STDOUT); right before unsharing in
Sbuild/Utility.pm, the supervisor test also passes, but this can also
have undesired effects if stdout is connected to a regular file. So we
really should check that STDOUT is a pipe before doing so. There is
protection in the sense that /proc/self/fd by default is mode 0500. I
also note that posix allows fchmod to fail with EINVAL when it is
performed on a pipe, so doing this very much is a linux-ism (but
namespaces already are).

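For illustration, here is a minimal sketch of that guarded chmod. It is
written in Python rather than the Perl that Sbuild/Utility.pm uses, the
function name make_pipe_world_writable is made up for this mail, and it
is not a patch against sbuild. It assumes Linux, where fchmod on a pipe
is accepted.

```python
import os
import stat


def make_pipe_world_writable(fd):
    """Set mode 0666 on the object behind fd, but only if it is a pipe.

    Returns True if the mode was changed, False if fd was left alone.
    The name and the return convention are illustrative assumptions.
    """
    mode = os.fstat(fd).st_mode
    if not stat.S_ISFIFO(mode):
        # Regular files, ttys and sockets keep their permissions, which
        # avoids the undesired effect on a log file used as stdout.
        return False
    # fchmod on a pipe is a Linux-ism; other systems may reject it.
    os.fchmod(fd, 0o666)
    return True
```

Calling make_pipe_world_writable(1) right before unsharing would be the
rough equivalent of the chmod described above, with the pipe check
already folded in.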
To see whether stdout is a pipe, we may fstat it and figure out whether
its st_mode has S_IFIFO. In perl, that's:

    use Fcntl ':mode';
    ... if (((stat(*STDOUT))[2] & S_IFMT) == S_IFIFO);

Going deeper with research, I think this is actually not a namespace
problem. https://groups.google.com/g/fa.linux.kernel/c/WVFgFngkJZw
indicates a very similar problem with doing setuid. We can emulate this
locally and reproduce the failure

    unshare -U --map-auto -S 0 -G 0 sh -c \
        '/sbin/runuser -u daemon -- sh -c ": >/proc/self/fd/1" | cat'

noting that the use of unshare here is purely added for the benefit of
running the test code unprivileged. You can also just paste the shell
part into a regular root shell in the initial namespace and have it
exhibit -EACCES in the very same way.

It is probably worth noting that the end of a pipe bears quite some
resemblance with a file on Linux. It has owner, group, permission,
timestamps and stuff. You can inspect using

    unshare -U --map-auto -S 0 -G 0 sh -c \
        '/sbin/runuser -u daemon -- python3 -c "import os;print(os.fstat(1))" | cat'

and also drop the "| cat" for comparison.

So given that we can only access the pipe via its fd number or
/proc/PID/fd/N and that /proc/PID/fd is mode 0500, the chmod is probably
safe and the alternative would be using fchown to assign the write end
to the build user.

I hope this helps in constructing a solution and also is an enlightening
read on what goes on behind the curtain.

Helmut