On Thu, 27 Jun 2024 at 19:56:43 -0700, Otto Kekäläinen wrote:
> Could you point me to some Debian Bug # or otherwise share examples of
> cases when a build succeeded locally but failed on official Debian
> builders due to something that is specific for sbuild/schroot?

I can't easily point you to a Debian bug number, because I try to only
upload packages that live up to Debian's quality standards, which means
I've been routinely building packages for upload in sbuild/schroot for
several years; so if a package fails in that situation, I do not upload,
and retry as many times as it takes to get it right.

(I'm sure I've failed to do that several times, but I'm sorry, I mostly
can't remember specific instances or bug numbers; I generally try to fix
the regression as quickly as I can.)

But, some examples of packages and the reasons they fail:

- bubblewrap, repeatedly. Its test suite wants to create new user and
  filesystem namespaces, which is unconditionally not allowed by the
  kernel while inside a chroot (because the kernel doesn't want to allow
  filesystem namespaces to be used to escape from a chroot). The relevant
  tests have to be skipped in situations where they can't work.

  "Real" container managers that use pivot_root() instead of chroot(),
  such as Docker and Podman, sometimes allow creation of nested user
  namespaces (like bwrap by default, and docker --privileged), sometimes
  deny it (like bwrap --disable-userns, and Docker by default), and
  sometimes cannot allow it because some larger factor forces their hand:
  it's non-obvious what will work. The conditions for not being allowed
  to create new namespaces are relatively complicated and
  poorly-documented, and the error reporting is minimal (two or three
  errno values have to cover every possible failure mode), so this is
  something that has to be done by trial and error.

  Until recently, DSA'd machines all used
  /proc/sys/kernel/unprivileged_userns_clone to disable unprivileged
  creation of user namespaces anyway. This restriction has presumably
  been lifted for the buildds that use sbuild in unshare mode.

- xdg-desktop-portal, repeatedly.
  Its test suite uses FUSE, which is disabled (the module is prevented
  from loading) on official Debian buildds as a security hardening
  mechanism, even though on typical end-user or server Debian systems it
  works fine. This is one that I did have to find out via FTBFS, because
  I don't yet have a local build environment that replicates this
  restriction. I know that I should, and it's on my list.

- ostree, at least once. The test suite historically assumed that
  /var/tmp supports extended attributes, which is not true on all buildds
  (ordinary on-disk filesystems usually do support them, but tmpfs
  doesn't or didn't until recently, and some buildds with plenty of RAM
  operate in a tmpfs root filesystem to speed up their builds).

- flatpak, repeatedly. Same as bubblewrap, ostree and x-d-p, combined.

- dbus, historically. For a long time, when using the non-default
  DBUS_COOKIE_SHA1 authentication mechanism, libdbus ignored $HOME and
  instead used the "official" home directory from /etc/passwd (the
  equivalent of `getent passwd $(id -u) | cut -d: -f6`). Official buildds
  set the user's home directory to /nonexistent, so this fails.

  In production use, dbus normally uses EXTERNAL over AF_UNIX (and
  doesn't even allow DBUS_COOKIE_SHA1, as a piece of security hardening),
  but in its build-time tests it specifically exercises each auth
  mechanism and each transport, including DBUS_COOKIE_SHA1 over TCP
  (which is a terrible idea on Unix but is unfortunately necessary on
  Windows).

- GLib, ongoing (#972151). When the GLib test suite tests
  interoperability with libdbus, it (IMO reasonably!) expects
  ("localhost", AF_INET) to resolve to 127.0.0.1, but that doesn't work
  on IPv6-only buildds for relatively complicated reasons involving
  subtleties of glibc resolver behaviour (#952740). My local build
  environment still doesn't have code to reproduce this, and I'm sorry
  that I haven't provided workarounds or fixes in the GLib test suite or
  in libdbus' discouraged TCP code paths.
  If someone wants to work on this, skipping the interop tests for TCP on
  IPv6-only buildds would probably be more proportionate than adjusting
  libdbus' name-resolution behaviour for a feature nobody should be using
  in production anyway.

- Any package that assumes that if $XDG_RUNTIME_DIR is set, then it is
  set to a usable value (because historically schroot would set it to a
  value that exists/works on the host system, but does not exist and
  cannot be created inside the container). This is worked around by
  individual packages unsetting XDG_RUNTIME_DIR or setting it to a more
  useful value, or automatically by recent debhelper in a sufficiently
  high compat level (#942111).

> I have never run in such a situation despite doing Debian packaging
> for 10 years with fairly complex C++ software targeting all archs
> Debian supports.

If your complex C++ software is doing pure computation without
side-effects, or if it's doing something that's unaffected by being in a
chroot (like file I/O to the build directory, or IPC via AF_UNIX) then
it can be extremely complex and still not hit this sort of thing.

Conversely, container-adjacent tools that want to run build-time tests
will hit this sort of thing every time.

    smcv
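
As an illustration of three of the environment differences listed above
(the passwd home directory, $XDG_RUNTIME_DIR, and extended attributes on
/var/tmp), here is a minimal sketch of how a test suite could probe for
them before deciding which tests to skip. It is not taken from any of the
packages mentioned; the function names and the "user.probe" attribute
name are hypothetical, and it assumes Linux and Python 3.

```python
import os
import pwd
import tempfile


def passwd_home_usable():
    """Check the home directory from the passwd database, ignoring $HOME.

    Roughly `getent passwd $(id -u) | cut -d: -f6` plus a check that the
    result is a writable directory. On a buildd where the home directory
    is /nonexistent, this returns False.
    """
    try:
        home = pwd.getpwuid(os.getuid()).pw_dir
    except KeyError:
        # The current uid has no passwd entry at all.
        return False
    return os.path.isdir(home) and os.access(home, os.W_OK)


def xdg_runtime_dir_usable(environ=None):
    """Return True only if XDG_RUNTIME_DIR is set AND actually usable.

    A value inherited from the host may name a directory that does not
    exist inside the chroot, so being set is not enough.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("XDG_RUNTIME_DIR")
    if not path:
        return False
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def user_xattrs_supported(directory):
    """Try to set a user.* extended attribute on a temp file in directory.

    Any OSError (unsupported filesystem, unwritable or missing directory)
    is treated as "not supported", so callers can simply skip tests.
    """
    if not hasattr(os, "setxattr"):
        # os.setxattr only exists on Linux.
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            os.setxattr(f.name, "user.probe", b"1")
        return True
    except OSError:
        return False


if __name__ == "__main__":
    print("passwd home usable:     ", passwd_home_usable())
    print("XDG_RUNTIME_DIR usable: ", xdg_runtime_dir_usable())
    print("xattrs on /var/tmp:     ", user_xattrs_supported("/var/tmp"))
```

Each probe answers "can this work here?" by trying the operation or
checking the actual directory, not by guessing from the type of build
environment, which matches the trial-and-error approach described above.
The namespace and FUSE cases are deliberately left out, since the
conditions for those are harder to probe reliably.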