On Thu, 27 Jun 2024 at 19:56:43 -0700, Otto Kekäläinen wrote:
> Could you point me to some Debian Bug # or otherwise share examples of
> cases when a build succeeded locally but failed on official Debian
> builders due to something that is specific for sbuild/schroot?

I can't easily point you to a Debian bug number. I try to upload only
packages that live up to Debian's quality standards, so for several years
I've routinely built packages for upload in sbuild/schroot. If a package
fails there, I don't upload it; I retry as many times as it takes to get
it right.

(I'm sure I've failed to do that several times, but I'm sorry, I mostly
can't remember specific instances or bug numbers; I generally try to fix
the regression as quickly as I can.)

But here are some examples of packages and the reasons they fail:

- bubblewrap, repeatedly. Its test suite wants to create new user
  and filesystem namespaces, which is unconditionally not allowed by
  the kernel while inside a chroot (because the kernel doesn't want to
  allow filesystem namespaces to be used to escape from a chroot). The
  relevant tests have to be skipped in situations where they can't work.

  "Real" container managers that use pivot_root() instead of chroot(),
  such as Docker and Podman, sometimes allow creation of nested user
  namespaces (like bwrap by default, and docker --privileged), sometimes
  deny it (like bwrap --disable-userns, and Docker by default), and
  sometimes cannot allow it because some larger factor forces their hand:
  it's non-obvious what will work.

  The conditions for not being allowed to create new namespaces are
  relatively complicated and poorly-documented, and the error reporting is
  minimal (two or three errno values have to cover every possible failure
  mode), so this is something that has to be done by trial and error.

  Until recently, machines administered by DSA all used
  /proc/sys/kernel/unprivileged_userns_clone to disable unprivileged
  creation of user namespaces anyway. This restriction has presumably
  been lifted for the buildds that use sbuild in unshare mode.

- xdg-desktop-portal, repeatedly. Its test suite uses FUSE, which is
  disabled (the module is prevented from loading) on official Debian
  buildds as a security hardening mechanism, even though on typical
  end-user or server Debian systems it works fine.

  This is one that I did have to find out via FTBFS, because I don't yet
  have a local build environment that replicates this restriction. I know
  that I should, and it's on my list.

- ostree, at least once. The test suite historically assumed that /var/tmp
  supports extended attributes, which is not true on all buildds (ordinary
  on-disk filesystems usually do support them, but tmpfs doesn't or didn't
  until recently, and some buildds with plenty of RAM operate in a tmpfs
  root filesystem to speed up their builds).

- flatpak, repeatedly. It hits the same problems as bubblewrap, ostree
  and xdg-desktop-portal, combined.

- dbus, historically. For a long time, when using the non-default
  DBUS_COOKIE_SHA1 authentication mechanism, libdbus ignored $HOME and
  instead used the "official" home directory from /etc/passwd
  (the equivalent of `getent passwd $(id -u) | cut -d: -f6`). Official
  buildds set the user's home directory to /nonexistent, so this fails.
  In production use, dbus normally uses EXTERNAL over AF_UNIX (and doesn't
  even allow DBUS_COOKIE_SHA1, as a piece of security hardening), but in
  its build-time tests it specifically exercises each auth mechanism and
  each transport, including DBUS_COOKIE_SHA1 over TCP (which is a
  terrible idea on Unix but is unfortunately necessary on Windows).

- GLib, ongoing (#972151). When the GLib test suite tests interoperability
  with libdbus, it (IMO reasonably!) expects ("localhost", AF_INET) to
  resolve to 127.0.0.1, but that doesn't work on IPv6-only buildds for
  relatively complicated reasons involving subtleties of glibc resolver
  behaviour (#952740). My local build environment still doesn't have code
  to reproduce this, and I'm sorry that I haven't provided workarounds or
  fixes in the GLib test suite or in libdbus' discouraged TCP code paths.
  If someone wants to work on this, skipping the interop tests for TCP on
  IPv6-only buildds would probably be more proportionate than adjusting
  libdbus' name-resolution behaviour for a feature nobody should be
  using in production anyway.

- Any package that assumes that if $XDG_RUNTIME_DIR is set, then it is
  set to a usable value (because historically schroot would set it to
  a value that exists/works on the host system, but does not exist and
  cannot be created inside the container). This is worked around by
  individual packages unsetting XDG_RUNTIME_DIR or setting it to a more
  useful value, or automatically by recent debhelper in a sufficiently
  high compat level (#942111).
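
As a rough illustration of several items in the list above, here is a
minimal Python sketch that probes a build environment for the assumptions
those test suites made: extended attributes in /var/tmp, a working FUSE
device, a usable $XDG_RUNTIME_DIR, ("localhost", AF_INET) resolving to
127.0.0.1, and an existing passwd home directory. All function names are
made up for this example, and none of this is what the packages above
actually do. In particular, opening /dev/fuse is only an approximation of
"FUSE works".

```python
#!/usr/bin/env python3
"""Probe a build environment for things test suites commonly assume."""
import os
import pwd
import socket
import tempfile


def xattrs_supported(directory):
    """Return True if a user.* extended attribute can be set in directory."""
    if not hasattr(os, "setxattr"):  # os.setxattr only exists on Linux
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            os.setxattr(f.name, "user.probe", b"1")
        return True
    except OSError:  # e.g. unsupported filesystem, or directory missing
        return False


def fuse_available():
    """Return True if /dev/fuse can be opened (an approximation only)."""
    try:
        fd = os.open("/dev/fuse", os.O_RDWR)
    except OSError:
        return False
    os.close(fd)
    return True


def runtime_dir_broken(environ=None):
    """Return True if XDG_RUNTIME_DIR is set but is not a usable directory."""
    environ = os.environ if environ is None else environ
    path = environ.get("XDG_RUNTIME_DIR")
    if not path:  # unset (or empty) is not "broken", just absent
        return False
    return not (os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK))


def localhost_resolves_to_ipv4():
    """Return True if ("localhost", AF_INET) resolves to 127.0.0.1."""
    try:
        infos = socket.getaddrinfo(
            "localhost", None, socket.AF_INET, socket.SOCK_STREAM
        )
    except socket.gaierror:
        return False
    return any(info[4][0] == "127.0.0.1" for info in infos)


def passwd_home_exists():
    """Return True if the home directory listed in passwd exists."""
    try:
        home = pwd.getpwuid(os.getuid()).pw_dir
    except KeyError:  # uid has no passwd entry at all
        return False
    return os.path.isdir(home)


if __name__ == "__main__":
    print("xattrs in /var/tmp:        ", xattrs_supported("/var/tmp"))
    print("/dev/fuse can be opened:   ", fuse_available())
    print("XDG_RUNTIME_DIR is broken: ", runtime_dir_broken())
    print("localhost -> 127.0.0.1:    ", localhost_resolves_to_ipv4())
    print("passwd home exists:        ", passwd_home_exists())
```

Running something like this inside the build chroot, rather than on the
host, is the point: the answers on a developer machine and on a buildd
can differ for every one of these checks.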

> I have never run in such a situation despite doing Debian packaging
> for 10 years with fairly complex C++ software targeting all archs
> Debian supports.

If your complex C++ software is doing pure computation without
side-effects, or if it's doing something that's unaffected by being in
a chroot (like file I/O to the build directory, or IPC via AF_UNIX)
then it can be extremely complex and still not hit this sort of thing.
Conversely, container-adjacent tools that want to run build-time tests
will hit this sort of thing every time.
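
For container-adjacent tools, the usual answer is to probe at test time and
skip what cannot work. The following is a minimal sketch of that pattern,
assuming Python 3.12 or later on Linux (for os.unshare and
os.CLONE_NEWUSER); the helper and the test class are hypothetical names,
not code from bubblewrap or flatpak. The probe runs in a forked child so
that the test process itself never changes namespace.

```python
import os
import unittest


def can_unshare_userns():
    """Try to create a user namespace in a child process; report success."""
    if not hasattr(os, "unshare") or not hasattr(os, "CLONE_NEWUSER"):
        return False  # older Python, or not Linux
    pid = os.fork()
    if pid == 0:
        # Child: attempt the unshare and report the result via exit status.
        try:
            os.unshare(os.CLONE_NEWUSER)
        except OSError:
            os._exit(1)  # e.g. EPERM inside a chroot, or disabled by sysctl
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status) == 0


class NamespaceTests(unittest.TestCase):
    @unittest.skipUnless(can_unshare_userns(),
                         "cannot create user namespaces here")
    def test_needs_userns(self):
        # A real test would exercise the sandboxing tool here; this
        # placeholder only re-checks the probe.
        self.assertTrue(can_unshare_userns())
```

Saved as a test module, this can be run with `python3 -m unittest`. Because
the errno values say so little about why namespace creation was refused,
the probe only reports whether it worked, not why.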

    smcv
