On Mon, 14 Apr 2025 at 18:44:38 +0200, Helmut Grohne wrote:
> In general, I doubt we fix this for trixie other than dropping
> M-A:same maybe.
Please don't drop M-A: same from mesa-vulkan-drivers.

From my point of view as someone helping to make Steam runnable on
Linux: mesa-vulkan-drivers:amd64 and mesa-vulkan-drivers:i386 need to
be co-installable, otherwise it isn't possible to run 64- and 32-bit
games that use Vulkan on the same system (which would be a regression
relative to bookworm, bullseye, and I think also buster, where this
worked fine).

Or, even if proprietary software like Steam is disregarded,
mesa-vulkan-drivers:amd64 and mesa-vulkan-drivers:i386 need to be
co-installable if we want both 64- and 32-bit Wine to be able to
implement the Direct3D API using DXVK, which I believe we do.

I think a regression for amd64/i386 co-installation would have a
considerably larger practical negative impact on Debian users than ABI
conflicts between less-commonly-used architecture pairs like
armel/armhf, and a very much larger practical negative impact than
conflicts between architecture pairs involving -ports
(amd64/hurd-amd64 or amd64/kfreebsd-amd64) or architectures that are
not yet in Debian at all (amd64/musl-linux-amd64).

> On Mon, Apr 14, 2025 at 05:23:01PM +0100, Simon McVittie wrote:
> > Loaders are expected to be able to recognise that a particular
> > driver is not for them, and gracefully not load it. In practice
> > this works fine, because all of our architectures can be
> > distinguished by their ELF headers (and if that wasn't the case,
> > multiarch co-installation of ordinary shared libraries would go
> > badly wrong).
>
> I'm sorry to disappoint you, but reality is not like that.
...
> Then, if you combine armel and armhf, those architectures also have
> ELF headers that are mostly indistinguishable. I'm not sure what
> happens exactly, but it isn't good.
...
> So no, as long as we support armel and armhf simultaneously, we
> cannot tell architectures apart by their ELF header.

If this is a problem, surely it's a problem that we already have,
whatever Mesa might do?
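For concreteness, the ELF-header distinction at stake for the
armel/armhf pair lives in the float-ABI bits of the e_flags field.
Below is a minimal sketch of that check; the EM_ARM and EF_ARM_* values
are copied from elf.h, but the header bytes are synthetic and built
purely for illustration, and this is not glibc's actual implementation:

```python
import struct

# Constants from elf.h
EM_ARM = 40
EF_ARM_ABI_FLOAT_SOFT = 0x200   # armel (soft-float calling convention)
EF_ARM_ABI_FLOAT_HARD = 0x400   # armhf (hard-float calling convention)

def arm_float_abi(elf_bytes):
    """Classify a 32-bit ARM ELF header by the float-ABI bits of e_flags."""
    e_machine = struct.unpack_from("<H", elf_bytes, 18)[0]
    if e_machine != EM_ARM:
        return "not ARM"
    # In an ELFCLASS32 header, e_flags sits at byte offset 36.
    e_flags = struct.unpack_from("<I", elf_bytes, 36)[0]
    if e_flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard-float"
    if e_flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft-float"
    return "unknown"

def fake_arm_header(e_flags):
    """Build a synthetic minimal ELF32 ARM header (only the fields read above)."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)   # ELFCLASS32, little-endian
    return ident + struct.pack("<HHIIIII", 2, EM_ARM, 1, 0, 0, 0, e_flags)

print(arm_float_abi(fake_arm_header(EF_ARM_ABI_FLOAT_SOFT)))  # soft-float
print(arm_float_abi(fake_arm_header(EF_ARM_ABI_FLOAT_HARD)))  # hard-float
```

A real on-disk library could be classified the same way by passing the
first 40 bytes of the file to arm_float_abi().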
Because /etc/ld.so.conf.d adds all the multiarch directories from every
enabled architecture to the search path:

    amdahl$ schroot -c sid_armel-dchroot cat /etc/ld.so.conf.d/arm-linux-gnueabi.conf
    ...
    # Multiarch support
    /usr/local/lib/arm-linux-gnueabi
    /lib/arm-linux-gnueabi
    /usr/lib/arm-linux-gnueabi

    amdahl$ schroot -c sid_armhf-dchroot cat /etc/ld.so.conf.d/arm-linux-gnueabihf.conf
    ...
    # Multiarch support
    /usr/local/lib/arm-linux-gnueabihf
    /lib/arm-linux-gnueabihf
    /usr/lib/arm-linux-gnueabihf

and we rely on the dynamic linker to consider and reject libraries
that are not, in fact, compatible with the current process.

I don't have a mixed armel/armhf system immediately to hand, but you
can see this in action on a mixed amd64/i386 system. I don't have one
/etc/ld.so.cache for amd64 and a second one for i386: I only have one
ld.so cache, containing both:

    $ /sbin/ldconfig -Xp | grep libvulkan.so.1
        libvulkan.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libvulkan.so.1
        libvulkan.so.1 (libc6) => /lib/i386-linux-gnu/libvulkan.so.1

But when one of my installed programs asks to load libvulkan.so.1,
either via DT_NEEDED or dlopen(), ld.so knows that it must choose the
one that matches the architecture of the running process and disregard
the other one. (There is also only one LD_LIBRARY_PATH, shared between
all architectures.)

Similarly, I'm 95% sure that a mixed armel/armhf system only has one
ld.so.cache, listing both armel and armhf libraries indiscriminately,
hopefully with enough metadata to choose whichever one is appropriate
and disregard the other.

So on a mixed armel/armhf system, there are two possibilities:

(a) glibc/ld.so knows how to distinguish between armel and armhf
    libraries, and avoids loading armel libraries into armhf processes
    and vice versa. If this is true, then loading "libvulkan_radeon.so"
    into an armhf process will reliably load the armhf flavour,
    avoiding the armel flavour, and we win.
(b) glibc/ld.so can't distinguish between armel and armhf libraries.
    But if this is true, then we will already have the problem that
    loading an ordinary library dependency like "libc.so.6" or
    "libvulkan.so.1" to satisfy DT_NEEDED can load the wrong flavour,
    so we have already lost, even before loading a Vulkan driver
    plugin; and I don't see how Mesa doing a
    dlopen("libvulkan_radeon.so", ...) is going to make this any worse.

and it seems like the same would be true for any pair of glibc
architectures? Either we're in the equivalent of situation (a) and my
"option 2" from earlier in the thread would work fine, or we're in the
equivalent of situation (b) and we already have a serious problem,
which is not going to be made noticeably worse by anything Mesa does.

In practice, it seems that ld.so *can* distinguish between armel and
armhf, presumably by distinguishing EF_ARM_ABI_FLOAT_SOFT from
EF_ARM_ABI_FLOAT_HARD in their e_flags field:

    amdahl$ schroot -c sid_armel-dchroot -- /sbin/ldconfig -Xp | grep libzstd
    ...
        libzstd.so.1 (libc6,soft-float) => /lib/arm-linux-gnueabi/libzstd.so.1
    amdahl$ schroot -c sid_armhf-dchroot -- /sbin/ldconfig -Xp | grep libzstd
    ...
        libzstd.so.1 (libc6,hard-float) => /lib/arm-linux-gnueabihf/libzstd.so.1

so hopefully what we get for the armel/armhf pair is (a). (And I would
expect mixed armel/armhf to already fail horribly if that was not the
case.)

> You can actually run kfreebsd-amd64 binaries on a Linux kernel as
> their ELF header looks the same. Not that they do useful stuff, but
> they may go far enough as to reset your system clock.

But can you dlopen() kfreebsd-amd64 libraries into a running Linux
amd64 process? That's what matters here. If you can't, then the
drivers from mesa-vulkan-drivers:kfreebsd-amd64 will gracefully fail
to load, no harm done. Or if you can, then we likely already have
worse problems.

> What also gets interesting is when you try to combine e.g. amd64 and
> musl-linux-amd64.
> Those also do not tell apart from their ELF header.

This is a situation where "option 2", a single JSON manifest with only
the basename of the library, might actually be *better* than
"option 1", a distinct JSON manifest per architecture with the
absolute path to the library.

Presumably musl-linux-amd64 has a library search path (either
hard-coded into it or via configuration) that is distinct from
glibc's; if it didn't, and if glibc's and musl's dynamic linkers are
unable to avoid loading libraries from the "other" ABI (scenario (b)
above), we would already have worse problems.

But if musl has a distinct search path, then a musl process calling
dlopen("libvulkan_radeon.so", ...), as it would if option 2 is taken,
won't load the glibc flavour of libvulkan_radeon.so, because that
isn't in its search path; and conversely, a glibc process doing the
same dlopen() call won't see the musl flavour.

However, if a musl process calls
dlopen("/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so", ...), as it
would if option 1 is taken, then I can see how that might accidentally
succeed if their ELF flags happen to be the same, leading to problems
when musl and glibc ABI assumptions collide.

> > 1. As far as I'm aware, the basename of these files never matters:
> > all that matters is their content. So Mesa's debian/rules could do
> > something like this (assuming file-rename(1p) from the rename
> > package):
> >
> >     file-rename 's/(.*)\.([^.]+?)\.json$/$1.$ENV{DEB_HOST_ARCH}.json/' \
> >         debian/tmp/usr/share/vulkan/icd.d/*.json
> >
> > to replace the "x86_64" or "armv8l" part of the filename with a
> > string that is definitely distinct for each pair of Debian
> > architectures, resulting in filenames like intel_icd.amd64.json and
> > intel_icd.x32.json.
> >
> > Or it could use $ENV{DEB_HOST_MULTIARCH} for
> > longer-but-maybe-clearer filenames like
> > intel_icd.x86_64-linux-gnu.json, which would be necessary if we
> > want to allow mesa-vulkan-drivers:amd64,
> > mesa-vulkan-drivers:hurd-amd64 and
> > mesa-vulkan-drivers:kfreebsd-amd64 to be co-installed.
...
> This sounds very reasonable to me.

But if you are concerned about the possibility that the dynamic linker
will load the "wrong" flavour of the library, how would this help us?
There is nothing special about these filenames that makes
Vulkan-Loader load some while ignoring others: the only mechanism for
ignoring unsuitable/incompatible drivers is to dlopen() them and see
if it fails.

(Vulkan-Loader *does* have a mechanism to flag drivers as 32-bit or
64-bit, in which case the dlopen() won't even be attempted for the
"wrong" word size, but this is very limited and only works for word
sizes, not the rest of the possible differences between architectures;
and in any case Mesa doesn't currently apply this marking to its
drivers.)

For example, on a mixed armel/armhf system, if we did "option 1", we
would be relying on an armhf Vulkan-Loader doing something like this:

    readdir("/usr/share/vulkan/icd.d")
        => a list of filenames including e.g. radeon_icd.armel.json,
           radeon_icd.armhf.json, nouveau_icd.armhf.json and so on
    fopen(".../radeon_icd.armel.json", ...) => success
    parse JSON
    dlopen("/usr/lib/arm-linux-gnueabi/libvulkan_radeon.so", ...)
        => failure (because it's a soft-float library and we are a
           hard-float process)
    fopen(".../radeon_icd.armhf.json", ...) => success
    parse JSON
    dlopen("/usr/lib/arm-linux-gnueabihf/libvulkan_radeon.so", ...)
        => success
    ask this driver whether it can find any GPUs that it supports
    foreach GPU in the result { add the GPU to our list of devices }
    repeat both for nouveau driver
    repeat both for lavapipe driver
    repeat both for virtio driver
    etc.

This works as intended for the most common multiarch scenarios like
amd64/i386.
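The enumeration described above can be sketched as a runnable toy
model. This is an illustration of the concept, not Vulkan-Loader's
actual code: the manifests are fabricated, and the dlopen() step is
injected as a callback so that the "wrong architecture fails to load"
path can be demonstrated without a real mixed-architecture system:

```python
import json
import os
import tempfile

def enumerate_drivers(icd_dir, try_dlopen):
    """Toy model of Vulkan-Loader's manifest scan: parse every JSON
    manifest in icd_dir, attempt to load its driver library, and
    silently skip the ones whose load attempt fails (e.g. a soft-float
    driver in a hard-float process)."""
    loaded = []
    for name in sorted(os.listdir(icd_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(icd_dir, name)) as f:
            manifest = json.load(f)
        library = manifest["ICD"]["library_path"]
        if try_dlopen(library):
            loaded.append(library)   # wrong-architecture drivers drop out here
    return loaded

# Demo with "option 1"-style per-architecture manifests, and a fake
# dlopen that rejects the soft-float (gnueabi) flavour, as a hard-float
# process's ld.so would:
with tempfile.TemporaryDirectory() as icd_dir:
    for fname, path in [
        ("radeon_icd.armel.json", "/usr/lib/arm-linux-gnueabi/libvulkan_radeon.so"),
        ("radeon_icd.armhf.json", "/usr/lib/arm-linux-gnueabihf/libvulkan_radeon.so"),
    ]:
        with open(os.path.join(icd_dir, fname), "w") as f:
            json.dump({"ICD": {"library_path": path}}, f)

    fake_dlopen = lambda lib: "gnueabihf" in lib
    print(enumerate_drivers(icd_dir, fake_dlopen))
    # only the armhf driver survives the scan
```

The same loop models "option 2" unchanged: only the library_path values
in the manifests become basenames, and the real dlopen() is then
relying on the search path rather than an absolute path to pick the
compatible flavour.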
I suspect it also works as intended for armel/armhf, although your
assertion is that it does not.

> > 2. Or, Mesa could give its Vulkan drivers the same file layout as
> > its Vulkan layers [and] set the library_path field to be just the
> > basename
>
> Given what I said earlier about the inability to tell ELF headers
> apart and the real problems observed in trying to do so, I have a
> preference for the first option.

I don't see how this would introduce a new problem that we don't
already have to deal with. In "option 2", Vulkan-Loader would do
something like:

    readdir("/usr/share/vulkan/icd.d")
        => a list of filenames including e.g. radeon_icd.json,
           nouveau_icd.json and so on
    fopen(".../radeon_icd.json", ...) => success
    parse JSON
    dlopen("libvulkan_radeon.so", ...) => success
    ask this driver whether it can find any GPUs that it supports
    foreach GPU in the result { add the GPU to our list of devices }
    repeat for nouveau driver
    repeat for lavapipe driver
    repeat for virtio driver
    etc.

But if dlopen("libvulkan_radeon.so", ...) can succeed but return a
library that is of the wrong architecture, don't we already have an
equivalent problem when evaluating DT_NEEDED dependencies, like when
libGL.so.1 loads libGLdispatch.so.0, or at least when evaluating
dlopen()'d weak dependencies, like when libSDL2-2.0.so.0 loads
libdbus-1.so.3? And if that problem already exists, then the relevant
architecture pair already isn't going to work well together, and
Mesa/Vulkan isn't making the problem worse.

    smcv