On Mon, 14 Apr 2025 at 18:44:38 +0200, Helmut Grohne wrote:
> In general, I doubt we fix this for trixie other than dropping
> M-A:same maybe.
Please don't drop M-A: same from mesa-vulkan-drivers.

From my point of view as someone helping to make Steam runnable on
Linux: mesa-vulkan-drivers:amd64 and mesa-vulkan-drivers:i386 need to
be co-installable, otherwise it isn't possible to run 64- and 32-bit
games that use Vulkan on the same system (which would be a regression
relative to bookworm, bullseye, and I think also buster, where this
worked fine).

Or, even if proprietary software like Steam is disregarded,
mesa-vulkan-drivers:amd64 and mesa-vulkan-drivers:i386 need to be
co-installable if we want both 64- and 32-bit Wine to be able to
implement the Direct3D API using DXVK, which I believe we do.

I think a regression for amd64/i386 co-installation would have a
considerably larger practical negative impact on Debian users than ABI
conflicts between less-commonly-used architecture pairs like
armel/armhf, and a very much larger practical negative impact than
conflicts between architecture pairs involving -ports
(amd64/hurd-amd64 or amd64/kfreebsd-amd64) or architectures that are
not yet in Debian at all (amd64/musl-linux-amd64).

> On Mon, Apr 14, 2025 at 05:23:01PM +0100, Simon McVittie wrote:
> > Loaders are expected to be able to recognise that a particular
> > driver is not for them, and gracefully not load it. In practice
> > this works fine, because all of our architectures can be
> > distinguished by their ELF headers (and if that wasn't the case,
> > multiarch co-installation of ordinary shared libraries would go
> > badly wrong).
>
> I'm sorry to disappoint you, but reality is not like that.
...
> Then, if you combine armel and armhf, those architectures also have
> ELF headers that are mostly indistinguishable. I'm not sure what
> happens exactly, but it isn't good.
...
> So no, as long as we support armel and armhf simultaneously, we
> cannot tell architectures apart by their ELF header.

If this is a problem, surely it's a problem that we already have,
whatever Mesa might do?
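For concreteness, the ELF-header distinction at stake for the
armel/armhf pair lives in the float-ABI bits of the e_flags field.
Below is a minimal sketch of that check; the EM_ARM and EF_ARM_* values
are copied from elf.h, but the header bytes are synthetic and built
purely for illustration, and this is not glibc's actual implementation:

```python
import struct

# Constants from elf.h
EM_ARM = 40
EF_ARM_ABI_FLOAT_SOFT = 0x200   # armel (soft-float calling convention)
EF_ARM_ABI_FLOAT_HARD = 0x400   # armhf (hard-float calling convention)

def arm_float_abi(elf_bytes):
    """Classify a 32-bit ARM ELF header by the float-ABI bits of e_flags."""
    e_machine = struct.unpack_from("<H", elf_bytes, 18)[0]
    if e_machine != EM_ARM:
        return "not ARM"
    # In an ELFCLASS32 header, e_flags sits at byte offset 36.
    e_flags = struct.unpack_from("<I", elf_bytes, 36)[0]
    if e_flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard-float"
    if e_flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft-float"
    return "unknown"

def fake_arm_header(e_flags):
    """Build a synthetic minimal ELF32 ARM header (only the fields read above)."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)   # ELFCLASS32, little-endian
    return ident + struct.pack("<HHIIIII", 2, EM_ARM, 1, 0, 0, 0, e_flags)

print(arm_float_abi(fake_arm_header(EF_ARM_ABI_FLOAT_SOFT)))  # soft-float
print(arm_float_abi(fake_arm_header(EF_ARM_ABI_FLOAT_HARD)))  # hard-float
```

A real on-disk library could be classified the same way by passing the
first 40 bytes of the file to arm_float_abi().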
Because /etc/ld.so.conf.d adds all the multiarch directories from every
enabled architecture to the search path:

    amdahl$ schroot -c sid_armel-dchroot cat /etc/ld.so.conf.d/arm-linux-gnueabi.conf
    ...
    # Multiarch support
    /usr/local/lib/arm-linux-gnueabi
    /lib/arm-linux-gnueabi
    /usr/lib/arm-linux-gnueabi

    amdahl$ schroot -c sid_armhf-dchroot cat /etc/ld.so.conf.d/arm-linux-gnueabihf.conf
    ...
    # Multiarch support
    /usr/local/lib/arm-linux-gnueabihf
    /lib/arm-linux-gnueabihf
    /usr/lib/arm-linux-gnueabihf

and we rely on the dynamic linker to consider and reject libraries
that are not, in fact, compatible with the current process.

I don't have a mixed armel/armhf system immediately to hand, but you
can see this in action on a mixed amd64/i386 system. I don't have one
/etc/ld.so.cache for amd64 and a second one for i386: I only have one
ld.so cache, containing both:

    $ /sbin/ldconfig -Xp | grep libvulkan.so.1
        libvulkan.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libvulkan.so.1
        libvulkan.so.1 (libc6) => /lib/i386-linux-gnu/libvulkan.so.1

But when one of my installed programs asks to load libvulkan.so.1,
either via DT_NEEDED or dlopen(), ld.so knows that it must choose the
one that matches the architecture of the running process and disregard
the other one. (There is also only one LD_LIBRARY_PATH, shared between
all architectures.)

Similarly, I'm 95% sure that a mixed armel/armhf system only has one
ld.so.cache, listing both armel and armhf libraries indiscriminately,
hopefully with enough metadata to choose whichever one is appropriate
and disregard the other.

So on a mixed armel/armhf system, there are two possibilities:

(a) glibc/ld.so knows how to distinguish between armel and armhf
    libraries, and avoids loading armel libraries into armhf processes
    and vice versa. If this is true, then loading "libvulkan_radeon.so"
    into an armhf process will reliably load the armhf flavour,
    avoiding the armel flavour, and we win.
(b) glibc/ld.so can't distinguish between armel and armhf libraries.
    But if this is true, then we will already have the problem that
    loading an ordinary library dependency like "libc.so.6" or
    "libvulkan.so.1" to satisfy DT_NEEDED can load the wrong flavour,
    so we have already lost, even before loading a Vulkan driver
    plugin; and I don't see how Mesa doing a
    dlopen("libvulkan_radeon.so", ...) is going to make this any worse.

and it seems like the same would be true for any pair of glibc
architectures? Either we're in the equivalent of situation (a) and my
"option 2" from earlier in the thread would work fine, or we're in the
equivalent of situation (b) and we already have a serious problem,
which is not going to be made noticeably worse by anything Mesa does.

In practice, it seems that ld.so *can* distinguish between armel and
armhf, presumably by distinguishing EF_ARM_ABI_FLOAT_SOFT from
EF_ARM_ABI_FLOAT_HARD in their e_flags field:

    amdahl$ schroot -c sid_armel-dchroot -- /sbin/ldconfig -Xp | grep libzstd
    ...
        libzstd.so.1 (libc6,soft-float) => /lib/arm-linux-gnueabi/libzstd.so.1
    amdahl$ schroot -c sid_armhf-dchroot -- /sbin/ldconfig -Xp | grep libzstd
    ...
        libzstd.so.1 (libc6,hard-float) => /lib/arm-linux-gnueabihf/libzstd.so.1

so hopefully what we get for the armel/armhf pair is (a). (And I would
expect mixed armel/armhf to already fail horribly if that was not the
case.)

> You can actually run kfreebsd-amd64 binaries on a Linux kernel as
> their ELF header looks the same. Not that they do useful stuff, but
> they may go far enough as to reset your system clock.

But can you dlopen() kfreebsd-amd64 libraries into a running Linux
amd64 process? That's what matters here. If you can't, then the
drivers from mesa-vulkan-drivers:kfreebsd-amd64 will gracefully fail
to load, no harm done. Or if you can, then we likely already have
worse problems.

> What also gets interesting is when you try to combine e.g. amd64 and
> musl-linux-amd64.
> Those also do not tell apart from their ELF header.

This is a situation where "option 2", a single JSON manifest with only
the basename of the library, might actually be *better* than
"option 1", a distinct JSON manifest per architecture with the
absolute path to the library.

Presumably musl-linux-amd64 has a library search path (either
hard-coded into it or via configuration) that is distinct from
glibc's; if it didn't, and if glibc's and musl's dynamic linkers are
unable to avoid loading libraries from the "other" ABI (scenario (b)
above), we would already have worse problems.

But if musl has a distinct search path, then a musl process calling
dlopen("libvulkan_radeon.so", ...), as it would if option 2 is taken,
won't load the glibc flavour of libvulkan_radeon.so, because that
isn't in its search path; and conversely, a glibc process doing the
same dlopen() call won't see the musl flavour.

However, if a musl process calls
dlopen("/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so", ...), as it
would if option 1 is taken, then I can see how that might accidentally
succeed if their ELF flags happen to be the same, leading to problems
when musl and glibc ABI assumptions collide.

> > 1. As far as I'm aware, the basename of these files never matters:
> > all that matters is their content. So Mesa's debian/rules could do
> > something like this (assuming file-rename(1p) from the rename
> > package):
> >
> >     file-rename 's/(.*)\.([^.]+?)\.json$/$1.$ENV{DEB_HOST_ARCH}.json/' \
> >         debian/tmp/usr/share/vulkan/icd.d/*.json
> >
> > to replace the "x86_64" or "armv8l" part of the filename with a
> > string that is definitely distinct for each pair of Debian
> > architectures, resulting in filenames like intel_icd.amd64.json and
> > intel_icd.x32.json.
> >
> > Or it could use $ENV{DEB_HOST_MULTIARCH} for
> > longer-but-maybe-clearer filenames like
> > intel_icd.x86_64-linux-gnu.json, which would be necessary if we
> > want to allow mesa-vulkan-drivers:amd64,
> > mesa-vulkan-drivers:hurd-amd64 and
> > mesa-vulkan-drivers:kfreebsd-amd64 to be co-installed.
...
> This sounds very reasonable to me.

But if you are concerned about the possibility that the dynamic linker
will load the "wrong" flavour of the library, how would this help us?
There is nothing special about these filenames that makes
Vulkan-Loader load some while ignoring others: the only mechanism for
ignoring unsuitable/incompatible drivers is to dlopen() them and see
if it fails.

(Vulkan-Loader *does* have a mechanism to flag drivers as 32-bit or
64-bit, in which case the dlopen() won't even be attempted for the
"wrong" word size, but this is very limited and only works for word
sizes, not the rest of the possible differences between architectures;
and in any case Mesa doesn't currently apply this marking to its
drivers.)

For example, on a mixed armel/armhf system, if we did "option 1", we
would be relying on an armhf Vulkan-Loader doing something like this:

    readdir("/usr/share/vulkan/icd.d")
        => a list of filenames including e.g. radeon_icd.armel.json,
           radeon_icd.armhf.json, nouveau_icd.armhf.json and so on
    fopen(".../radeon_icd.armel.json", ...) => success
    parse JSON
    dlopen("/usr/lib/arm-linux-gnueabi/libvulkan_radeon.so", ...)
        => failure (because it's a soft-float library and we are a
           hard-float process)
    fopen(".../radeon_icd.armhf.json", ...) => success
    parse JSON
    dlopen("/usr/lib/arm-linux-gnueabihf/libvulkan_radeon.so", ...)
        => success
    ask this driver whether it can find any GPUs that it supports
    foreach GPU in the result { add the GPU to our list of devices }
    repeat both for nouveau driver
    repeat both for lavapipe driver
    repeat both for virtio driver
    etc.

This works as intended for the most common multiarch scenarios like
amd64/i386.
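The enumeration described above can be sketched as a runnable toy
model. This is an illustration of the concept, not Vulkan-Loader's
actual code: the manifests are fabricated, and the dlopen() step is
injected as a callback so that the "wrong architecture fails to load"
path can be demonstrated without a real mixed-architecture system:

```python
import json
import os
import tempfile

def enumerate_drivers(icd_dir, try_dlopen):
    """Toy model of Vulkan-Loader's manifest scan: parse every JSON
    manifest in icd_dir, attempt to load its driver library, and
    silently skip the ones whose load attempt fails (e.g. a soft-float
    driver in a hard-float process)."""
    loaded = []
    for name in sorted(os.listdir(icd_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(icd_dir, name)) as f:
            manifest = json.load(f)
        library = manifest["ICD"]["library_path"]
        if try_dlopen(library):
            loaded.append(library)   # wrong-architecture drivers drop out here
    return loaded

# Demo with "option 1"-style per-architecture manifests, and a fake
# dlopen that rejects the soft-float (gnueabi) flavour, as a hard-float
# process's ld.so would:
with tempfile.TemporaryDirectory() as icd_dir:
    for fname, path in [
        ("radeon_icd.armel.json", "/usr/lib/arm-linux-gnueabi/libvulkan_radeon.so"),
        ("radeon_icd.armhf.json", "/usr/lib/arm-linux-gnueabihf/libvulkan_radeon.so"),
    ]:
        with open(os.path.join(icd_dir, fname), "w") as f:
            json.dump({"ICD": {"library_path": path}}, f)

    fake_dlopen = lambda lib: "gnueabihf" in lib
    print(enumerate_drivers(icd_dir, fake_dlopen))
    # only the armhf driver survives the scan
```

The same loop models "option 2" unchanged: only the library_path values
in the manifests become basenames, and the real dlopen() is then
relying on the search path rather than an absolute path to pick the
compatible flavour.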
I suspect it also works as intended for armel/armhf, although your
assertion is that it does not.

> > 2. Or, Mesa could give its Vulkan drivers the same file layout as
> > its Vulkan layers [and] set the library_path field to be just the
> > basename
>
> Given what I said earlier about the inability to tell ELF headers
> apart and the real problems observed in trying to do so, I have a
> preference for the first option.

I don't see how this would introduce a new problem that we don't
already have to deal with. In "option 2", Vulkan-Loader would do
something like:

    readdir("/usr/share/vulkan/icd.d")
        => a list of filenames including e.g. radeon_icd.json,
           nouveau_icd.json and so on
    fopen(".../radeon_icd.json", ...) => success
    parse JSON
    dlopen("libvulkan_radeon.so", ...) => success
    ask this driver whether it can find any GPUs that it supports
    foreach GPU in the result { add the GPU to our list of devices }
    repeat for nouveau driver
    repeat for lavapipe driver
    repeat for virtio driver
    etc.

But if dlopen("libvulkan_radeon.so", ...) can succeed but return a
library that is of the wrong architecture, don't we already have an
equivalent problem when evaluating DT_NEEDED dependencies, like when
libGL.so.1 loads libGLdispatch.so.0, or at least when evaluating
dlopen()'d weak dependencies, like when libSDL2-2.0.so.0 loads
libdbus-1.so.3? And if that problem already exists, then the relevant
architecture pair already isn't going to work well together, and
Mesa/Vulkan isn't making the problem worse.

    smcv