On Wed, 23 Aug 2023 at 17:04:36 +0100, Ian Jackson wrote:
> Simon McVittie writes ("Bug#1050001: Unwinding directory aliasing"):
> > What do you consider to be the end goal of this proposal?
>
> My idea of a desired end state is as follows:
>
> /bin and /lib etc. remain directories (so there is no aliasing). All
> actual files are shipped in /usr. / contains compatibility symlinks
> pointing into /usr, for those files/APIs/programs where this is needed
> (which is far from all of them). Eventually, over time, the set of
> compatibility links is reduced to a mere handful.
This is not merged-/usr in the sense used by the technical committee's past resolutions, or by most (all?) non-Debian distributions (of which Fedora and Arch were prominent early adopters). I recognise that you don't want merged-/usr, and instead want this non-merged-/usr layout, which shares some of the properties of merged-/usr; but it isn't the same thing, and using the same name for two different things makes discussion and reasoning unnecessarily difficult, so could you please avoid the term "merged /usr" for this?

> I think this is a more desirable situation than the current planned
> end state, which is that /bin and /lib are symlinks.

The meaning ascribed to "merged /usr" or "the /usr merge" by previous TC resolutions is exactly the layout where /bin and /lib (and so on) are symlinks. Outside Debian, that's also the layout described in documents like "The Case for the /usr Merge". I acknowledge that, whatever we choose to call it, you would prefer not to end at that state, and this is a point on which our opinions differ.

> The current plan, as I understand it, is that we will fix these
> problems by arranging to *always* name files by their canonical paths,
> ie the ones in /usr.

Using the word "canonical" is not necessarily helpful here, because there are two reasonable-but-contradictory things you might mean by it.
Depending on who you ask, the canonical path of /bin/sh on a system with some sort of unified /usr might be:

* /usr/bin/sh, because that's the physical path as returned by realpath() (I believe this is what you mean when you say "canonical", because you're thinking in terms of the path canonicalization operation done by realpath() and similar things);
* or /bin/sh, because that's the interoperable path that has worked on all Linux distributions since time immemorial (even though this might not have anything to do with the physical path).

May I suggest we avoid "canonical" as ambiguous, and use something like "physical path" and "traditional path" for these two concepts?

If by "name files" you mean references to filenames from elsewhere, then that is not the plan as I understand it (see below). If by "name files" you mean "name files in dpkg's database", then yes: I believe the current plan is that we end up with dpkg's list of installed files referring to every file by its physical path, so that, for example, the dpkg database contains the physical path /usr/bin/sh, even though the traditional path is /bin/sh.

Another way to express this is that if you install a Debian chroot and pack it into an archive in the obvious way, what you should get (in the absence of dpkg-divert, etc.) is the union of the data.tar of all the installed packages, plus additional non-dpkg-managed files like /etc/passwd.

> Naming files by their canonical names will have to be done everywhere.

I would dispute that. We routinely name system-critical files by non-physical paths - for example /bin/sh is really /{usr/,}bin/dash, and /lib64/ld-linux-x86-64.so.2 is really /{usr/,}lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 - and have done so for a long time.

> violations of the "use only [physical] names" rule are not only
> expected, they are *necessary*:

Right, and that's why that is not a rule we are following.
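To make the two meanings concrete, here is a small Python sketch (not part of any Debian tooling) that builds a toy merged-/usr tree in a temporary directory and maps traditional paths to physical ones with os.path.realpath(). It assumes a POSIX system where unprivileged symlink creation works; the helpers make_merged_usr_root and physical_path are hypothetical names. Note that fully resolving /bin/sh also follows the final sh -> dash link and gives /usr/bin/dash; resolving only the directory components gives /usr/bin/sh, which is the sense of "physical path" used above.

```python
import os
import tempfile

def make_merged_usr_root():
    """Build a toy merged-/usr tree: real files in usr/bin, bin as a symlink."""
    root = os.path.realpath(tempfile.mkdtemp())
    usr_bin = os.path.join(root, "usr", "bin")
    os.makedirs(usr_bin)
    # The only regular file: the shell binary at its physical location.
    with open(os.path.join(usr_bin, "dash"), "w") as f:
        f.write("toy dash\n")
    # sh -> dash, mirroring the /bin/sh -> dash link (relative target).
    os.symlink("dash", os.path.join(usr_bin, "sh"))
    # The directory alias that makes this layout "merged /usr": bin -> usr/bin.
    os.symlink("usr/bin", os.path.join(root, "bin"))
    return root

def physical_path(root, traditional, follow_final=True):
    """Map a traditional path such as /bin/sh to its physical path inside root.

    With follow_final=False only the directory components are resolved, so a
    symlink at the end of the path (sh -> dash) is reported as itself.
    """
    parent, name = os.path.split(traditional.lstrip("/"))
    if follow_final:
        resolved = os.path.realpath(os.path.join(root, parent, name))
    else:
        resolved = os.path.join(os.path.realpath(os.path.join(root, parent)), name)
    return "/" + os.path.relpath(resolved, root)

root = make_merged_usr_root()
print(physical_path(root, "/bin/sh"))                      # /usr/bin/dash
print(physical_path(root, "/bin/sh", follow_final=False))  # /usr/bin/sh
```

Both /bin/sh and /usr/bin/sh end up at the same physical entry, which is why only one of them needs to appear in dpkg's list of installed files.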
One of the reasons that merged-/usr appeals to me personally is that it takes an entire equivalence class of bugs and turns them into non-bugs. If merged-/usr is ubiquitous, then we don't need to expect third-party software developers to "just know" that /bin/sh and /usr/bin/perl are the traditionally "correct" paths, because /usr/bin/sh and /bin/perl become equally valid and interoperable things to put at the beginning of a script.

In the state we had before bookworm, where merged-/usr was supported but not mandatory, we required Debian maintainers to be careful to refer to files by their traditional name, even though on a newly-installed Debian system with merged-/usr the "other" name would have worked equally well; and we were also implicitly expecting upstream and third-party developers to know that they had to use the traditional name, even if it was unnecessary in their (maybe non-Debian) environment.

Of course, one of the problems with a simplifying assumption is that it's one-way: as soon as we start to rely on /bin and /usr/bin being interchangeable (reducing entropy by 1 bit per file where we would previously have distinguished), that becomes part of the interface, and it's difficult to go back to having them be non-interchangeable without regressions.

Whether we as a distribution or as individuals like it or not, because of the one-way nature of a simplifying assumption, an increasing number of the upstream developers whose software we rely on are going to consider an interchangeable /bin and /usr/bin to be part of the interoperable GNU/Linux platform (inasmuch as that's a thing that exists). We can fight it, and report those assumptions as bugs whenever we find them; or we can go with it, and declare those assumptions to be a non-bug now. The TC opted to go with it.
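The interchangeability point can be checked with a toy model. This hypothetical Python sketch (not anything Debian ships) builds three layouts in temporary directories and reports which of four interpreter paths resolve: the traditional split layout, a partial compatibility symlink farm (approximating the proposal quoted at the top of this message, with sh arbitrarily chosen as the only compat link), and the aliased merged-/usr layout where /bin is a symlink to usr/bin. It assumes a POSIX system where symlinks can be created.

```python
import os
import tempfile

INTERPRETERS = ["/bin/sh", "/usr/bin/sh", "/bin/perl", "/usr/bin/perl"]

def touch(path):
    open(path, "w").close()

def make_root(layout):
    """Build a toy root holding sh and perl in the given (hypothetical) layout."""
    root = tempfile.mkdtemp()
    bin_dir = os.path.join(root, "bin")
    usr_bin = os.path.join(root, "usr", "bin")
    os.makedirs(usr_bin)
    if layout == "split":
        # Pre-merge tradition: sh physically in /bin, perl in /usr/bin.
        os.makedirs(bin_dir)
        touch(os.path.join(bin_dir, "sh"))
        touch(os.path.join(usr_bin, "perl"))
        return root
    # Both /usr-based variants ship every real file in /usr/bin.
    touch(os.path.join(usr_bin, "sh"))
    touch(os.path.join(usr_bin, "perl"))
    if layout == "farm":
        # /bin stays a directory, with a compat symlink for sh only.
        os.makedirs(bin_dir)
        os.symlink("../usr/bin/sh", os.path.join(bin_dir, "sh"))
    elif layout == "aliased":
        # Merged /usr: /bin is a symlink to usr/bin.
        os.symlink("usr/bin", bin_dir)
    else:
        raise ValueError(f"unknown layout: {layout}")
    return root

def working_paths(root):
    """Report which interpreter paths resolve to an existing file."""
    return {p: os.path.exists(os.path.join(root, p.lstrip("/"))) for p in INTERPRETERS}

for layout in ("split", "farm", "aliased"):
    print(layout, working_paths(make_root(layout)))
```

In this model the split layout accepts only /bin/sh and /usr/bin/perl, the farm accepts every /usr/bin path plus /bin/sh, and only the aliased layout accepts all four.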
> However, this introduces a new implied rule:
> it becomes a bug to take a filename you see in a place where the file
> is being *read*, and apply it in a context where the file is going to
> be *updated*.

If we're writing to a path that has traditionally been part of the root filesystem, that's presumably part of the limited set of packages involved in system boot or basic/low-level debugging/recovery; otherwise, it would already have been in /usr. For packages outside this low-level set, the rule has always been simple, and still is: --prefix=/usr, or your build system's equivalent.

For the low-level set, it seems to me that there are two possibilities: either you're installing a .deb, or you're going behind dpkg's back and writing directly to the filesystem.

If you're installing a .deb, a prerequisite is to have built it. After the transition that Helmut is working on has finished, the QA rule that can be applied during build, or in a Lintian-style post-build check, is extremely simple. Pseudocode:

    foreach path in data.tar {
        if (path =~ m{^/(lib[^/]*|bin|sbin)/}) {
            this is bad;  # warning or error, depending on preference
        }
    }

(The trailing slash on the pattern allows base-files or some similar package to ship the /bin -> /usr/bin, etc. symlinks in its data.tar and still pass this check, if we want them to be "owned" in the dpkg database rather than being created "unowned" as usrmerge currently does; I believe the current plan is that they will indeed become "owned".)

Or, if you're going behind dpkg's back, (a) that's no more supportable than it was before all this started, and (b) it doesn't actually matter whether you write to the physical path or the alias, because the resulting filesystem writes will be the same, and dpkg won't know about your changes either way.

> It seems to me that directory aliasing will continue to be a source of
> very annoying bugs indefinitely, well after the transition is fully
> complete.
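The pseudocode above could be turned into a runnable check along these lines. This is a minimal Python sketch, not an existing Lintian tag or dpkg feature, and the function names are hypothetical. It assumes the package's data.tar has already been extracted (for example with dpkg-deb --fsys-tarfile), and it tolerates member names with or without the ./ prefix that dpkg-deb commonly writes. It only inspects member names.

```python
import re
import tarfile

# Anything *inside* /bin, /sbin or /lib* (lib32, lib64, libx32, ...) is flagged.
# The trailing "/" means the top-level bin -> usr/bin symlinks themselves pass.
ALIASED_DIR_CONTENT = re.compile(r"^(?:\./|/)?(?:lib[^/]*|bin|sbin)/")

def offending_paths(names):
    """Return the member names that ship files under an aliased directory."""
    return [name for name in names if ALIASED_DIR_CONTENT.match(name)]

def offending_members(data_tar_path):
    """Open a data.tar (any compression the stdlib tarfile supports) and check it."""
    with tarfile.open(data_tar_path) as tar:
        return offending_paths(tar.getnames())

names = ["./", "./bin", "./usr/bin/sh", "./bin/sh", "./lib64",
         "./lib/x86_64-linux-gnu/libc.so.6", "./sbin/init"]
for name in offending_paths(names):
    print("bad:", name)
```

Whether a match is reported as a warning or an error is left to the caller, as in the pseudocode.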
> In another 20 years we'll still be debugging strange
> installation breakage that will turn out to be due to directory
> aliasing.

It seems to me that *not* aliasing /bin and /usr/bin will continue to be a source of very annoying bugs indefinitely, because each path you might want to refer to will have a "right" version and a "wrong" version. Your proposal with a partial symlink farm in /bin (and /lib, etc.) would be somewhat better than the traditional situation, because using the path in /usr/bin (etc.) would always be a safe choice; but I think we can do better, by making both versions "right" in all cases. I think we might have to agree to disagree on this, and in 20 years we can see who was more accurate.

> If we had done [something resembling] usrmerge the non-aliased way,
> then such a checking program would be able to detect a /-vs-/usr bug
> analogous to #911225.

Sure, but if we have done usrmerge the aliased way, then #911225 is hardly even a bug at all - the obsolete version of GLib would still hang around on disk for an unknown reason (dpkg bug? filesystem corruption? I still have no idea why that happened), but ldconfig arranges for the SONAME symlink to point to the newer, non-obsolete version, and the obsolete version would never get loaded. I would call "we load the correct version automatically" more robust than "we might load the wrong version and crash, but at least there was a warning first".

smcv
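The #911225 argument can be illustrated with a deliberately simplified Python model of library lookup. It assumes that lookup walks the directories in order and takes the first one that has any file for the SONAME, and that within one directory the highest-versioned file wins (roughly what the SONAME symlink maintained by ldconfig achieves). This is not the real ldconfig or ld.so algorithm, and the GLib version numbers are hypothetical.

```python
SONAME = "libglib-2.0.so.0"
SEARCH_ORDER = ["/lib/x86_64-linux-gnu", "/usr/lib/x86_64-linux-gnu"]

OLD = "libglib-2.0.so.0.5800.3"  # hypothetical stale copy left behind
NEW = "libglib-2.0.so.0.7400.0"  # hypothetical current version

def version_key(filename):
    # "libglib-2.0.so.0.7400.0" -> (0, 7400, 0); assumes a purely numeric suffix
    suffix = filename.split(".so.", 1)[1]
    return tuple(int(part) for part in suffix.split("."))

def lookup(layout, soname):
    """Return the path that would be loaded for soname under this toy model."""
    for directory in SEARCH_ORDER:
        candidates = [f for f in layout.get(directory, []) if f.startswith(soname + ".")]
        if candidates:
            # Within one directory, the SONAME link points at the newest file.
            return f"{directory}/{max(candidates, key=version_key)}"
    return None

# Non-aliased: the stale copy sits in /lib, the current one in /usr/lib.
split_layout = {SEARCH_ORDER[0]: [OLD], SEARCH_ORDER[1]: [NEW]}

# Aliased: /lib is a symlink to usr/lib, so both names list the same directory.
shared = [OLD, NEW]
merged_layout = {SEARCH_ORDER[0]: shared, SEARCH_ORDER[1]: shared}

print(lookup(split_layout, SONAME))   # the stale 5800.3 copy in /lib wins
print(lookup(merged_layout, SONAME))  # the 7400.0 version is chosen
```

With aliasing, the stale file can only ever compete with the current one inside a single directory, where the newer version wins.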