Jonas Smedegaard wrote:
> Quoting Raphael Hertzog (2020-12-17 13:16:14)
> > Even if you package everything, you will never ever have the right
> > combination of version of the various packages.
>
> What is possible to auto-compute is a coarse view of the work needed.
>
> In reality, most Nodejs modules declare too tight versioning for their
> dependencies, and in many cases it is adequate that a module is packaged
> even if not at the version declared as required. A concrete example is
> "ansi-styles" which is most likely working just fine in version 4.x.

This is not at all as simple as it sounds, even on a small scale, let alone when multiplied by a few hundred dependencies.

(Let's please not go on the standard tangent into complaints about the number of dependencies, because at the end of that tangent, people will still use fine-grained packages and dependencies per the standard best-practices of those communities, no matter the number or content of mails in this thread suggesting otherwise. The extremes of "package for a one-line function" are not the primary issue here; not every fine-grained dependency is that small, and the issues raised in this mail still apply whether you have 200 dependencies or 600. So let's take it as a given that packages *will* have hundreds of library dependencies, and try to make that more feasible.)

Figuring out whether those dependencies are actually too specific or if they're required is a substantial amount of work by itself; the packaging metadata and dependency versions recorded upstream exist to declare the required version of dependencies, and there isn't typically a *second* way that upstream records "no, really, there's a reason for this dependency version requirement". This is hard enough in a statically typed language, where you can at least have the verification of seeing if it compiles with the older version (though the package might be relying on new semantics); with a dynamically typed language, you might not know that the older version of the dependency has caused a problem until runtime. As an upstream developer, the safest assumption when preparing your own dependencies is "well, it works with the version of the dependency I tested with, and assuming that component correctly follows semver, it should work with newer semver-compatible versions".

To clarify something: I *don't* believe Debian should compromise on network access at build time. Debian package dependencies should be completely self-contained within the Debian archive.

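To make the upstream assumption about semver concrete, here is a minimal sketch in Python. The helper names are hypothetical, it ignores pre-release tags, and it treats 0.x releases as compatible only within the same minor version, which is a simplifying assumption rather than anything a particular ecosystem mandates.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_semver_compatible(tested, candidate):
    """Return True if `candidate` should work where `tested` was used.

    Simplified rule (an assumption for illustration): the candidate must
    not be older than the tested version and must share its major number.
    For major version 0, the minor number must match as well.
    """
    t = parse_version(tested)
    c = parse_version(candidate)
    if c < t:
        # Older than what upstream tested: nothing is promised.
        return False
    if t[0] == 0:
        return c[:2] == t[:2]
    return c[0] == t[0]
```

Note what this does not cover: an *older* version is simply reported as not known to work. Whether it would work anyway is exactly the manual investigation described above.
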
The aspect I'm concerned about here is that Debian pushes hard to force every single package to use *the same version* of a given dependency, even if the dependency has multiple incompatible versions (properly declared with different semver major numbers, equivalent to libraries with different SONAMEs). I'm not suggesting there should be 50 versions of a given library in the archive, but allowing 2-4 versions would greatly simplify packaging, and would allow such unification efforts to take place incrementally, via transitions *in the archive* and *in collaboration with upstream*, rather than *all at once before a new package can be uploaded*.

(I also *completely* understand pushing back on having 2-4 versions of something like OpenSSL; that'd be a huge maintenance and security burden. That doesn't mean we couldn't have 2-4 semver-major versions of a library to emit ANSI color codes, and handle reducing that number via incremental porting in the archive rather than via prohibition in advance.)

I think much of our resistance to allowing 2-4 distinct semver-major versions of a given library comes down to ELF shared libraries making it painful to have two versions of a library with distinct SONAMEs loaded at once, and while that can be worked around with symbol versioning, we've collectively experienced enough pain in such cases that we're hesitant to encourage it. Our policies have done a fair bit to mitigate that pain.

But much of that pain is specific to ELF shared libraries and similar. And some of our packaging limitations are built around this (e.g. "one version of a given package at a time"), which in turn forces some of those same limitations onto ecosystems that don't share the problems that motivated those limitations in the first place. The dependency and library mechanisms of some other ecosystems are designed to support having multiple distinct versions of libraries in the same address space, with fully automatic equivalents of symbol versioning.

In Debian packaging, this issue typically results in one of three scenarios for every dependency (recursively):

- Trying to port the package to work with older versions of dependencies. This incurs all of the burden mentioned above for determining if the older dependency actually suffices. On top of that, this may involve actual porting of code to not rely on the functionality of newer versions, which is very much wasted effort (that functionality was added so that it could be used, and avoiding it often entails duplicating that functionality). In some cases, such porting may render the package incompatible with newer versions (especially if porting to an older semver-major version). In most cases, such changes are something upstream will generally not care about at all, for all of these reasons.

  Going backwards is not the ideal direction, but people sometimes do it anyway because the alternative can be even more painful:

- Trying to package a newer version of the dependency in Debian. This will often cascade recursively into multiples of the same set of problems over again, both downwards through the dependency tree for the dependencies of your dependencies, and upwards through other packages' dependency trees. Packaging distinct semver-major incompatible versions in separate packages would make this much easier and avoid recursively forward-porting all the packages depending on the same dependency, but as mentioned above, there's a noticeable resistance to packaging multiple incompatible versions of a library.

  And in addition, every round of such work often entails substantial archive delays, trips through NEW (which can be relatively fast with the impressive work that ftpmasters do to stay on top of it, but it still may mean repeatedly pausing your packaging work), and the risk of inconsistent pushback on incompatible requirements like "don't bundle things" versus "bundle these things together because they're tiny".

- Just bundle it, skip all that pain, cross your fingers, and upload. This *is* unfortunate, and I'm not arguing that bundling is the ideal solution. Bundling results in multiple semver-compatible versions of the same library in the archive, rather than a few semver-incompatible major versions of the library. But one major reason people bundle dependencies is to skip all of the above problems.

So, even assuming every package involved uses semantic versioning *perfectly*, there's a great deal of work to do. And 100% of that work has to happen *before* the first upload of the package.

Right now, Debian pushes back heavily on bundling, and *also* pushes back heavily on all of the things that would solve the problems with unbundled dependencies. That isn't sustainable. If we continue to push back on bundling, we need to improve our tools and processes and policies to make it feasible to maintain unbundled packages. Otherwise, we need to build tools and processes and policies around bundled dependencies. (Those processes could still include occasional requirements for unbundling, such as for security-sensitive libraries.)

I've never seen an ELF shared library package rejected on the basis of "this is a tiny library, you must bundle it together with other tiny libraries"; on the contrary, for reasons such as multiarch it's often *necessary* to split out such libraries into separate binary packages.

On top of that, it's possible to do shared library transitions in unstable in several different ways, aided by the testing migration process. You can upload libfoo5, port packages individually over from libfoo4 to libfoo5, and it's potentially *acceptable* for the intermediate state of libfoo4 and libfoo5 coexisting to persist for a while, as long as you take some care to avoid linking both into the same binary (or carefully use symbol versioning, which most libraries don't).

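As a rough illustration of the difference between many bundled semver-compatible copies and a few semver-major versions, the following Python sketch groups the versions that dependents were tested against by major number. The function name, the package names, and the version numbers are all made up for the example, and it assumes plain MAJOR.MINOR.PATCH strings with the 0.x special case ignored.

```python
def group_by_major(requirements):
    """Map each semver-major number to (newest required version, dependents).

    `requirements` maps a hypothetical dependent package name to the
    'MAJOR.MINOR.PATCH' version of one library it was tested against.
    """
    newest = {}
    users = {}
    for dependent, version in requirements.items():
        parsed = tuple(int(part) for part in version.split("."))
        major = parsed[0]
        users.setdefault(major, []).append(dependent)
        # Keep the highest version seen within each major series; under
        # semver it should satisfy every dependent in that series.
        if major not in newest or parsed > newest[major]:
            newest[major] = parsed
    return {
        major: (".".join(str(n) for n in newest[major]), sorted(users[major]))
        for major in sorted(newest)
    }


# Hypothetical dependents of one library:
requirements = {
    "app-a": "4.1.0",
    "app-b": "4.3.2",
    "app-c": "5.0.1",
    "app-d": "4.2.0",
}
print(group_by_major(requirements))
# {4: ('4.3.2', ['app-a', 'app-b', 'app-d']), 5: ('5.0.1', ['app-c'])}
```

Four bundled copies collapse into two coexisting versions here, and moving app-a, app-b and app-d to the 5.x series could then happen one package at a time.
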
Debian Policy provides a *huge* amount of value in some of the ways it constrains software builds: requiring that all dependencies (including build dependencies) be Free Software, restricting network access at build time, and other similar ways we maintain a self-contained archive of Free Software.

However, I think there are a few specific ways we could make it easier and more common for people to *not* bundle dependencies:

- End the practice of pushing back on small packages that package each dependency in one source package and one binary package. It's hard enough to solve all of these problems without also needing to throw a pile of upstream packages into one Debian package. If we have issues with the size of Packages files, let's introduce the idea of archive sections solely for self-contained build dependencies that most people don't need to have in their sources.list. But let's allow packages from *all* ecosystems, regardless of size, to be able to take advantage of at least the level of support we provide for ELF shared libraries.

- Allow packages to have multiple semver-major versions in the archive simultaneously (e.g. lang-modname-4, lang-modname-5, lang-modname-6), as long as the type of package supports such coexistence. This may also require some package tooling work to allow coexistence among binaries that are commonly used as build dependencies, but a prerequisite for such work is knowing that the resulting packages will not get rejected from the archive. We can have a "should"-level policy that suggests working with various upstreams across an ecosystem to reduce the number of versions needed simultaneously, and we could have tooling to help with such transitions, but those are things we can handle incrementally in the archive.

  We can also have a policy about pushing back on proliferating versions of security-sensitive packages, but that should be for crypto packages or packages with a history of regular security advisories, not for the majority of packages.

- Simplify the process of uploading new semver-major versions of packages, without having to wait for NEW. This is especially true if the package has already been through NEW at least once. But we need to solve the case of a new source package for the new semver-major version, as well. (For instance, perhaps if the same maintainer of the source package lang-modname-5 is uploading a source package for lang-modname-6, that package can skip NEW.) We can always file RC bugs on packages in the archive, and even remove packages later.

Given all of the above improvements, it'd be much more feasible for tooling to help systematically unbundle and package dependencies, and to help manage and transition those dependencies in the archive.

- Josh Triplett