Hi Simon,

On Thu, Apr 17, 2025 at 08:23:18PM +0200, Simon Josefsson wrote:
> I noticed that Fedora 42 was released and their docker images lack a
> 'awk' tool.  Debian trixie images ship with 'mawk' pre-installed right
> now.  While I'm not convinced the removal game is necessarily a good
> one, I can see that it does have some advantages.  Is it possible to
> drop 'mawk' from the set of default tools in trixie?  If not, what are
> the blockers?  What is the method to find out what the blockers are?

Shrinking essential/minbase/container images generally is a worthwhile
goal, as you saw from existing replies. What is not as useful is asking
"can we drop XXX?" with little context, because (as others indicated)
this is a ton of work. The way to advance these matters is doing
research.

One of the first aspects is what "dropping" means. Typical answers:

 * Removing "Essential: yes"
   * e2fsprogs, mount and a few more used to be essential.
 * Removing dependencies
   * apt (not essential, but close) used to depend on adduser.
 * Reducing the Priority value
   * We've been debating this for ifupdown.
 * Removing dependencies within the build-essential set
   * I recently proposed removing libcrypt-dev from build-essential.

In this case, the immediate meaning must be getting it out of essential.
However, that does not move it out of container images, which incurs
further work and also raises the user impact (see Sean's mail).

Next, there is the question of what we gain. Essential weighs in at
roughly 100MB (depending on how you count it). So regarding awk, we're
talking about a size reduction of about 0.3%. For comparison, being able
to substitute toybox for coreutils has the potential to reduce size by
more than 10%. Removing bash (keeping dash) would be around 7%. Whilst
those other gains are significantly higher, so are their impact and
effort. Picking a sensible candidate is the difficult part here.

That leads us to analyzing the effort and impact. Being in the essential
set means that dependencies are not spelled out. So the first step is
locating those dependencies. As we will likely not be able to audit
Debian's source code for awk uses in a reasonable amount of time,
empirical methods are likely needed.

 * Rebuild the archive with awk dropped and see what fails
 * Consider using reproducible builds to additionally see what packages
   change as a result of dropping awk (for those that happen to be
   reproducible)
 * Search for awk usage in maintainer scripts
   https://binarycontrol.debian.net/?q=awk&path=unstable%2F.*%2Fp
   Note that postrm scripts cannot express dependencies and need to be
   rewritten without awk. It also means that, if you assume that people
   always purge their packages, we may remove awk in forky+1 at best,
   provided we manage to fix all postrm scripts in forky.
 * Download all Debian binary packages and search for awk uses in the
   installed files using regular expressions.
 * Run autopkgtests with awk removed

Doing this is a ton of work. Doing that work and presenting the results
is what makes "can we drop awk?" a useful question, as it answers the
cost part.

This is not meant to discourage you. Quite to the contrary. Reducing
implicit software dependencies has lots of other benefits, such as
easing architecture bootstrapping and a smaller trusted computing base.
It is not a topic you can handle in a spare evening though.

For instance, I'd like to propose making coreutils substitutable in
essential like awk is substitutable. However, that question is not
presently "useful" in the sense that it lacks a sound implementation.
I've been pondering this with Jochen and Johannes back in Würzburg and
now Julian has picked up the question and arrived at a promising
prototype based on feedback from Guillem. I hope that we are discussing
coreutils soon, but that discussion will be so much more useful when it
comes with a prototype and an impact analysis.

Helmut
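
As an illustration of the regular-expression search over maintainer
scripts and installed files listed above, here is a minimal Python
sketch. The pattern, the function name find_awk_uses and the sample
postrm are assumptions made up for this example, not an existing Debian
tool. A word match like this is only a first filter: it will miss
indirect invocations and the comment stripping is crude.

```python
import re

# Assumed pattern: awk or a common implementation name as a whole word.
# The look-arounds reject names such as "hawk" or "awk-script".
AWK_RE = re.compile(r"(?<![\w.-])(?:awk|mawk|gawk)(?![\w.-])")


def find_awk_uses(text):
    """Return (line_number, line) pairs for lines that appear to call awk."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Crude: drop everything after the first '#', treating it as a
        # shell comment even if it sits inside a quoted string.
        code = line.split("#", 1)[0]
        if AWK_RE.search(code):
            hits.append((number, line.strip()))
    return hits


# Hypothetical maintainer script used only to exercise the function.
SAMPLE_POSTRM = """#!/bin/sh
set -e
# awk is mentioned here only in a comment
if [ "$1" = purge ]; then
    user=$(awk -F: '$1 == "demo" { print $1 }' /etc/passwd)
    rm -rf /var/lib/hawk-cache
fi
"""

if __name__ == "__main__":
    for number, line in find_awk_uses(SAMPLE_POSTRM):
        print(f"postrm:{number}: {line}")
```

Running the same function over every extracted maintainer script and
every text file in the unpacked binary packages would give a candidate
list to review by hand; it does not replace the rebuild and autopkgtest
runs.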