On Tue, Dec 22, 2020 at 01:29:44PM +0000, Deri wrote:
> Please can someone explain why reproducible builds are important.

Hi,

It's probably best to simply refer you to
https://reproducible-builds.org/ for general motivation, and then you
can come back with any questions you have.

> What is the output of groff we should be testing.

Without loss of generality: the files that end up being distributed in
packaged form to users.

> Since these are essentially source files which are intended to be run
> at some point, diffing them just tells us that there has been a change
> in grops or gropdf, not whether that output, when run, has changed.

It's not a problem if a change in grops or gropdf (or whatever) induces
a change in the output: this is to be expected for almost any software.
The point is rather that you should be able to install the same versions
of the various bits of software involved in the build toolchain,
construct a suitably-documented build environment, and get bit-for-bit
identical output.

Whether the software involved is groff or TeX or gcc or an
artisanally-crafted pile of Python or whatever is immaterial: if you can
reproduce a build and produce bit-for-bit identical output, then that
helps to assure that the build infrastructure that produced the binary
packages you're using is sound.

For example, if somebody has replaced gropdf on some bit of build
infrastructure with gropdf-but-insert-evil-attack, then that can be
noticed quite easily if gropdf would ordinarily produce bit-for-bit
identical results across multiple runs. But if gropdf inserts extra
information from its environment into the output, then the problem
becomes more difficult: now you have to work out how to filter out the
"expected" differences, and that problem is compounded if what you're
looking at isn't a pair of PDF files but rather a pair of .debs or RPMs
or MSI files or whatever that contain some PDFs somewhere inside them.
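
To make "bit-for-bit identical" concrete, here is a minimal sketch in
Python of the kind of check involved. It is only an illustration, not
what any distribution's infrastructure actually runs, and the function
names digest_tree and differing_files are made up for this example. It
assumes you have two independently built output trees on disk and simply
hashes every file in each.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(build_a, build_b):
    """Return sorted paths that are missing from one tree or differ in content."""
    a = digest_tree(build_a)
    b = digest_tree(build_b)
    # A path counts as differing if its digest is absent or unequal on either side.
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))
```

An empty list means the two builds match byte for byte. Any entry needs
an explanation, and a timestamp embedded in a PDF would show up here just
as a malicious modification would. Note that this sketch does not unpack
archives, so a .deb containing a differing PDF would only be reported as
a differing .deb.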

Bear in mind that this is the sort of problem that people want to tackle
in bulk at the scale of a whole software distribution, not at the level
of comparing individual rendered PDF files by hand.

Now, there is absolutely room for debate and compromise on exactly what
sorts of environmental constraints one needs to apply when reproducing a
build, hence things like
https://reproducible-builds.org/docs/source-date-epoch/ and working out
to what extent timezones should be taken into consideration. As I
mentioned I'm certainly open to the possibility that when I patched
Debian's groff to in some sense force TZ=UTC I did so at the wrong
layer. But I hope this explains why at least the principle is important.

-- 
Colin Watson (he/him)                              [cjwat...@debian.org]