A while ago I noticed binutils had some embedded logs in one of its packages, which include timing information about the test suite runs. That timing information will almost certainly differ between builds, even on the exact same machine:
https://bugs.debian.org/950585

My proposed patch removed the timing information and various other things, but that was exactly the information wanted from these files, so it was not an appropriate patch.

It also came to light that other key toolchain packages (e.g. gcc) embed similar log files in their .deb packages... I have since found a few other packages that do similar things:

https://tests.reproducible-builds.org/debian/issues/unstable/test_suite_logs_issue.html

Obviously, this interferes with any meaningful reproducible builds testing for any package that does something like this. Ideally, metadata like this about a build should *not* be included in the .deb files themselves.

I'll try to summarize, with a bit more detail, some of the proposed strategies for resolving this issue:

* Output plaintext data to the build log

  Some of these log files are large (>13MB? per architecture, per package build) and would greatly benefit from compression... How large is too large for this approach to work?

  Relatively simple to implement (at least for plain text logs), but potentially stores a lot of data on the buildd infrastructure...

* Selectively filter out known unreproducible files

  This adds complexity to the process of verification; you can't beat the simplicity of comparing checksums on two .deb files. With increased complexity comes increased opportunity for errors, as well as maintenance overhead.

  RPM packages, for example, embed signatures in the packages, and these need to be excluded for comparison.

  I vaguely recall at least one past case where attempting something like this resulted in packages being incorrectly reported as reproducible because the filter was overly broad... Some nasty corner cases probably lurk down this path...

* Split build metadata into a separate .deb file

  This shares some of the problems of the previous approach, though it may be a little easier to get a reliable exclusion pattern? It wouldn't require huge toolchain changes.
  I would expect that such packages would not be depended on by any other packages, and would *only* contain build metadata. Maybe named SOURCEPACKAGE-buildmetadata-unreproducible.deb... or...?

  Not beautiful or elegant, but maybe actually achievable for the bookworm release cycle?

* Split build metadata into a separate file or archive

  Some of the debian-installer packages generate tarballs that are not .deb files and are included in the .changes files when uploading to the archive. Making a similar, generalized option for other packages to put build metadata into a separate artifact might be a workable approach, although this would presumably require toolchain changes in at least dpkg and dak, and might take a couple of release cycles, which is... well, Debian.

  Bundling the .buildinfo files into this metadata as well would take some changes in the relevant dpkg, dak, etc. tooling, but might be worth exploring in the long term. There is a relevant bug report in Launchpad:

  https://bugs.launchpad.net/launchpad/+bug/1845159

  This seems like the best long-term approach, but pretty much *only* a long-term approach...

I'd really like to remove this hurdle to reproducible builds from some key packages like binutils and gcc, but I'm also curious about a generalizable approach, so that each package needing something like this doesn't reinvent the wheel in incompatible ways...

Curious to hear your thoughts!

live well,
  vagrant

p.s. Please consider CCing me and/or reproducible-bui...@lists.alioth.debian.org, as I'm not subscribed to debian-devel.