Hi!

On Tue, 2025-05-13 at 12:58:30 +0000, Holger Levsen wrote:
> On Tue, May 13, 2025 at 02:24:38PM +0200, Guillem Jover wrote:
> > We have had reproducible source packages (barring OpenPGP signatures in
> > the .dsc files) since pretty much the same time dpkg-deb gained support
>
> have you actually tried that?
Sure, I'd like to assume at the time this got implemented :), and also
as part of every dpkg release:

  https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/build-aux/gen-release#n147

Also ISTM that reproducibility of source packages is easier to prove
(at least from the toolchain PoV) than for binary packages, because
most of the generation is driven by the toolchain itself (as seen from
the commits I referenced in dpkg). The only variable and/or potentially
problematic part is «debian/rules clean», and whether it has side
effects that could affect that generation.

A current test could be something like:

,---
$ apt source dpkg
$ sq verify --cleartext dpkg_1.22.18.dsc | head -n-1 > dpkg-orig.dsc
$ cd dpkg-1.22.18
$ dpkg-buildpackage -us -uc -S
$ cd ..
$ diff -u dpkg-orig.dsc dpkg_1.22.18.dsc && echo reproduced source
reproduced source
`---

> > > why do you think they are important?
> >
> > For QA alone this seems important (test suites for example), but in a
> > security context, to me this seems like a rather important part TBH,
> > the foundation on which binary package reproducibility is sitting. More
> > so in scenarios such as the xz attack for example. Reviewing diffoscope
> > differences is very helpful, but in the end we need to review and modify
> > the sources, from which the binaries get derived. :)
>
> obviously I agree that being able to reproduce the content would be nice,
> however in our tests years ago, not even that was possible, yet alone
> bit by bit (thus including timestamps).

If you recall the specifics, I'd be curious to hear them!

> I guess someone would need to actually investigate some hundred packages
> today, to see how things are really today.

Perhaps my statements were sloppy though. When I said reproducible, I
meant that the toolchain can produce them, assuming the source package
itself does not get in the way via «debian/rules clean».
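The manual check above (strip the signature from the archive .dsc, rebuild the source, then diff) could be scripted so the same comparison runs over many packages. What follows is a minimal sketch assuming a POSIX sh and awk; the function names `strip_clearsign`, `unsigned_dsc` and `same_source` are hypothetical and not part of dpkg, devscripts or sq. Unlike `sq verify --cleartext`, it does NOT verify the signature: it only removes the cleartext framing so the contents can be compared.

```shell
#!/bin/sh
# Hypothetical helpers for comparing .dsc files with or without an
# OpenPGP cleartext signature. Names are illustrative only.

# strip_clearsign FILE
# Print the signed text of a cleartext-signed FILE, dropping the armor
# headers and the signature block. The signature is NOT verified.
# Like the "head -n-1" in the manual test, it drops the last line before
# the signature block, assuming that line is the empty framing line
# usually present in signed .dsc files.
strip_clearsign() {
  awk '
    # Wait for the cleartext start marker.
    state == 0 && /^-----BEGIN PGP SIGNED MESSAGE-----$/ { state = 1; next }
    # Skip armor headers ("Hash: ...") up to the first empty line.
    state == 1 { if ($0 == "") state = 2; next }
    # Signature block reached: stop without printing the buffered line.
    state == 2 && /^-----BEGIN PGP SIGNATURE-----$/ { exit }
    state == 2 {
      if (have) print prev   # emit the previously buffered line
      line = $0
      sub(/^- /, "", line)   # undo OpenPGP dash-escaping
      prev = line; have = 1
    }
  ' "$1"
}

# unsigned_dsc FILE
# Print FILE as unsigned text, stripping the framing only if it is signed.
unsigned_dsc() {
  case "$(head -n 1 "$1")" in
    '-----BEGIN PGP SIGNED MESSAGE-----') strip_clearsign "$1" ;;
    *) cat "$1" ;;
  esac
}

# same_source A.dsc B.dsc
# Exit 0 when both files carry identical unsigned text. The trailing
# "echo x" keeps command substitution from eating trailing newlines.
same_source() {
  [ "$(unsigned_dsc "$1"; echo x)" = "$(unsigned_dsc "$2"; echo x)" ]
}
```

With these defined, the last step of the manual test would become `same_source dpkg-orig.dsc dpkg_1.22.18.dsc && echo reproduced source`, and the same call could be pointed at a pair of unsigned .dsc files from two independent builds (for example hypothetical `build-a/` and `build-b/` directories).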
I didn't mean we have 100% coverage on the Debian archive for example,
where, as you point out, we (well, someone :) would need to actually
check whether that's the case. My assumption is that most would, but I
think it's realistic to expect that we might find a number of packages
where «debian/rules clean» affects the source generation.

I think reproducing the same source after a full build (so the
equivalent of building twice in a row) might be more challenging (and
I'd expect less reproducibility there), but for a single download
source + full build, we are only concerned about the «clean» target,
as the source generation is performed as the first thing.

OTOH, I think the current reproducible infra probably has all the
data, and it might just be a matter of checking whether the unsigned
*.dsc files (from build-a and build-b) match? :)

Thanks,
Guillem