Hi!

On Tue, 2025-05-13 at 12:58:30 +0000, Holger Levsen wrote:
> On Tue, May 13, 2025 at 02:24:38PM +0200, Guillem Jover wrote:
> > We have had reproducible source packages (barring OpenPGP signatures in
> > the .dsc files) since pretty much the same time dpkg-deb gained support
> 
> have you actually tried that?

Sure, I'd like to assume I did at the time this got implemented :), and
it also gets tried as part of every dpkg release:

  https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/build-aux/gen-release#n147

Also ISTM that reproducibility of source packages is easier to prove
(at least from the toolchain PoV) than for binary packages, because
most of the generation is driven by the toolchain itself (as seen from
the commits I referenced in dpkg). The only variable and/or potentially
problematic part is the «debian/rules clean» and whether it has side
effects that could affect that generation.

A current test could be something like:

  ,---
  $ apt source dpkg
  $ sq verify --cleartext dpkg_1.22.18.dsc | head -n-1 > dpkg-orig.dsc
  $ cd dpkg-1.22.18
  $ dpkg-buildpackage -us -uc -S
  $ cd ..
  $ diff -u dpkg-orig.dsc dpkg_1.22.18.dsc && echo reproduced source
  reproduced source
  `---

> > > why do you think they are important?

> > For QA alone this seems important (test suites for example), but in a
> > security context, to me this seems like a rather important part TBH,
> > the foundation on which binary package reproducibility is sitting. More
> > so in scenarios such as the xz attack for example. Reviewing diffoscope
> > differences is very helpful, but in the end we need to review and modify
> > the sources, from which the binaries get derived. :)
> 
> obviously I agree that being able to reproduce the content would be nice,
> however in our tests years ago, not even that was possible, yet alone
> bit by bit (thus including timestamps).

If you recall the specifics, I'd be curious to hear them!

> I guess someone would need to actually investigate some hundred packages
> today, to see how things are really today.

Perhaps my statements were sloppy though. When I said reproducible, I
meant that the toolchain can produce them, assuming the source package
itself does not get in the way via «debian/rules clean». I didn't mean
we have 100% coverage on the Debian archive for example, where as you
point out we (well someone :) would need to practically check whether
that's the case. My assumption is that most would, but I think it's
realistic to expect that we might find a number of packages where
«debian/rules clean» affects the source generation.

I think whether we can reproduce the same source after a full build
(so the equivalent of a twice in a row build) might perhaps be more
challenging (and I'd expect less reproducibility there), but for a
single download source + full build, we are only concerned about the
«clean» target, as the source generation is performed as the first
thing.
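
To make that distinction concrete, a rough sketch of the "twice in a
row" check could look like the following. The rebuild_check name and its
two arguments are made up for illustration, and it assumes the unsigned
.dsc gets written to a known path next to the source tree:

```shell
# Hypothetical helper: rebuild_check SRCDIR DSC
# Returns 0 if the source package generated after a full build matches the
# one generated right after unpacking, 1 if it differs, 2 on build failure.
rebuild_check() {
    srcdir=$1
    dsc=$2

    # First source build, on the pristine tree (runs «debian/rules clean»).
    (cd "$srcdir" && dpkg-buildpackage -us -uc -S) || return 2
    cp "$dsc" "$dsc.first" || return 2

    # Full binary build, then generate the source package again.
    (cd "$srcdir" && dpkg-buildpackage -us -uc -b &&
        dpkg-buildpackage -us -uc -S) || return 2

    # Byte-for-byte comparison of the two unsigned .dsc files.
    cmp -s "$dsc.first" "$dsc"
}
```

With the dpkg example from earlier this would be something like
«rebuild_check dpkg-1.22.18 dpkg_1.22.18.dsc && echo reproduced source».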

OTOH, I think the current reproducible infra has probably all the
data, and it might just be a matter of checking whether the unsigned
*.dsc (from build-a and build-b) match? :)
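
As a minimal sketch of that comparison, assuming both .dsc files are
available locally (I have not checked how the infra actually lays out
its artifacts, and the dsc_body and same_dsc names are invented here):

```shell
# Print the body of a .dsc, dropping any OpenPGP cleartext-signature armor
# and trailing blank lines. This only strips the armor, it does NOT verify
# the signature.
dsc_body() {
    awk '
        /^-----BEGIN PGP SIGNED MESSAGE-----$/ { signed = 1; in_hdr = 1; next }
        # Skip the armor headers (e.g. "Hash: ...") up to the first blank line.
        in_hdr { if ($0 == "") in_hdr = 0; next }
        /^-----BEGIN PGP SIGNATURE-----$/ { exit }
        # Undo dash-escaping, which only applies to signed input.
        signed { sub(/^- /, "") }
        # Hold back blank lines so that trailing ones are never printed.
        $0 == "" { blanks++; next }
        { for (; blanks > 0; blanks--) print ""; print }
    ' "$1"
}

# Succeed if two .dsc files have the same content once signatures are ignored.
same_dsc() {
    [ "$(dsc_body "$1")" = "$(dsc_body "$2")" ]
}
```

Usage would be along the lines of «same_dsc build-a/foo.dsc
build-b/foo.dsc && echo reproduced source», with hypothetical paths.
As the .dsc lists the checksums of the tarballs, matching control files
also imply matching source tarballs.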

Thanks,
Guillem
