Russell Stuart <russell-deb...@stuart.id.au> writes:
> On Tue, 2019-10-22 at 20:21 -0700, Russ Allbery wrote:
>> This history has at least one commit per upload, although ideally has
>> the package maintainer's full revision history and upstream's full
>> revision history.

> I understand you like the history.  A lot of people do.  But not
> everyone values it, and I don't.

The nice thing about having the Git repository is that you can choose.
If you don't care about the history, make a shallow clone, and your
clone can still interact with the repository with all of the standard
tools.

There's a good point in there, though, that a shallow clone assumes that
you have a Git client interacting with a repository, which may be a good
argument for not trying to provide the features that I want directly in
the archive.

I agree that there's a size cost to having the source format be a
tarball of a Git repository with full history.  I'm not sure the size
cost is enough to matter, but I get why you view that with concern.

For the record, I use history extensively with packages where I'm
literally the only developer (Debian and upstream).  I'm not going to
try to convince you that you should do the same, but I will try to
convince you that Debian should not hamper my use cases (which I feel is
currently the case).

> That's a perfectly understandable perspective from a Debian Developer.
> But lets take a different perspective, or a Debian user installing
> audited-crypto-program-x.  What you are dismissing as "artefacts" is
> exactly the information the person installing this needs to assure
> themselves the Debian version of audited-crypto-program-x is a
> reasonably faithful reproduction of the original.  If the packaging is
> done well it will be broken down into small changes, each with a
> documented purpose.

I don't agree with this statement.  I think you're muddling auditing and
reproducibility.

The question reproducibility answers is "is this source package an
accurate and untampered copy of the combination of upstream source and
Debian packaging that the Debian package maintainer intended to put
together."  In other words, it's a question of supply-chain security.

Checking reproducibility only back to a set of patches does *not*
provide a real guarantee of reproducibility, since a supply-chain attack
could still have introduced malicious code in the patch generation
process.  You have to trace the provenance all the way to the
maintainer's working tree, which is what a signed Git tag will do.  Your
proposed transformation leaves a supply-chain security gap.

You're talking about an audit, which is when you open the hood and
determine whether you trust the Debian package maintainer or their work.
I agree that's *also* important.  I disagree with the assertion that a
set of patches is the best format for the information required to do an
audit; I'd much rather have a Git repository with full history, from
which those patches and much more can be easily derived.  But, either
way, this is a somewhat rarer use case than reproducibility (which is
ideally checked for every package continuously, since it's a security
control preventing a type of supply-chain attack).

> The point of defining the process of constructing the Debian source
> representation as a "pure function" is to guarantee it faithfully
> reflects the original source for and documented changes _only_ - not
> some random crap living in stale state carried across from years ago.

I understand why you want this specific artifact.  I'm objecting to what
feels, from my perspective, like an argument for dropping all of the
features that I want and retaining only the feature that you want, when
you can derive the feature that you want (at some additional complexity
cost, to be sure) from the format that I'm arguing for.

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>