Hi all,

On Mon, Dec 15, 2025 at 08:26:51PM -0800, Otto Kekäläinen wrote:
> To be better able to audit the software supply-chain I have been
> thinking that we should have more git info in the changes file, namely
> the git commit id it was generated from, and just in case also the git
> tree id as well.
I love the idea, Otto! Finally an incrementally implementable way to get
more insight into where we're at with divergence between git and the
archive. Awesome!

Santiago,

On Tue, Dec 16, 2025 at 09:39:31AM +0100, Santiago Vila wrote:
> > [...] have more git info in the changes file [...]
>
> Your proposal would prevent lost git histories to be reconstructed
> and wrong git histories to be fixed.

I don't quite see how this would prevent anything. Perhaps we should
think of this proposal more as a divergence monitoring system? I don't
fully understand your use case yet; can you maybe show more concretely
what you mean? The linked repo doesn't have any commits by you that I
saw.

Keep in mind that, with the git tree-id included, pushing a 1:1 copy of
the uploaded dsc back into git should (hopefully) result in the same
tree-id, since tree-ids are just a deterministic hash over all the
files+directories. The commit-id, by contrast, will obviously change
due to the included time+date. This should IMO cover most of what we
need when it comes to git history fixups, no?

> So, before implementing your idea or even thinking about it, I'd like
> to see a greater effort in keeping the archive and the git histories
> in sync project-wide.

Right now we have no idea what the Debian-wide git<>archive diffs even
look like in detail or how big they are. Otto's proposal would allow us
to measure this, build an automated monitoring system around it and
then work on driving the size of the diffs down over time, e.g. by
introducing a new "smell" at https://trends.debian.net/#smells. This is
how we can collectively make the effort you want to see actually happen!

Adrian,

On Thu, Dec 18, 2025 at 10:56:42AM +0200, Adrian Bunk wrote:
> If you want to actually be able to use that for audit purposes, you
> might not want to work with the maintainer-specific mess that Salsa is.
>
> Only debian/ or complete sources?
> debian/patches/ or patches applied?
> One git repository per package, or 1k packages in one git repository?
> The contents of a git tag/commit does sometimes not match the
> contents of the package in the archive with the matching version.

While that does sound dire at first glance, there are only so many
workflows. We might just be able to work through a large portion of
them and teach the monitoring system to recognize such divergences and
(if they are innocent) reflect this in smaller and smaller diffs as
support becomes better.

> And a git repository might disappear, or the commit might disappear,
> or the commit was never pushed anywhere.

We should be able to mitigate this by having infra that pulls and
archives the git commit right after the upload shows up in the archive
(à la snapshot.debian.org). I could also imagine making the repo/commit
being accessible in this way (when Otto's metadata fields are present)
an acceptance criterion for FTP uploads in the future, to keep things
consistent.

> The proper solution would be if we had the git trees in the archive,
> in a modern setup where the buildds are integrated in the git hosting
> runner infrastructure so that the git CI tests the actual packages.

Agreed. That's the best long-term solution. Personally I'm hoping the
recent changes in FTP team structure will actually allow this to happen
now. Working on Source format 3.0 (git) has been on my TODO list for
quite a while now, but was always blocked by perceived FTP team
disinterest.

On Thu, Dec 18, 2025 at 09:05:43PM +0200, Adrian Bunk wrote:
> the "To be better able to audit the software supply-chain" is the part
> I disagreed with [...]

Given what I've written above, do you still think so?

Gunnar,

On Thu, Dec 18, 2025 at 10:26:48AM -0600, Gunnar Wolf wrote:
> The points you mention are all valid. However, I support Otto's idea here —
> Git repositories might disappear, or their history might be rewritten.
> It _most often_, however, does not happen — sharing the specific commit
> from which a given tree was built costs us _very_ little, and can
> provide important information for many use cases.

Exactly! Except we can even mitigate disappearing repos, see above.

> Right, and I completely also support tag2upload as one of the most
> important steps forward in Debian usability and modernization!

Absolutely! Thanks again to Sean, Ian and all who contributed to
implementing it. Best thing to happen for Debian's developer
approachability in a good while <3.

Guillem,

On Fri, Dec 19, 2025 at 12:30:21PM +0100, Guillem Jover wrote:
> On Mon, 2025-12-15 at 20:26:51 -0800, Otto Kekäläinen wrote:
> > Has somebody else already been thinking about the same? Do others see
> > value in this?
>
> [...] let me try to do a shallow pass over it (which means I might miss
> stuff!), to see how this could look like.

Thanks for having a look at this.

> If this was to be added, I think .dsc would be the more appropriate
> file, because .changes is a file that gets processed during uploads
> (including binary-only ones) and its information then gets set aside.
> Also the file that contains the Vcs-* fields is .dsc not .changes.
>
> If dpkg-source were to add that kind of information, it should be
> reliable and usable. But my hunch is that this tool cannot easily
> guarantee that. Things that come to mind (some of which have already
> been mentioned in the thread):
>
>  - If you keep your home under git, doing a «dpkg-source -x» under it
>    and then a «git rev-parse» will print an ID for a repo that has
>    nothing to do with the source package. I think this also means
>    that monorepos cannot be supported, because trying to find their
>    root, and not confuse it with something else it is going to be
>    tricky. And anything that is not going to end up as part of .dsc
>    (or its referenced files), cannot be validated.

Agreed.
Do you think a new dpkg-source option to pass the monorepo root would
be a viable solution here? We don't have *that* many monorepos in
Debian, so I expect it would be reasonably easy to ask all of them to
plumb this down to dpkg-source.

>    (I guess the equivalent of --git-dir=srcpkg-root/ and/or
>    --git-dir=srcpkg-root/debian/ should be used.)

Right, that should do the right thing, and it also turns off git's
up-traversal repo discovery logic. Except you probably need /.git at
the end:

    --git-dir=<path>
        [...] Specifying the location of the ".git" directory using
        this option (or GIT_DIR environment variable) turns off the
        repository discovery

Possibly needs to be combined with --work-tree? Maybe not, since we're
not committing anything? Either way, something like

    --git-dir=srcpkg-root/.git --work-tree=srcpkg-root/

should be fully specified. Quick test:

    $ mkdir -p /tmp/top/bot
    $ git -C /tmp/top/bot init
    $ git -C /tmp/top/ init
    $ cd /tmp/top/; touch top; git add top; git commit -m top
    $ cd bot/; touch bot; git add bot; git commit -m bot
    $ git --git-dir=/tmp/top/ log --oneline
    fatal: not a git repository: '/tmp/top/'
    $ git --git-dir=/tmp/top/.git log --oneline
    5d36f8b (HEAD -> master) top

>  - If you do variants/equivalent of «apt source --download-only»,
>    «dpkg-source --skip-patches -x», «git init», «git add -A»,
>    «git commit -m Import», to avoid the mess that is dealing with
>    random git workflows. Then you'd get information for a local
>    throwaway repo.

Why would anyone even be doing this in the first place? I don't quite
understand the motivation.

>    (I guess the code should check whether there's a remote that
>    matches the Vcs-Git field, and whether the upstream branches
>    match the local one.)

Right. Should be easy enough to detect by such repos not having any
remotes. Alternatively, infra can flag/reject such uploads later.

What do you think of using the Vcs-Git<>git-remote cross-check as a
policy tool in the future?
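To make the cross-check a bit more concrete, here is a rough sketch of
how it could look. This is only an illustration, not what dpkg-source
does or would do: the function names are made up, only the URL part of
Vcs-Git is compared (a trailing "-b <branch>" is ignored), and treating
"foo" and "foo.git" as the same repo is my assumption.

```shell
# Hypothetical helper: strip trailing slashes and a ".git" suffix so
# that ".../hello" and ".../hello.git" compare equal (an assumption).
normalize_url() {
    printf '%s\n' "$1" | sed -e 's|/*$||' -e 's|\.git$||'
}

# Hypothetical helper: print the URL (first word) of the Vcs-Git field
# of the control file given as $1. Anything after the URL is ignored.
vcs_git_url() {
    sed -n 's/^Vcs-Git:[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Return 0 if any remote of the repo in source dir $1 matches Vcs-Git,
# 1 if none does, 2 if there is no Vcs-Git field at all.
# --git-dir is passed explicitly so git does not discover some
# unrelated parent repository.
remote_matches_vcs_git() {
    srcdir=$1
    want=$(normalize_url "$(vcs_git_url "$srcdir/debian/control")")
    [ -n "$want" ] || return 2
    for name in $(git --git-dir="$srcdir/.git" remote); do
        have=$(git --git-dir="$srcdir/.git" remote get-url "$name")
        [ "$(normalize_url "$have")" = "$want" ] && return 0
    done
    return 1
}
```

A throwaway repo created with a bare «git init» has no remotes, so the
loop body never runs and the function returns 1.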
Start with a warning and tighten the screws once we see the project is
moving in this direction.

>  - The code would need to check that the repo is clean, and that's
>    going to be annoying to do with a mix of patches applied/unapplied
>    git workflows, and dpkg-source only being called to build the
>    source (but obviously not to extract it).

I don't think this is necessary. We can monitor and enforce this at the
project level; it doesn't (necessarily) have to be enforced by
dpkg-source locally. Do you see a problem with that?

>    (I guess repos with patches applied could be declared
>    unsupported, and then dpkg-source could check for cleanliness
>    before preparing the source tree and record that somewhere.)

We should be able to reproduce the patches-applied repo (at least the
tree-id) on the monitoring infra side, no? Even if not exactly, we can
measure and look at the diffs we end up with and get to work from
there.

Since all of this is off the security-critical path to the archive, the
monitoring system could do all sorts of shenanigans to arrive at the
right sequence of workflow steps to get the hashes to reproduce. Hell,
we could even randomly try different things and just record what was
needed if it comes to that.

On Fri, Dec 19, 2025 at 01:01:35PM +0100, Guillem Jover wrote:
> Hmm, I think the code would also need to track the status/hashes of
> debian/patches/ and then check that these have not been modified
> between --before-build and --build, which might be a bit annoying.
>
> As an interface I think this also has the potential for being an
> unreliable generator, because I don't think it should ever fail if it
> finds any unsupported state where it could not add the data. And that
> would mean that starting from, say, an unclean state (patches applied)
> then no git commit data gets recorded.

Right, good catch. I disagree that this should never fail. I see this
as another useful policy lever.
We can start with a warning and then enforce it if we find no
tools/workflows that need this - or none remain in use after we start
this work ;-). Even if it's unreliable until this is enforced, that's
still useful to get us to a point where we *can* start enforcing.

On Fri, Dec 19, 2025 at 12:30:21PM +0100, Guillem Jover wrote:
> So, barring other problems I might have missed (and happy to hear them
> if someone can come up with new ones), I guess it might not be too
> onerous after all to add this kind of information for a specific set of
> git workflows, but certainly not in a universal way. I think it would
> also need to be added in a new field, because the way the tag2upload
> ones are specified they do not allow other such generators.

ACK. Happy to hear you're on board with the general idea. Thanks again,
Guillem, for diving into the dpkg side of this :-).

--Daniel

