On Fri Jul 25, 2025 at 12:00 AM CEST, Ian Jackson wrote:
Thanks.   This seems to be going in the right direction.  Very
exciting!

Nice to hear!

I have a bunch of comments about various details etc. Please don't be discouraged by my pickiness. It's no reflection on the quality of your work. Getting feedback like this from me is entirely normal and expected :-).

Don't worry at all! Please be as picky as you can be. I'm not afraid of negative feedbacks :)

1. Added a new pristine-tar=commit_id metadata field to the signed tag generated by git-debpush

This seems right to me.

Should this be a "critical extension" in X.509 terminology?

This would be the first such, (--quilt ought to have been but is grandfathered) but I think probably if someone hypothetically somehow sends this option to a t2u service where this support has not yet been deployed (or has been withdrawn!) what should happen? Probably it should fail.

Yeah, I'd make sense. If I have git-debpush adding pristine-tar metadata, I'm probably expecting the t2u service to use it. But maybe it'd make sense to make all extensions critical by default? I can't think of a scenario where git-debpush would add something which the t2u service could ignore.

So how about
  [dgit ... !pristine-tar=... ...]
or
  [dgit ... +pristine-tar=... ...]
or some such?

The new metadata item will need to be documented in tag2upload(5).

I like "!" more, as "+" to me has more of an "cumulative" feel. And, in case we go for critical by default, I'd use "?" for a non-critical/optional field.

2. If a pristine-tar commitid is present, the git repo is hard-reset to it, then the tarball name is obtained from the tree, and
   `pristine-tar checkout` is invoked to generate the tarball.

By "the tarball name is obtained from the tree" you mean that the orig
tarball name is obtained from the pristine-tar commit's tree.

I'm not super familiar with Git terminology, but yes.

I believe this is safe security-wise because [...]. Do you see flaws in this reasoning?

I think this is correct from my understanding of pristine-tar.
Subject to what I write below about pristine-tar vs HEAD.

Great!

--- a/git-debpush
+++ b/git-debpush
...
+pristine_tar_info=''
+if pristine_tar_commit=$(git rev-parse --verify --quiet 
'refs/heads/pristine-tar'); then
+    pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+fi

Don't we want to do this only if the current upstream version actually
has any informaation in pristine-tar ?  If not, then the user didn't
import the tarball, presumably ?

Yes, makes perfect sense. I have a couple of packages where a pristine-tar branch is present but outdated, so this would break all of them actually :)

What happens if another maintainer did this new upstream version, and our user hasn't got that up-to-date pristine-tar branch yet? I think ideally we'd use the orig from the archive, but there is a risk of lossage if the uploads come too close together - see ##1109130.

Maybe in the t2u service we should check if pristine-tar data exists in the repo and error out if the pushed tag didn't contain pristine-tar metadata? Because in any case, user's local builds would still be using a different orig than the intended one, so you're effectively testing your package against the wrong upstream code.

 #**** Useful sanity checks ****

I don't know what those might be.  I guess we could leave them for now
and see what things go wrong in reality.  Ie, I don't think this is a
blocker.

 [dgit distro=$distro split$quilt_mode_text]
-[dgit please-upload source=$source version=$version$upstream_info]
+[dgit please-upload source=$source 
version=$version$upstream_info$pristine_tar_info]

We might want to linewrap this a bit more.  That line is getting quite
long.  I guess we could put all the upstream and pristine-tar info in
a separate line maybe?

You mean like:

   [dgit distro=$distro split$quilt_mode_text]
   [dgit please-upload source=$source version=$version]
   [dgit $upstream_info$pristine_tar_info]

If there's no functional difference, why not.

 git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
index f6a3716c..e797123b 100755
--- a/infra/dgit-repos-server
+++ b/infra/dgit-repos-server

This part looks correct.  Except:

+       "pristinetar=$t2u_pristinetar",

I think this option should be called "pristine_tar".

Doesn't the tag2upload-obtain-origs script check that the option name only contains [0-9a-z-] characters? I could add an underscore there.

-x git deborig "$s_u"
+if [ -n "$s_pristinetar" ]; then
+    old_head=$(git rev-parse --verify HEAD)
+    git reset --hard -- "$s_pristinetar"

I don't think this is right, is it ?  We want to rewind the local
pristine-tar branch to that commit, not the HEAD.  I'm assuming
pristine-tar implicitly uses refs/heads/pristine-tar rather than HEAD.

Yes, that's what I wanted to do. Didn't know about the existence of `git update-ref` (which is what I'm supposed to use here, right?). The tool also has a `--no-deref` option which makes me kinda nervous (having heard of all the security issues related to symlinks) - maybe it'd make sense to use that too?

And yes, pristine-tar looks for that ref specifically.

We should definitely have a test case simulating an attack where the salsa pristine-tar branch has got strange stuff in it.

Yes.

+    # Only get the first tarball file matching the current package version.
+    # pristine-tar does not disallow having two tarballs for the same package
+    # version using different compression algorithms, so this will just pick a
+    # random one. It might make sense to error out instead in such cases.

We definitely don't want it to be random! Let's call this an error for now.

Got it. For the record, `gbp export-orig` generates the tarball mentioned in the most recent pristine-tar branch commit message. If the messages do not contain the expected tarball version, it proceeds assuming gzip, and if that fails, generates a new tarball with git archive and commits it into the pristine-tar branch. Which is probably less sensible than using a random one :)

If this turns out to be a problem in practice, and users don't want to just git rm the unwanted info from their pristine-tar branch, then I guess we'll need to add the tarball name to the tag or something.

Yes, but how? If there are two tarballs of the same version in pristine-tar, the user might not even realise that. Which of the two should be added? Should the user be asked about that by git-debpush? It's a case which should actually never happen in practise (pristine-tar's objective is being able of recreating the upstream tarball, and usually there's just one of them).

Maybe we can make git-deborig check for this too, and error out while creating the tag?

+    tarball=$(git ls-tree --name-only | while IFS= read -r file; do

If you're using git-ls-tree then you need to use the nul-separated output format I think. Urgh. Feel free to switch implementation language.

If I can use grep and other essential: yes stuff, shell is fine. But, just in case, which languages are acceptable?

This part of the code needs to be super-careful and cope with all weird and unexpected git objects. For that matter, we should probably try to defend pristine-tar from "weird stuff" so we should probably check that the tarball metadata file is indeed a *file*.

Sure! I haven't been *too* careful while writing this initial patch, but I'll do better as we finalize it.

Thanks for the review! Will update the code and send a v2. Bye :)

Reply via email to