On Fri Jul 25, 2025 at 12:00 AM CEST, Ian Jackson wrote:
Thanks. This seems to be going in the right direction. Very
exciting!
Nice to hear!
I have a bunch of comments about various details etc. Please don't be
discouraged by my pickiness. It's no reflection on the quality of
your work. Getting feedback like this from me is entirely normal and
expected :-).
Don't worry at all! Please be as picky as you can be. I'm not afraid of
negative feedbacks :)
1. Added a new pristine-tar=commit_id metadata field to the signed tag
generated by git-debpush
This seems right to me.
Should this be a "critical extension" in X.509 terminology?
This would be the first such, (--quilt ought to have been but is
grandfathered) but I think probably if someone hypothetically somehow
sends this option to a t2u service where this support has not yet been
deployed (or has been withdrawn!) what should happen? Probably it
should fail.
Yeah, I'd make sense. If I have git-debpush adding pristine-tar
metadata, I'm probably expecting the t2u service to use it. But maybe
it'd make sense to make all extensions critical by default? I can't
think of a scenario where git-debpush would add something which the t2u
service could ignore.
So how about
[dgit ... !pristine-tar=... ...]
or
[dgit ... +pristine-tar=... ...]
or some such?
The new metadata item will need to be documented in tag2upload(5).
I like "!" more, as "+" to me has more of an "cumulative" feel. And, in
case we go for critical by default, I'd use "?" for
a non-critical/optional field.
2. If a pristine-tar commitid is present, the git repo is hard-reset to
it, then the tarball name is obtained from the tree, and
`pristine-tar checkout` is invoked to generate the tarball.
By "the tarball name is obtained from the tree" you mean that the orig
tarball name is obtained from the pristine-tar commit's tree.
I'm not super familiar with Git terminology, but yes.
I believe this is safe security-wise because [...]. Do you see flaws
in this reasoning?
I think this is correct from my understanding of pristine-tar.
Subject to what I write below about pristine-tar vs HEAD.
Great!
--- a/git-debpush
+++ b/git-debpush
...
+pristine_tar_info=''
+if pristine_tar_commit=$(git rev-parse --verify --quiet
'refs/heads/pristine-tar'); then
+ pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+fi
Don't we want to do this only if the current upstream version actually
has any informaation in pristine-tar ? If not, then the user didn't
import the tarball, presumably ?
Yes, makes perfect sense. I have a couple of packages where
a pristine-tar branch is present but outdated, so this would break all
of them actually :)
What happens if another maintainer did this new upstream version, and
our user hasn't got that up-to-date pristine-tar branch yet? I think
ideally we'd use the orig from the archive, but there is a risk of
lossage if the uploads come too close together - see ##1109130.
Maybe in the t2u service we should check if pristine-tar data exists in
the repo and error out if the pushed tag didn't contain pristine-tar
metadata? Because in any case, user's local builds would still be using
a different orig than the intended one, so you're effectively testing
your package against the wrong upstream code.
#**** Useful sanity checks ****
I don't know what those might be. I guess we could leave them for now
and see what things go wrong in reality. Ie, I don't think this is a
blocker.
[dgit distro=$distro split$quilt_mode_text]
-[dgit please-upload source=$source version=$version$upstream_info]
+[dgit please-upload source=$source
version=$version$upstream_info$pristine_tar_info]
We might want to linewrap this a bit more. That line is getting quite
long. I guess we could put all the upstream and pristine-tar info in
a separate line maybe?
You mean like:
[dgit distro=$distro split$quilt_mode_text]
[dgit please-upload source=$source version=$version]
[dgit $upstream_info$pristine_tar_info]
If there's no functional difference, why not.
git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
index f6a3716c..e797123b 100755
--- a/infra/dgit-repos-server
+++ b/infra/dgit-repos-server
This part looks correct. Except:
+ "pristinetar=$t2u_pristinetar",
I think this option should be called "pristine_tar".
Doesn't the tag2upload-obtain-origs script check that the option name
only contains [0-9a-z-] characters? I could add an underscore there.
-x git deborig "$s_u"
+if [ -n "$s_pristinetar" ]; then
+ old_head=$(git rev-parse --verify HEAD)
+ git reset --hard -- "$s_pristinetar"
I don't think this is right, is it ? We want to rewind the local
pristine-tar branch to that commit, not the HEAD. I'm assuming
pristine-tar implicitly uses refs/heads/pristine-tar rather than HEAD.
Yes, that's what I wanted to do. Didn't know about the existence of `git
update-ref` (which is what I'm supposed to use here, right?). The tool
also has a `--no-deref` option which makes me kinda nervous (having
heard of all the security issues related to symlinks) - maybe it'd make
sense to use that too?
And yes, pristine-tar looks for that ref specifically.
We should definitely have a test case simulating an attack where the
salsa pristine-tar branch has got strange stuff in it.
Yes.
+ # Only get the first tarball file matching the current package version.
+ # pristine-tar does not disallow having two tarballs for the same package
+ # version using different compression algorithms, so this will just pick a
+ # random one. It might make sense to error out instead in such cases.
We definitely don't want it to be random! Let's call this an error
for now.
Got it. For the record, `gbp export-orig` generates the tarball
mentioned in the most recent pristine-tar branch commit message. If the
messages do not contain the expected tarball version, it proceeds
assuming gzip, and if that fails, generates a new tarball with git
archive and commits it into the pristine-tar branch. Which is probably
less sensible than using a random one :)
If this turns out to be a problem in practice, and users don't want to
just git rm the unwanted info from their pristine-tar branch, then
I guess we'll need to add the tarball name to the tag or something.
Yes, but how? If there are two tarballs of the same version in
pristine-tar, the user might not even realise that. Which of the two
should be added? Should the user be asked about that by git-debpush?
It's a case which should actually never happen in practise
(pristine-tar's objective is being able of recreating the upstream
tarball, and usually there's just one of them).
Maybe we can make git-deborig check for this too, and error out while
creating the tag?
+ tarball=$(git ls-tree --name-only | while IFS= read -r file; do
If you're using git-ls-tree then you need to use the nul-separated
output format I think. Urgh. Feel free to switch implementation
language.
If I can use grep and other essential: yes stuff, shell is fine. But,
just in case, which languages are acceptable?
This part of the code needs to be super-careful and cope with all
weird and unexpected git objects. For that matter, we should probably
try to defend pristine-tar from "weird stuff" so we should probably
check that the tarball metadata file is indeed a *file*.
Sure! I haven't been *too* careful while writing this initial patch, but
I'll do better as we finalize it.
Thanks for the review! Will update the code and send a v2. Bye :)