I'm looking into #901952 (pristine-tar failing to check out old files with non-printable unicode characters) and think it might be solved with the attached patches, by calling tar with --null and giving it a copy of the manifest file that is unescaped and NUL-terminated.
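To illustrate why the NUL termination matters, here is a minimal standalone sketch. It is not code from pristine-tar, and the filenames are made up for the example. Once a name is unescaped it may contain a real newline, so a newline-terminated list becomes ambiguous, while a NUL-terminated one does not:

```perl
use strict;
use warnings;

# Two already-unescaped filenames (hypothetical); the first one
# contains a real newline character.
my @names = ("dir/a\nb", "dir/c");

# Newline-terminated list: the embedded newline splits one name in two.
my $by_newline = join("", map { "$_\n" } @names);
my @read_nl = split(/\n/, $by_newline);

# NUL-terminated list: NUL cannot occur in a filename, so the
# record boundaries stay unambiguous.
my $by_nul = join("", map { "$_\0" } @names);
my @read_nul = split(/\0/, $by_nul);

printf "newline-terminated: %d entries\n", scalar @read_nl;   # 3
printf "NUL-terminated:     %d entries\n", scalar @read_nul;  # 2
```

The second form is what tar reads when it is given --null together with --files-from.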
What I haven't looked into yet is whether it will break files generated with 1.46, as that version changed the format incompatibly without changing the version of the delta (though perhaps that can be mitigated by an additional variant at checkout time).

What are your opinions on that? Is this worth trying to get into buster?

        Bernhard R. Link
-- 
F8AC 04D5 0B9B 064B 3383  C3DA AFFC 96D1 151D FFDC
From c26122199aca7c2e08a8597d700d66b8734a3ad6 Mon Sep 17 00:00:00 2001
From: "Bernhard R. Link" <brl...@debian.org>
Date: Fri, 29 Mar 2019 22:32:13 +0100
Subject: [PATCH 1/4] revert writing unquoted filenames to manifest

this caused the filename to be unquoted two times,
break with filenames containing newlines and changed
the semantics of manifest files without changing the format version
---
 pristine-tar | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pristine-tar b/pristine-tar
index 0fe132e..fe10340 100755
--- a/pristine-tar
+++ b/pristine-tar
@@ -664,7 +664,7 @@ sub genmanifest {
 		chomp;
 		# ./ or / in the manifest just confuses tar
 		s/^\.?\/+//;
-		print OUT unquote_filename($_) . "\n" if length $_;
+		print OUT $_ . "\n" if length $_;
 	}
 	close IN;
 	close OUT;
-- 
2.20.1

From 3caa71e47a23eab2b93291960c2e5d54471dd839 Mon Sep 17 00:00:00 2001
From: "Bernhard R. Link" <brl...@debian.org>
Date: Fri, 29 Mar 2019 22:44:25 +0100
Subject: [PATCH 2/4] also unquote octal escape sequences returned by tar

Tar also generates octal escape sequences (\123) for some
unicode characters. Those were not yet unquoted.

This also means when such files exist, there are no longer
empty files with the incorrectly unquoted names created.

This change has no effect compared to pristine-tar <= 1.45 as
those created files are then ignored as they do not appear in
the manifest file. Though there is a small chance it might
break deltas created with 1.46 with such files (assuming those
were even able to be checked in with 1.46).
---
 pristine-tar | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/pristine-tar b/pristine-tar
index fe10340..82256f5 100755
--- a/pristine-tar
+++ b/pristine-tar
@@ -362,15 +362,25 @@ Options:
 sub unquote_filename {
 	my $filename = shift;
 
-	$filename =~ s/\\a/\a/g;
-	$filename =~ s/\\b/\b/g;
-	$filename =~ s/\\f/\f/g;
-	$filename =~ s/\\n/\n/g;
-	$filename =~ s/\\r/\r/g;
-	$filename =~ s/\\t/\t/g;
-	$filename =~ s/\\v/\x11/g;
-	$filename =~ s/\\\\/\\/g;
+	my $unquote_character = sub {
+		die "filenames with NUL bytes are not supported" if $2 eq "000";
+		return pack("C", oct($2)) if defined $2;
+		my %map_named_escapes = (
+			a => "\a",
+			b => "\b",
+			f => "\f",
+			n => "\n",
+			r => "\r",
+			t => "\t",
+			v => "\x11",
+			"\\" => "\\",
+		);
+		return $map_named_escapes{$1};
+	};
 
+	# unquote by calling $unquote_character for each matched group:
+	# (do all in a single run, as the octal sequences might output anything)
+	$filename =~ s/\\([abfnrtv\\])|\\([0-7]{3})/$unquote_character->()/ge;
 	return $filename;
 }
 
-- 
2.20.1

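The remark about doing everything in a single run can be illustrated with a short standalone sketch. It is not pristine-tar code: both helper names are hypothetical, and only the \n, \\ and octal escapes are handled to keep it short. The input is the escaped form of a file whose name is a backslash followed by the letter n. Sequential substitutions let the second backslash pair up with the "n", whereas a single left-to-right pass consumes each escape exactly once:

```perl
use strict;
use warnings;

# Sequential substitutions, simplified from the style that
# patch 2/4 removes (hypothetical helper name).
sub unquote_sequential {
	my ($name) = @_;
	$name =~ s/\\n/\n/g;
	$name =~ s/\\\\/\\/g;
	return $name;
}

# Single pass: every escape is matched and replaced once,
# scanning from left to right (hypothetical helper name).
sub unquote_single_pass {
	my ($name) = @_;
	my %named = (n => "\n", "\\" => "\\");
	$name =~ s/\\(?:([n\\])|([0-7]{3}))/defined $1 ? $named{$1} : chr(oct($2))/ge;
	return $name;
}

# Show a string as space-separated hex bytes.
sub hexdump {
	my ($s) = @_;
	return join(" ", map { sprintf("%02x", ord($_)) } split(//, $s));
}

# Escaped form (backslash, backslash, n) of the name: backslash, n
my $escaped = '\\\\n';

my $seq    = unquote_sequential($escaped);
my $single = unquote_single_pass($escaped);

print "sequential:  ", hexdump($seq), "\n";     # 5c 0a (backslash, newline)
print "single pass: ", hexdump($single), "\n";  # 5c 6e (backslash, n)
```

Only the single-pass result matches the original filename.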
From ce7d2f64abf077c5f91119b53d34f0071e49c5e4 Mon Sep 17 00:00:00 2001
From: "Bernhard R. Link" <brl...@debian.org>
Date: Fri, 29 Mar 2019 22:56:52 +0100
Subject: [PATCH 3/4] send manifests file to tar NUL-terminated and unquoted
 (Closes: #901952)

tar changed behavior in no longer unescaping filenames with
--verbatim-files-from breaking reading all delta files and
handling files with such filenames.

(This also fixes #902115 properly this time).
---
 pristine-tar | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/pristine-tar b/pristine-tar
index 82256f5..f6cea05 100755
--- a/pristine-tar
+++ b/pristine-tar
@@ -394,12 +394,14 @@ sub recreatetarball {
 
 	my @manifest;
 	open(IN, "<", $manifestfile) || die "$manifestfile: $!";
+	open(OUT, ">", "$tempdir/manifest") || die "$tempdir/manifest: $!";
 	while (<IN>) {
 		chomp;
 		push @manifest, $_;
+		print OUT unquote_filename($_) . pack("C", 0);
 	}
 	close IN;
-	link($manifestfile, "$tempdir/manifest") || die "link $tempdir/manifest: $!";
+	close OUT;
 
 	# The manifest and source should have the same filenames,
 	# but the manifest probably has all the files under a common
@@ -526,7 +528,7 @@ sub recreatetarball_helper {
 		0, "--numeric-owner",
 		"-C", "$tempdir/workdir",
 		"--no-recursion", "--mode",
-		"0644", "--verbatim-files-from",
+		"0644", "--null",
 		"--files-from", "$tempdir/manifest"
 	);
 	if (exists $options{tar_format}) {
-- 
2.20.1