I'm looking into #901952 (pristine-tar failing to check out old files
with non-printable Unicode characters) and think it might be solved
with the attached patches, by calling tar with --null and giving it
a copy of the manifest file that is unescaped and NUL-terminated.
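
To illustrate the idea outside of pristine-tar (a minimal sketch, not
taken from the patches: GNU tar is assumed, and the temporary directory
and the file names are made up for the example):

```sh
#!/bin/sh
# Sketch only: assumes GNU tar; all paths and file names are hypothetical.
set -eu

tmp=$(mktemp -d)
mkdir "$tmp/workdir"

# A name containing a literal newline, which a line-based manifest
# cannot represent without escaping.
odd=$(printf 'odd\nname.txt')

: > "$tmp/workdir/plain.txt"
: > "$tmp/workdir/$odd"

# Write the unescaped names, each terminated by a NUL byte.
printf '%s\0' "plain.txt" "$odd" > "$tmp/manifest"

# --null makes tar split the --files-from list on NUL instead of newline.
tar -c -f "$tmp/out.tar" -C "$tmp/workdir" --no-recursion \
    --null --files-from "$tmp/manifest"

# List the archive; tar escapes the newline again for display only.
LC_ALL=C tar -t -f "$tmp/out.tar" --quoting-style=escape
```

The option order (-C and --no-recursion before --null and --files-from)
follows the order used in recreatetarball_helper.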

What I haven't looked into yet is whether it will break files
generated with 1.46, as that version changed the format incompatibly
without changing the version of the delta (though perhaps that can be
mitigated by an additional variant at checkout time).

What are your opinions on that? Is this worth trying to get into buster?

        Bernhard R. Link
-- 
F8AC 04D5 0B9B 064B 3383  C3DA AFFC 96D1 151D FFDC
From c26122199aca7c2e08a8597d700d66b8734a3ad6 Mon Sep 17 00:00:00 2001
From: "Bernhard R. Link" <brl...@debian.org>
Date: Fri, 29 Mar 2019 22:32:13 +0100
Subject: [PATCH 1/4] revert writing unquoted filenames to manifest

This caused filenames to be unquoted twice,
broke with filenames containing newlines,
and changed the semantics of manifest files without changing the format version.
---
 pristine-tar | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pristine-tar b/pristine-tar
index 0fe132e..fe10340 100755
--- a/pristine-tar
+++ b/pristine-tar
@@ -664,7 +664,7 @@ sub genmanifest {
     chomp;
     # ./ or / in the manifest just confuses tar
     s/^\.?\/+//;
-    print OUT unquote_filename($_) . "\n" if length $_;
+    print OUT $_ . "\n" if length $_;
   }
   close IN;
   close OUT;
-- 
2.20.1

From 3caa71e47a23eab2b93291960c2e5d54471dd839 Mon Sep 17 00:00:00 2001
From: "Bernhard R. Link" <brl...@debian.org>
Date: Fri, 29 Mar 2019 22:44:25 +0100
Subject: [PATCH 2/4] also unquote octal escape sequences returned by tar

Tar also generates octal escape sequences (\123) for some unicode
characters. Those were not yet unquoted.

This also means that when such files exist, empty files with the
incorrectly unquoted names are no longer created.
This change has no effect compared to pristine-tar <= 1.45,
as those created files were ignored anyway because they do not appear
in the manifest file. There is, however, a small chance it might
break deltas created with 1.46 that contain such files (assuming those
could even be checked in with 1.46).
---
 pristine-tar | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/pristine-tar b/pristine-tar
index fe10340..82256f5 100755
--- a/pristine-tar
+++ b/pristine-tar
@@ -362,15 +362,25 @@ Options:
 sub unquote_filename {
   my $filename = shift;
 
-  $filename =~ s/\\a/\a/g;
-  $filename =~ s/\\b/\b/g;
-  $filename =~ s/\\f/\f/g;
-  $filename =~ s/\\n/\n/g;
-  $filename =~ s/\\r/\r/g;
-  $filename =~ s/\\t/\t/g;
-  $filename =~ s/\\v/\x11/g;
-  $filename =~ s/\\\\/\\/g;
+  my $unquote_character = sub {
+    die "filenames with NUL bytes are not supported" if defined $2 && $2 eq "000";
+    return pack("C", oct($2)) if defined $2;
+    my %map_named_escapes = (
+        a => "\a",
+        b => "\b",
+        f => "\f",
+        n => "\n",
+        r => "\r",
+        t => "\t",
+        v => "\x0b",
+        "\\" => "\\",
+    );
+    return $map_named_escapes{$1};
+  };
 
+  # unquote by calling $unquote_character for each matched group:
+  # (do all in a single run, as the octal sequences might output anything)
+  $filename =~ s/\\([abfnrtv\\])|\\([0-7]{3})/$unquote_character->()/ge;
   return $filename;
 }
 
-- 
2.20.1

From ce7d2f64abf077c5f91119b53d34f0071e49c5e4 Mon Sep 17 00:00:00 2001
From: "Bernhard R. Link" <brl...@debian.org>
Date: Fri, 29 Mar 2019 22:56:52 +0100
Subject: [PATCH 3/4] send manifests file to tar NUL-terminated and unquoted
 (Closes: #901952)

tar changed behavior: it no longer unescapes filenames when
--verbatim-files-from is given, which broke reading all delta files
and handling files with such filenames.

(This also fixes #902115 properly this time).
---
 pristine-tar | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/pristine-tar b/pristine-tar
index 82256f5..f6cea05 100755
--- a/pristine-tar
+++ b/pristine-tar
@@ -394,12 +394,14 @@ sub recreatetarball {
 
   my @manifest;
   open(IN, "<", $manifestfile) || die "$manifestfile: $!";
+  open(OUT, ">", "$tempdir/manifest") || die "$tempdir/manifest: $!";
   while (<IN>) {
     chomp;
     push @manifest, $_;
+    print OUT unquote_filename($_) . pack("C", 0);
   }
   close IN;
-  link($manifestfile, "$tempdir/manifest") || die "link $tempdir/manifest: $!";
+  close OUT;
 
   # The manifest and source should have the same filenames,
   # but the manifest probably has all the files under a common
@@ -526,7 +528,7 @@ sub recreatetarball_helper {
     0,                "--numeric-owner",
     "-C",             "$tempdir/workdir",
     "--no-recursion", "--mode",
-    "0644",           "--verbatim-files-from",
+    "0644",           "--null",
     "--files-from",   "$tempdir/manifest"
   );
   if (exists $options{tar_format}) {
-- 
2.20.1
