A Debian user noticed (Debian bug #188663) that tar (1.22) does not
always preserve hard links when creating an archive with the
--remove-files option. Ted Ts'o provided the following analysis:

On Sun, 13 Apr 2003 15:45:27 -0400, Theodore Ts'o <ty...@mit.edu> wrote:
> I'm pretty sure, by the way, that the problem is that tar is keying
> off of the st_nlink to decide whether or not to do hard link
> processing as an optimization.  When --remove-files is present, then
> st_nlink of the hard-linked inode is dropping, and when st_nlink is
> one, tar can't tell that it was previously a hard-linked file.  The
> fix would require that tar check every single file's inode number
> against previously written files to see if it was a hard linked file
> (instead of just checking files where st_nlink > 1), in the case when
> the --remove-files option is in use.
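
To illustrate the effect Ted describes, here is a minimal C sketch. It
is not part of the patches and not tar code: the helper names
link_count and demo_nlink_drop are hypothetical, and it assumes a POSIX
system whose filesystem supports hard links. It creates a file, adds a
second link, removes the first name, and reports st_nlink before and
after the removal.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the current link count of PATH, or -1 if it cannot be stat'ed.  */
static long
link_count (const char *path)
{
  struct stat st;
  if (lstat (path, &st) != 0)
    return -1;
  return (long) st.st_nlink;
}

/* Hypothetical helper: create NAME_A, hard-link it to NAME_B, then
   unlink NAME_A (which is what --remove-files does after archiving a
   member).  Stores the link count of NAME_B before and after the
   unlink.  Both names are removed again before returning.
   Returns 0 on success, -1 on failure.  */
static int
demo_nlink_drop (const char *name_a, const char *name_b,
                 long *before, long *after)
{
  int fd = open (name_a, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd < 0)
    return -1;
  close (fd);

  if (link (name_a, name_b) != 0)
    {
      unlink (name_a);
      return -1;
    }

  *before = link_count (name_b);    /* two names refer to the inode */

  if (unlink (name_a) != 0)
    {
      unlink (name_b);
      return -1;
    }

  *after = link_count (name_b);     /* only one name is left */

  unlink (name_b);
  return 0;
}
```

Called with two unused file names in a writable directory, the helper
should report a count of 2 before the unlink and 1 afterwards. At that
point a check of the form st_nlink > 1 can no longer tell that the
remaining name was ever part of a hard-link group.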

I've attached two patches to fix this bug. The first implements Ted's
suggestion: when the --remove-files option is in effect, tar consults
the hard-link hash table for every file, regardless of the value of
st_nlink. The second patch adds a test case for the bug, which fails
before the first patch is applied and passes afterwards.
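
The idea behind the first patch can be sketched in isolation as
follows. This is a simplified illustration, not the code in
src/create.c: the names seen_table, needs_link_check and seen_before
are hypothetical, a fixed-size array with linear search stands in for
tar's hash table, and lookup and insertion are merged into one function
for brevity. Only the condition in needs_link_check mirrors the patched
test (st_nlink > 1 || remove_files_option).

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_SEEN 1024           /* arbitrary bound for this sketch */

/* Identity of a file already written to the archive.  */
struct seen_file
{
  dev_t dev;
  ino_t ino;
};

struct seen_table
{
  struct seen_file entries[MAX_SEEN];
  size_t count;
};

/* Decide whether a file must be checked against the table.  With
   REMOVE_FILES set, the link count is ignored because earlier removals
   may already have reduced it to 1.  */
static bool
needs_link_check (nlink_t nlink, bool remove_files)
{
  return nlink > 1 || remove_files;
}

/* Return true if (DEV, INO) was recorded earlier, meaning the file
   should be archived as a hard link to the earlier member.  Otherwise
   record it (if there is room) and return false.  */
static bool
seen_before (struct seen_table *t, dev_t dev, ino_t ino,
             nlink_t nlink, bool remove_files)
{
  if (!needs_link_check (nlink, remove_files))
    return false;

  for (size_t i = 0; i < t->count; i++)
    if (t->entries[i].dev == dev && t->entries[i].ino == ino)
      return true;

  if (t->count < MAX_SEEN)
    t->entries[t->count++] = (struct seen_file) { dev, ino };
  return false;
}
```

Feeding the same (dev, ino) pair with link counts 4, 3, 2, 1 models
archiving four links while each is removed in turn. With remove_files
false, the last call returns false and the fourth name would be stored
as a separate regular file; with remove_files true, it is recognized as
a link. The cost is that every file is recorded in the table while
--remove-files is in use.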

Please let me know if you need anything else,

-Carl

PS. If you could preserve the CC list in any replies that would be
appreciated.
From f1ed85d46043c523cd5b8196c1d266f3606a2531 Mon Sep 17 00:00:00 2001
From: Carl Worth <cwo...@cworth.org>
Date: Wed, 29 Jul 2009 20:45:58 -0700
Subject: [PATCH 1/2] Preserve hard links with --remove-files

When the --remove-files option is in effect, it is no longer
reliable to use a file's link count to determine if we should
use the hash table for hard links. Instead, we look into the
hash table for every file when under the influence of the
--remove-files option.
---
 debian/changelog |    3 ++-
 src/create.c     |    4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/debian/changelog b/debian/changelog
index df3a125..747988e 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -3,8 +3,9 @@ tar (1.22-1.2) UNRELEASED; urgency=low
   * Add Carl Worth as an uploader.
   * Fix to allow parallel build (-j2), closes #535319
   * Don't close file stream before EOF, closes #525818
+  * Preserve hard links with --remove-files, closes #188663
 
- -- Carl Worth <cwo...@cworth.org>  Wed, 29 Jul 2009 16:18:18 -0700
+ -- Carl Worth <cwo...@cworth.org>  Wed, 29 Jul 2009 21:28:45 -0700
 
 tar (1.22-1.1) unstable; urgency=low
 
diff --git a/src/create.c b/src/create.c
index fde7ed1..559aaa0 100644
--- a/src/create.c
+++ b/src/create.c
@@ -1377,7 +1377,7 @@ static Hash_table *link_table;
 static bool
 dump_hard_link (struct tar_stat_info *st)
 {
-  if (link_table && st->stat.st_nlink > 1)
+  if (link_table && (st->stat.st_nlink > 1 || remove_files_option))
     {
       struct link lp;
       struct link *duplicate;
@@ -1424,7 +1424,7 @@ file_count_links (struct tar_stat_info *st)
 {
   if (hard_dereference_option)
     return;
-  if (st->stat.st_nlink > 1)
+  if (st->stat.st_nlink > 1 || remove_files_option)
     {
       struct link *duplicate;
       struct link *lp = xmalloc (offsetof (struct link, name)
-- 
1.6.3.3

From a75570c728ed2c3f65fb075491a07a9b4ade407f Mon Sep 17 00:00:00 2001
From: Carl Worth <cwo...@cworth.org>
Date: Wed, 29 Jul 2009 21:26:23 -0700
Subject: [PATCH 2/2] Add hardlinks test (to ensure they are preserved with --remove-files)

The new hardlinks.at test case verifies the fix in the previous
commit, (without that change the test fails, and with the change
the test passes).
---
 tests/hardlinks.at |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/testsuite.at |    2 ++
 2 files changed, 52 insertions(+), 0 deletions(-)
 create mode 100644 tests/hardlinks.at

diff --git a/tests/hardlinks.at b/tests/hardlinks.at
new file mode 100644
index 0000000..9e01ec3
--- /dev/null
+++ b/tests/hardlinks.at
@@ -0,0 +1,50 @@
+# Process this file with autom4te to create testsuite. -*- Autotest -*-
+
+# Test suite for GNU tar.
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301, USA.
+
+# Problem: hard links not preserved with --remove-files
+# Reported by: "Theodore Y. Ts'o" <ty...@mit.edu>
+# References: <e194eae-0001le...@think.thunk.org>
+# http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=188663
+
+AT_SETUP([preserve hard links with --remove-files])
+AT_KEYWORDS([hardlinks])
+
+AT_TAR_CHECK([
+genfile -l 64 -f file1
+link file1 file2
+link file2 file3
+link file3 file4
+tar cf archive --remove-files file1 file2 file3 file4
+tar xf archive
+rm archive
+genfile --stat=st_nlink file1
+genfile --stat=st_nlink file2
+genfile --stat=st_nlink file3
+genfile --stat=st_nlink file4
+],
+[0],
+[4
+4
+4
+4
+])
+
+AT_CLEANUP
+
diff --git a/tests/testsuite.at b/tests/testsuite.at
index a12477d..34325d7 100644
--- a/tests/testsuite.at
+++ b/tests/testsuite.at
@@ -140,6 +140,8 @@ m4_include([extrac07.at])
 
 m4_include([gzip.at])
 
+m4_include([hardlinks.at])
+
 m4_include([incremental.at])
 m4_include([incr01.at])
 m4_include([incr02.at])
-- 
1.6.3.3
