Hi,

(For archive-digging purposes: this looks a lot like http://lists.gnu.org/archive/html/bug-tar/2010-11/msg00095.html , except that the file name must contain a UTF-8 / non-ASCII component.)

We've noticed that the extracted path of some files is wrong if both --sparse and --xattrs are used, the file is sparse, and its path contains some "weird" (non-ASCII) characters.

Here's a full reproducer; I ran it on today's git master branch:

  $ cd $(mktemp -d)
  $ mkdir -p t
  $ dd if=/dev/urandom of=t/barbarbar bs=1M seek=50 count=1
  $ cp t/barbarbar t/mumuµmu
  $ tar --xattrs -S -c t | tar -t
  t/
  t/barbarbar
  t/GNUSparseFile.6221/mumuµmu

I'm just listing here, but it would be extracted as such as well.

Looking at the binary tar, the problem is that the path is listed twice for mumuµmu:

  30 GNU.sparse.name=t/mumuµmu
  ...
  38 path=t/GNUSparseFile.6236/mumuµmu

(while barbarbar only has GNU.sparse.name, and no path attribute)

For now I've just quick & dirty patched the path_decode function in my own src/xheader.c to take the first path, because it seems to work™ and we're in a bit of a hurry; another workaround, as given in the mail I quoted at the start, would be to use --sparse-version=0.

I guess the main fix should be to only output the header once though; looking at the code (src/create.c, write_header_name), it seems that we explicitly check !string_ascii_p (st->file_name) and write the extra header then. I'm not quite sure how to cleanly check that we already wrote the filename in another attribute then...

(Thinking back, we might want to keep backward compatibility and handle archives made with existing tar versions, rather than only change the output we produce; so maybe always preferring GNU.sparse.name over path, without relying on order, would be a better solution?)

Thanks,
-- 
Dominique Martinet