Hi,

Dominique Martinet wrote on Thu, Jun 09, 2016 at 01:22:53PM +0200:
> (For archive digging purposes, this looks a lot like
> http://lists.gnu.org/archive/html/bug-tar/2010-11/msg00095.html ; except
> that the file name must contain a UTF-8/non-ASCII component)
>
> We've noticed the extracted path for some files is wrong IF both --sparse
> and --xattrs are used AND the file is sparse and its path contains some
> "weird" characters.
>
> Here's a full reproducer, run on today's git master branch:
>
> $ cd $(mktemp -d)
> $ mkdir -p t
> $ dd if=/dev/urandom of=t/barbarbar bs=1M seek=50 count=1
> $ cp t/barbarbar t/mumuµmu
> $ tar --xattrs -S -c t | tar -t
> t/
> t/barbarbar
> t/GNUSparseFile.6221/mumuµmu
>
> I'm just listing here, but it would be extracted as such as well.
> Looking at the binary tar, the problem is that the path is listed twice
> for mumuµmu:
> 30 GNU.sparse.name=t/mumuµmu
> ...
> 38 path=t/GNUSparseFile.6236/mumuµmu
>
> (while barbarbar only has GNU.sparse.name, and no path attribute)
>
>
> For now I've just quick & dirty patched my own src/xheader.c path_decode
> function to take the first path because it seems to work™ and we're in a
> bit of a hurry;
> another workaround, as given in the mail I quoted at the start, would be
> to use --sparse-version=0
>
>
> I guess the main fix should be to only output the header once though;
> looking at the code (src/create.c, write_header_name), it seems that we
> explicitly check !string_ascii_p (st->file_name) and write the extra
> header then.
> I'm not quite sure how to cleanly check that we already wrote the
> filename in another attribute then...
>
> (Thinking back, we might want to handle backward compatibility and cope
> with archives made with existing tar versions rather than only changing
> the output; so maybe always preferring GNU.sparse.name over path, without
> relying on order, would be a better solution?)

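To make the second option a bit more concrete, here is a rough Python
model of it. This is a sketch only, not tar code: the function names are
made up for illustration, and "the last of path / GNU.sparse.name wins"
is just a simplified model of the behaviour observed above, not a
description of what src/xheader.c actually does. It rebuilds the two
extended header records from the dump, parses them back, and compares
the two ways of picking the member name:

```python
# Rough model of pax extended header records: "<len> <key>=<value>\n",
# where <len> is the byte length of the whole record, length field included.
# All function names are hypothetical; none of this is tar's own code.

def make_record(key, value):
    """Build one extended header record as bytes."""
    body = f" {key}={value}\n".encode("utf-8")
    # The length field counts its own digits, so iterate until it is stable.
    length = len(body)
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def parse_records(data):
    """Split a header blob into (key, value) pairs, in archive order."""
    records = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        # Payload sits between the space and the trailing newline.
        record = data[space + 1 : pos + length - 1]
        key, _, value = record.partition(b"=")
        records.append((key.decode("utf-8"), value.decode("utf-8")))
        pos += length
    return records


def name_last_wins(records):
    """Assumed model of the observed behaviour: the later record wins."""
    name = None
    for key, value in records:
        if key in ("path", "GNU.sparse.name"):
            name = value
    return name


def name_prefer_sparse(records):
    """Proposed rule: GNU.sparse.name beats path, whatever the order."""
    fields = dict(records)
    if "GNU.sparse.name" in fields:
        return fields["GNU.sparse.name"]
    return fields.get("path")


# The two records seen in the binary dump, in the same order.
header = (make_record("GNU.sparse.name", "t/mumuµmu")
          + make_record("path", "t/GNUSparseFile.6236/mumuµmu"))
records = parse_records(header)
old_name = name_last_wins(records)      # the GNUSparseFile.6236 path
new_name = name_prefer_sparse(records)  # the real member name
```

With the records from the dump, the first rule yields the
GNUSparseFile.6236 path and the second yields t/mumuµmu, and the second
gives the same answer if the two records are swapped. That is the
property I'd want for archives already written by existing tar versions.
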
Does anyone have an opinion on this?
Would you take a patch if I went through the trouble of implementing
either solution?

I have no strong preference for which one to implement, and both look
feasible (either not writing the improper path to the output archive, or
ignoring path when GNU.sparse.name is set on extraction); but I'd rather
not pick one and be told "no, we prefer the other one" after getting no
feedback... or just be plainly ignored.

Thank you,
-- 
Dominique Martinet