Package: python2.3
Version: 2.3.5-4

I'm having trouble extracting files from a tar file using the tarfile
module in Python 2.3 (also with 2.4, as it happens). Below is a
commented session. The tarfile and foo.py are attached.

[EMAIL PROTECTED] ls -lARi
.:
total 12
10240090 -rw-rw-r--  1 liw liw  142 Jun 13 03:03 foo.py
10240006 drwxrwsr-x  2 liw liw 4096 Jun 13 03:07 sbin
10240008 -rw-rw-r--  1 liw liw  167 Jun 13 03:07 sbin.tar.gz

./sbin:
total 0
10240007 -rw-rw-r--  2 liw liw 0 Jun 13 03:06 fsck.ext2
10240007 -rw-rw-r--  2 liw liw 0 Jun 13 03:06 fsck.ext3

At this point, the tar file exists and the original directory from which
it was created likewise exists. Note the two hardlinked files of zero
size.
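
For readers without the attachment, here is a rough sketch of how such an archive is typically laid out: the first name is stored as a regular file and the second name as a hardlink entry whose linkname points at the first. This is an illustration written for a current Python 3, not the attached sbin.tar.gz; the archive is built in memory and only the two file names are taken from the listing above.

```python
import io
import tarfile

# Build a small archive in memory that mimics the layout of sbin.tar.gz.
# This is an illustrative stand-in, not the attached file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    regular = tarfile.TarInfo("sbin/fsck.ext2")  # zero-size regular file
    tar.addfile(regular)

    link = tarfile.TarInfo("sbin/fsck.ext3")     # second name for the same data
    link.type = tarfile.LNKTYPE                  # stored as a hardlink entry
    link.linkname = "sbin/fsck.ext2"             # name of the member it links to
    tar.addfile(link)

# Read the archive back and record how each member is stored.
buf.seek(0)
entries = []
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    for member in tar:
        kind = "hardlink" if member.islnk() else "file"
        entries.append((member.name, kind, member.linkname))
        print(member.name, kind, member.linkname)
```

The point to notice is that the linkname is a path inside the archive, not a path on the filesystem where the archive is unpacked.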

[EMAIL PROTECTED] python foo.py
sbin/
sbin/fsck.ext2
sbin/fsck.ext3

The foo.py script uses the tarfile module to extract the tar file into
the directory foo. At first glance it seems to work; see the listing below.
However, note that the inode number is the same for foo/sbin/fsck.ext3
and sbin/fsck.ext[23], but not the same as foo/sbin/fsck.ext2.

[EMAIL PROTECTED] ls -lARi
.:
total 16
10240002 drwxrwxrwx  3 liw liw 4096 Jun 13 03:12 foo
10240090 -rw-rw-r--  1 liw liw  142 Jun 13 03:03 foo.py
10240006 drwxrwsr-x  2 liw liw 4096 Jun 13 03:07 sbin
10240008 -rw-rw-r--  1 liw liw  167 Jun 13 03:07 sbin.tar.gz

./foo:
total 4
10240003 drwxrwsr-x  2 liw liw 4096 Jun 13 03:12 sbin

./foo/sbin:
total 0
10240004 -rw-rw-r--  1 liw liw 0 Jun 13 03:06 fsck.ext2
10240007 -rw-rw-r--  3 liw liw 0 Jun 13 03:06 fsck.ext3

./sbin:
total 0
10240007 -rw-rw-r--  3 liw liw 0 Jun 13 03:06 fsck.ext2
10240007 -rw-rw-r--  3 liw liw 0 Jun 13 03:06 fsck.ext3
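
The inode comparison in the listing above can also be done programmatically. The following is a minimal sketch for a current Python 3; the helper name same_inode and the temporary file names are made up for the illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def same_inode(path_a, path_b):
    """Return True if both paths refer to the same device and inode."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


# Demonstration in a scratch directory: two names for one file, plus an
# unrelated file for contrast.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "fsck.ext2")
    second = os.path.join(tmp, "fsck.ext3")
    other = os.path.join(tmp, "unrelated")
    open(first, "w").close()
    open(other, "w").close()
    os.link(first, second)  # create a hard link to the first file

    print(same_inode(first, second))  # True
    print(same_inode(first, other))   # False
```

Applied to the listing, a correct extraction would make foo/sbin/fsck.ext2 and foo/sbin/fsck.ext3 share an inode with each other and not with anything under the original sbin directory.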

Let's try to extract again, without the original sbin directory
existing.

[EMAIL PROTECTED] rm -rf foo sbin
[EMAIL PROTECTED] python foo.py
sbin/
sbin/fsck.ext2
sbin/fsck.ext3
[EMAIL PROTECTED] ls -lARi
.:
total 12
10240002 drwxrwxrwx  3 liw liw 4096 Jun 13 03:13 foo
10240090 -rw-rw-r--  1 liw liw  142 Jun 13 03:03 foo.py
10240008 -rw-rw-r--  1 liw liw  167 Jun 13 03:07 sbin.tar.gz

./foo:
total 4
10240003 drwxrwsr-x  2 liw liw 4096 Jun 13 03:13 sbin

./foo/sbin:
total 0
10240004 -rw-rw-r--  1 liw liw 0 Jun 13 03:06 fsck.ext2
[EMAIL PROTECTED]

Now foo/sbin/fsck.ext3 doesn't exist at all. My first guess would be
that tarfile uses the source name directly to do the hardlink, and not
the source name prepended with the extraction directory, as it should.

(GNU tar can unpack the tarfile correctly, so it's not the tarfile being
corrupted.)
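
To make the guess above concrete, here is a small sketch of the path handling I would expect. It is not the tarfile module's code; the helper name link_target is hypothetical, and the example is written for a current Python 3. It only shows that the link source should be looked up under the extraction directory rather than relative to the current directory.

```python
import os
import tarfile


def link_target(member, extract_dir):
    """Return the path a hardlink member should be linked from.

    The linkname stored in the archive is relative to the archive root,
    so it has to be joined with the extraction directory.
    """
    return os.path.join(extract_dir, member.linkname)


# A hand-built member that mirrors the hardlink entry for fsck.ext3.
member = tarfile.TarInfo("sbin/fsck.ext3")
member.type = tarfile.LNKTYPE
member.linkname = "sbin/fsck.ext2"

print("suspected source:", member.linkname)
print("expected source: ", link_target(member, "foo"))
```

Using the bare linkname would explain both sessions: with the original sbin directory present the link resolves to sbin/fsck.ext2 in the current directory, and without it there is nothing to link to.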

Attachment: sbin.tar.gz
Description: application/compressed-tar

import tarfile

tar = tarfile.open("sbin.tar.gz", "r:gz")
for member in tar:
    print member.name
    tar.extract(member, "foo")
tar.close()
