[issue26740] tarfile: accessing (listing and extracting) tarball fails with UnicodeDecodeError

2016-04-12 Thread Tomas Tomecek

New submission from Tomas Tomecek:

I have a tarball (generated by docker-1.10 via `docker export`) and am trying 
to extract it with Python 2.7's tarfile module:

```
import tarfile

# tarball_path and path are defined elsewhere in the calling code
with tarfile.open(name=tarball_path) as tar_fd:
    tar_fd.extractall(path=path)
```

Output from a pytest run:

```
/usr/lib64/python2.7/tarfile.py:2072: in extractall
for tarinfo in members:
/usr/lib64/python2.7/tarfile.py:2507: in next
tarinfo = self.tarfile.next()
/usr/lib64/python2.7/tarfile.py:2355: in next
tarinfo = self.tarinfo.fromtarfile(self)
/usr/lib64/python2.7/tarfile.py:1254: in fromtarfile
return obj._proc_member(tarfile)
/usr/lib64/python2.7/tarfile.py:1276: in _proc_member
return self._proc_pax(tarfile)
/usr/lib64/python2.7/tarfile.py:1406: in _proc_pax
value = value.decode("utf8")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = '\x01\x00\x00\x02\xc0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', errors = 'strict'

def decode(input, errors='strict'):
>   return codecs.utf_8_decode(input, errors, True)
E   UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 4: invalid start byte

/usr/lib64/python2.7/encodings/utf_8.py:16: UnicodeDecodeError
```
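
The failing step can be reproduced without the tarball itself. The sketch below takes the byte string shown in the traceback and decodes it strictly as UTF-8, which is what the `value.decode("utf8")` call in `_proc_pax` does. The variable names `raw_value` and `error` are illustrative only and do not come from tarfile.

```python
# Pax header value copied from the traceback above.
raw_value = (b"\x01\x00\x00\x02\xc0\x00\x00\x00\x00\x00"
             b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")

try:
    raw_value.decode("utf8")  # strict error handling is the default
    error = None
except UnicodeDecodeError as exc:
    error = exc

# 0xc0 is never a valid UTF-8 start byte, so strict decoding stops at offset 4.
print(error)
```

This suggests the header value is binary data rather than text, so any strict text decoding of it would be expected to fail.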

Since I know nothing about tars, I have no idea if this is a bug or there is a 
proper solution/workaround.

When using GNU tar, I'm able to list and extract the tarball.
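
Since GNU tar handles the archive, one possible stopgap is to shell out to it from Python instead of using tarfile. This is only a sketch: the helper names `gnu_tar_extract_command` and `extract_with_gnu_tar` are hypothetical, and it assumes a `tar` binary is on the PATH and that the destination directory already exists.

```python
import subprocess


def gnu_tar_extract_command(tarball_path, dest_dir):
    """Build the argv that extracts tarball_path into dest_dir using tar."""
    # -x: extract, -f: archive file to read, -C: change to dest_dir first
    return ["tar", "-xf", tarball_path, "-C", dest_dir]


def extract_with_gnu_tar(tarball_path, dest_dir):
    """Run tar; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.check_call(gnu_tar_extract_command(tarball_path, dest_dir))
```

Passing the arguments as a list avoids shell quoting problems with unusual paths.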

--
components: Unicode
messages: 263237
nosy: Tomas Tomecek, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: tarfile: accessing (listing and extracting) tarball fails with UnicodeDecodeError
versions: Python 2.7

___
Python tracker 
<http://bugs.python.org/issue26740>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26740] tarfile: accessing (listing and extracting) tarball fails with UnicodeDecodeError

2016-04-12 Thread Tomas Tomecek

Tomas Tomecek added the comment:

Unfortunately I can't, since it's an internal Docker image. I have found a bug 
report in the Red Hat Bugzilla with more info: 
https://bugzilla.redhat.com/show_bug.cgi?id=1194473 There is also a commit with a 
fix (via monkeypatching): 
https://github.com/goldmann/docker-squash/commit/81d1c4c18960a5d940be9b986ccbfaa7853aceb1
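
The linked commit is not reproduced here. As a rough illustration of the general idea behind a tolerant decoder, the sketch below tries strict UTF-8 first and falls back to Latin-1 for values that are not valid text. The helper `decode_pax_value` is hypothetical; it is not part of tarfile, and this is not necessarily what the commit does.

```python
def decode_pax_value(raw, encoding="utf8"):
    """Decode a raw pax header value, tolerating binary payloads."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to exactly one code point, so nothing is
        # lost and the original bytes come back with .encode("latin-1").
        return raw.decode("latin-1")
```

The Latin-1 fallback is chosen only because it round-trips arbitrary bytes; the caller still has to know that such a value is binary and not meaningful text.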

If needed, I can construct a minimal reproducer.

--
