[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

New submission from Srdjan Grubor:

When TarFile.extract() is run from multiple threads, the archive's read position is not protected against simultaneous seeks, which causes various convoluted bugs:

    self.archive_object.extract(member, extraction_path)
  File "/usr/lib/python3.4/tarfile.py", line 2019, in extract
    set_attrs=set_attrs)
  File "/usr/lib/python3.4/tarfile.py", line 2088, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/lib/python3.4/tarfile.py", line 2127, in makefile
    source.seek(tarinfo.offset_data)
  File "/usr/lib/python3.4/gzip.py", line 573, in seek
    self.read(1024)
  File "/usr/lib/python3.4/gzip.py", line 365, in read
    if not self._read(readsize):
  File "/usr/lib/python3.4/gzip.py", line 449, in _read
    self._read_eof()
  File "/usr/lib/python3.4/gzip.py", line 485, in _read_eof
    hex(self.crc)))
OSError: CRC check failed 0x1036a2e1 != 0x0
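One workaround that sidesteps the shared read position entirely (not proposed in this thread, offered here only as an alternative) is to give each worker its own TarFile handle, so no two threads ever seek on the same gzip stream. This is a minimal sketch, assuming a gzip-compressed archive of regular files; extract_member_own_handle and extract_parallel are hypothetical helper names. Every open re-decompresses the stream up to the requested member, so this trades CPU time for safety, and it does not by itself fix the directory-creation race reported later in this thread (the sketch just creates the destination directory up front).

```python
import concurrent.futures
import os
import tarfile


def extract_member_own_handle(archive_path, member_name, dest_dir):
    # Each call opens a private TarFile, so its gzip stream and file
    # position are never shared with another thread.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extract(member_name, dest_dir)
    return os.path.join(dest_dir, member_name)


def extract_parallel(archive_path, dest_dir, max_workers=4):
    # Read the member list once, in the calling thread.
    with tarfile.open(archive_path, "r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
    # Create the destination before starting workers so they do not
    # race each other on creating it.
    os.makedirs(dest_dir, exist_ok=True)
    with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
        futures = [
            pool.submit(extract_member_own_handle, archive_path, name, dest_dir)
            for name in names
        ]
        # result() re-raises any exception from a worker.
        return [f.result() for f in futures]
```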

--
messages: 237960
nosy: sgnn7
priority: normal
severity: normal
status: open
title: tarfile not re-entrant for multi-threading
type: behavior
versions: Python 3.4

___
Python tracker 
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Changes by Srdjan Grubor :


--
type: behavior -> enhancement

___
Python tracker 
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

Also, _extract_member in tarfile.py is not thread-safe: the directory-existence check can run while another thread is creating that same directory, so the subsequent makedirs() call errors out.

  File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "./xdelta3-dir-patcher", line 499, in _apply_file_delta
    archive_object.expand(patch_file, staging_dir)
  File "./xdelta3-dir-patcher", line 284, in expand
    self.archive_object.extract(member, extraction_path)
  File "/usr/lib/python3.4/tarfile.py", line 2019, in extract
    set_attrs=set_attrs)
  File "/usr/lib/python3.4/tarfile.py", line 2080, in _extract_member
    os.makedirs(upperdirs)
  File "/usr/lib/python3.4/os.py", line 237, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/tmp/XDelta3DirPatcher_is0y4_5f/xdelta/updated folder'

Code causing problems:
2065 def _extract_member(self, tarinfo, targetpath, set_attrs=True):
...
2075     # Create all upper directories.
2076     upperdirs = os.path.dirname(targetpath)
2077     if upperdirs and not os.path.exists(upperdirs):
...
2080         os.makedirs(upperdirs)  # Fails since the dir might already have been created between lines 2077 and 2080
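The interleaving described above can be reproduced deterministically. In this sketch (check_then_create and reproduce_race are hypothetical names), two threading.Event objects force a worker to pause between its existence check and os.makedirs(), while the main thread creates the same directory in that gap, mimicking the window between lines 2077 and 2080.

```python
import os
import shutil
import tempfile
import threading


def check_then_create(path, checked, created):
    # Same shape as the tarfile code: check first...
    if not os.path.exists(path):
        checked.set()      # tell the main thread the check has passed
        created.wait()     # pause, like a thread preempted between the lines
        os.makedirs(path)  # ...then create, which now fails


def reproduce_race():
    base = tempfile.mkdtemp()
    target = os.path.join(base, "updated folder")
    checked, created = threading.Event(), threading.Event()
    errors = []

    def worker():
        try:
            check_then_create(target, checked, created)
        except FileExistsError as exc:
            errors.append(exc)

    try:
        t = threading.Thread(target=worker)
        t.start()
        checked.wait()       # worker has seen "does not exist"
        os.makedirs(target)  # a second "thread" creates the directory first
        created.set()        # let the worker continue to its makedirs()
        t.join()
    finally:
        shutil.rmtree(base)
    return errors
```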

--

___
Python tracker 
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Changes by Srdjan Grubor :


--
type: enhancement -> behavior

___
Python tracker 
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

I fixed the tarfile multi-threading problems on the user side by using threading.Lock(), so the same approach might work within the library. The directory creation could probably be improved by wrapping the makedirs() call in a try/except that ignores FileExistsError. The code I use elsewhere fixes this with:

import os

def _safe_makedirs(self, dir_path):
    # Concurrency problems need to be handled. If two threads create
    # the same dir, there might be a race between them checking and
    # doing makedirs, so we handle that as gracefully as possible here.
    try:
        os.makedirs(dir_path)
    except FileExistsError:
        # Only ignore the error if the path really is a directory.
        if not os.path.isdir(dir_path):
            raise

If I get time, I'll submit a patch, but it seems unlikely that I will.
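A minimal sketch of the user-side threading.Lock() workaround mentioned above, assuming a single TarFile opened and shared by the caller; extract_all_with_lock is a hypothetical helper, not part of tarfile. Because the whole extract() call runs under the lock, both the concurrent seeks and the makedirs() race disappear, but extraction itself becomes fully serialized; only work a caller does outside the locked region can overlap.

```python
import concurrent.futures
import os
import tarfile
import threading


def extract_all_with_lock(tar, dest_dir, max_workers=4):
    # One lock shared by every worker: only one thread at a time touches
    # the TarFile, so seeks and reads on its file object never interleave.
    lock = threading.Lock()

    def extract_one(member):
        with lock:
            tar.extract(member, dest_dir)
        return member.name

    members = [m for m in tar.getmembers() if m.isfile()]
    with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
        # map() yields results in input order and re-raises worker errors.
        return list(pool.map(extract_one, members))
```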

--

___
Python tracker 
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

I don't know if that's true of core libraries. Why complicate things for end users when these issues could be handled in the library itself, completely transparently to developers? A simple RLock would add almost no overhead and would work as expected in both threaded and non-threaded situations.
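To illustrate the library-side RLock idea, here is a hypothetical TarFile subclass, LockedTarFile (not an existing class), that holds an internal lock around extract() and extractall(). An RLock rather than a plain Lock matters when one locked method calls another on the same thread (in Python 3.4, for example, extractall() calls extract()); a plain Lock would deadlock there. Because TarFile.open() is a classmethod, LockedTarFile.open(path, "r:gz") should return an instance of the subclass. This sketch covers only those two methods: extractfile() hands back a stream that reads the shared file object later, outside the lock, so it would need separate handling.

```python
import tarfile
import threading


class LockedTarFile(tarfile.TarFile):
    """Hypothetical TarFile subclass that serializes extraction internally."""

    def __init__(self, *args, **kwargs):
        # Create the lock before TarFile.__init__ runs, in case anything
        # during initialization ends up calling a locked method.
        self._lock = threading.RLock()
        super().__init__(*args, **kwargs)

    def extract(self, *args, **kwargs):
        # Re-entrant: safe even if called from inside extractall() below.
        with self._lock:
            return super().extract(*args, **kwargs)

    def extractall(self, *args, **kwargs):
        with self._lock:
            return super().extractall(*args, **kwargs)
```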

--
versions:  -Python 3.5

___
Python tracker 
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

After some thought, the makedirs fix should only need makedirs(exist_ok=True).
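A minimal sketch of why exist_ok=True closes the race: several threads create the same nested directory at once and none of them fails. create_concurrently is a hypothetical helper. exist_ok=True still raises FileExistsError when the path exists but is not a directory, which matches the os.path.isdir() check in _safe_makedirs. On Python 3.4.0 specifically, exist_ok=True could still raise if mode did not match the existing directory's mode; that behavior was removed in 3.4.1.

```python
import concurrent.futures
import os
import shutil
import tempfile


def create_concurrently(path, workers=8):
    # Every worker tries to create the same directory at once. With
    # exist_ok=True, losing the race to another thread is not an error.
    def make():
        os.makedirs(path, exist_ok=True)
        return os.path.isdir(path)

    with concurrent.futures.ThreadPoolExecutor(workers) as pool:
        futures = [pool.submit(make) for _ in range(workers)]
        return [f.result() for f in futures]
```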

--

___
Python tracker 
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

Patch for the multithreaded expansion of files and use of makedirs.

--
keywords: +patch
Added file: http://bugs.python.org/file38462/mutithreading_tarfile.patch

___
Python tracker 
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

The library as a whole still needs threading locks added, but the submitted patch should fix things for people who do the locking in their own code.

--

___
Python tracker 
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com