[issue23649] tarfile not re-entrant for multi-threading
New submission from Srdjan Grubor:

When running tarfile.extract through multiple threads, the archive reading pointer is not protected from simultaneous seeks and causes various convoluted bugs:

    self.archive_object.extract(member, extraction_path)
  File "/usr/lib/python3.4/tarfile.py", line 2019, in extract
    set_attrs=set_attrs)
  File "/usr/lib/python3.4/tarfile.py", line 2088, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/lib/python3.4/tarfile.py", line 2127, in makefile
    source.seek(tarinfo.offset_data)
  File "/usr/lib/python3.4/gzip.py", line 573, in seek
    self.read(1024)
  File "/usr/lib/python3.4/gzip.py", line 365, in read
    if not self._read(readsize):
  File "/usr/lib/python3.4/gzip.py", line 449, in _read
    self._read_eof()
  File "/usr/lib/python3.4/gzip.py", line 485, in _read_eof
    hex(self.crc)))
OSError: CRC check failed 0x1036a2e1 != 0x0

--
messages: 237960
nosy: sgnn7
priority: normal
severity: normal
status: open
title: tarfile not re-entrant for multi-threading
type: behavior
versions: Python 3.4

___
Python tracker
<http://bugs.python.org/issue23649>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23649] tarfile not re-entrant for multi-threading
Changes by Srdjan Grubor:

--
type: behavior -> enhancement
[issue23649] tarfile not re-entrant for multi-threading
Srdjan Grubor added the comment:

Also, _extract_member in tarfile.py is not thread-safe: the check for folder existence can run while another thread is creating that same directory, which causes the code to error out.

  File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "./xdelta3-dir-patcher", line 499, in _apply_file_delta
    archive_object.expand(patch_file, staging_dir)
  File "./xdelta3-dir-patcher", line 284, in expand
    self.archive_object.extract(member, extraction_path)
  File "/usr/lib/python3.4/tarfile.py", line 2019, in extract
    set_attrs=set_attrs)
  File "/usr/lib/python3.4/tarfile.py", line 2080, in _extract_member
    os.makedirs(upperdirs)
  File "/usr/lib/python3.4/os.py", line 237, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/tmp/XDelta3DirPatcher_is0y4_5f/xdelta/updated folder'

Code causing problems:

2065    def _extract_member(self, tarinfo, targetpath, set_attrs=True):
...
2075        # Create all upper directories.
2076        upperdirs = os.path.dirname(targetpath)
2077        if upperdirs and not os.path.exists(upperdirs):
...
2080            os.makedirs(upperdirs)  # Fails since the dir might be already created between lines 2077 and 2080

--
[issue23649] tarfile not re-entrant for multi-threading
Changes by Srdjan Grubor:

--
type: enhancement -> behavior
[issue23649] tarfile not re-entrant for multi-threading
Srdjan Grubor added the comment:

I fixed the tarfile multi-threading problem on the user side by wrapping the calls in a threading.Lock(), so the same approach might work inside the library. The directory creation could probably be improved with a try/except around the makedirs() call that ignores the exception if it is a FileExistsError. The code I use elsewhere fixes this with:

    def _safe_makedirs(self, dir_path):
        try:
            os.makedirs(dir_path)
        # Concurrency problems need to be handled. If two threads create
        # the same dir, there might be a race between them checking and
        # doing makedirs so we handle that as gracefully as possible here.
        except FileExistsError:
            # Re-raise only if the existing path is not a directory.
            if not os.path.isdir(dir_path):
                raise

If I get time, I'll submit a patch, but it seems like I probably won't for this.

--
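The user-side workaround described above can be sketched as follows. This is a minimal illustration, not the reporter's actual code: the helper names build_archive and extract_all_threaded, the archive layout, and the worker count are all assumptions made for the example. One lock is held around every extract() call so that only one thread at a time moves the shared read pointer of the archive.

```python
import io
import os
import tarfile
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def build_archive(path, count=20):
    """Write a small gzip-compressed tar with `count` files (hypothetical test data)."""
    with tarfile.open(path, "w:gz") as tar:
        for i in range(count):
            data = ("payload %d\n" % i).encode() * 100
            info = tarfile.TarInfo(name="dir%d/file%d.txt" % (i % 3, i))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def extract_all_threaded(archive_path, dest, workers=4):
    """Extract every member from worker threads, serialized by one lock."""
    lock = threading.Lock()  # guards the TarFile object shared by all workers
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()  # read the member list before starting threads

        def extract_one(member):
            # Only one thread at a time may seek and read in the archive.
            with lock:
                tar.extract(member, dest)
            return member.name

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(extract_one, members))
```

Because the lock covers the whole extract() call, the extraction itself no longer runs in parallel; threads only help if the surrounding per-member work (for example, patching the extracted file) happens outside the lock.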
[issue23649] tarfile not re-entrant for multi-threading
Srdjan Grubor added the comment:

I don't know if that's true of core libraries. Why complicate things for end users when those issues could be handled in the library itself and be completely transparent to developers? A simple RLock latch would cause almost no slowdown and would work as expected in both threaded and non-threaded situations.

--
versions: -Python 3.5
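To illustrate the RLock idea, here is a rough sketch of what library-level locking could look like, written as a subclass so it can be tried without modifying tarfile. The class name LockedTarFile and the attribute _extract_lock are hypothetical; this is not the submitted patch and not how tarfile is implemented. A re-entrant lock is used because, depending on the Python version, extractall() may call extract() on the same object from the same thread, which would deadlock with a plain Lock.

```python
import tarfile
import threading


class LockedTarFile(tarfile.TarFile):
    """Hypothetical TarFile subclass that serializes extraction calls."""

    def __init__(self, *args, **kwargs):
        # Re-entrant, so nested extract() calls from the same thread are safe.
        self._extract_lock = threading.RLock()
        super().__init__(*args, **kwargs)

    def extract(self, *args, **kwargs):
        with self._extract_lock:
            return super().extract(*args, **kwargs)

    def extractall(self, *args, **kwargs):
        with self._extract_lock:
            return super().extractall(*args, **kwargs)
```

Since TarFile.open() is a classmethod, LockedTarFile.open(path, "r:gz") returns an instance of the subclass. Only the two extraction entry points are covered here; other methods that move the read pointer, such as extractfile() or iteration over members, would need the same treatment.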
[issue23649] tarfile not re-entrant for multi-threading
Srdjan Grubor added the comment:

After some thought: the directory creation should only need makedirs(exist_ok=True).

--
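A small sketch of why exist_ok=True is enough for the directory race: several threads are released at the same moment and all try to create the same nested directory. The helper names ensure_dir and create_concurrently, the thread count, and the use of a Barrier to force the overlap are assumptions made for this illustration.

```python
import os
import tempfile
import threading


def ensure_dir(path):
    """Create `path` and its parents; an already existing directory is not an error."""
    os.makedirs(path, exist_ok=True)


def create_concurrently(path, thread_count=8):
    """Call ensure_dir(path) from several threads at once and collect any errors."""
    errors = []
    barrier = threading.Barrier(thread_count)

    def worker():
        barrier.wait()  # release all threads together to provoke the race
        try:
            ensure_dir(path)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors
```

Note that exist_ok=True still raises FileExistsError when the path exists but is not a directory, which matches the behavior of the _safe_makedirs helper quoted earlier in the thread.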
[issue23649] tarfile not re-entrant for multi-threading
Srdjan Grubor added the comment:

Patch for the multithreaded expansion of files and use of makedirs.

--
keywords: +patch
Added file: http://bugs.python.org/file38462/mutithreading_tarfile.patch
[issue23649] tarfile not re-entrant for multi-threading
Srdjan Grubor added the comment:

The whole lib still needs the threading locks added, but the submitted patch should fix things for people who do the locking in their own code.

--