Re: [Python-Dev] cpython: Issue #5689: Add support for lzma compression to the tarfile module.
On Sun, Dec 11, 2011 at 11:45:06PM +0100, Antoine Pitrou wrote:
> On Sat, 10 Dec 2011 20:40:17 +0100
> lars.gustaebel wrote:
> >
> >  The :mod:`tarfile` module makes it possible to read and write tar
> > -archives, including those using gzip or bz2 compression.
> > +archives, including those using gzip, bz2 and lzma compression.
> >  (:file:`.zip` files can be read and written using the :mod:`zipfile`
> >  module.)
>
> Perhaps there should be a "versionchanged" directive for lzma support?

This is now fixed.

--
Lars Gustäbel
l...@gustaebel.de

There's no present. There's only the immediate future and the recent past.
(George Carlin)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
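For readers of this thread, here is a minimal sketch of the lzma support that the changeset documents. It assumes Python 3.3 or later built with the `lzma` module. The `"w:xz"` mode string writes an xz-compressed archive, and `"r:*"` lets tarfile detect gzip, bz2 or xz compression when reading. The helper names `make_xz_tar` and `read_tar` are hypothetical.

```python
import io
import tarfile


def make_xz_tar(files):
    """Build an xz-compressed tar archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    # "w:xz" selects LZMA/xz compression (available in tarfile since Python 3.3).
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar(blob):
    """Return {name: bytes} for the regular files in an archive of any supported compression."""
    # "r:*" auto-detects the compression method (gzip, bz2 or xz).
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}


blob = make_xz_tar({"hello.txt": b"Hello, lzma!"})
print(blob[:6])        # the xz container's magic bytes
print(read_tar(blob))
```

The reading side needs no knowledge of which compressor was used, so existing code that opens archives with `"r:*"` picks up `.tar.xz` files without changes.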
[Python-Dev] Request for developer privileges.
Hello,

my name is Lars Gustäbel (SF gustaebel). I contributed tarfile.py to the Python standard library in January 2003 and have been the maintainer since then. I have provided about 25 patches over the years, most of them fixes, some of them new features and improvements. As a result, I am pretty familiar with the Python development process.

If possible, I would like to get developer privileges to be able to work more actively on tarfile.py for a certain time. I am currently implementing read-write POSIX.1-2001 pax format support. Development is still in progress, but it is already clear at this point that it will be a complex change, which will definitely require some maintenance once it is finished and in day-to-day use. I would like to clean up the tarfile test suite during this process as well.

The introduction of the pax format is important because it is the first tar specification that puts an end to those annoying limitations of the "original" tar format. It will become the default format for GNU tar some day.

Thank you,
Lars.

--
Lars Gustäbel
[EMAIL PROTECTED]
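One of those limitations, sketched with a current Python 3 `tarfile`: the ustar header has a 100-byte name field plus a 155-byte prefix that can only be split at a "/", so a long name without slashes cannot be stored at all. A pax archive records the full path in an extended header instead. The helpers `archive_with_name` and `member_names` are hypothetical; note that Python's tarfile has used `PAX_FORMAT` as the default for new archives since 3.8.

```python
import io
import tarfile

# 150 characters and no "/": too long for ustar's 100-byte name field, and the
# 155-byte prefix field cannot help because there is nowhere to split the name.
LONG_NAME = "x" * 150


def archive_with_name(name, fmt):
    """Write a one-member archive in format *fmt* and return the raw bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        data = b"payload"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_names(blob):
    """List the member names stored in an in-memory archive."""
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        return tar.getnames()


for fmt, label in [(tarfile.USTAR_FORMAT, "ustar"), (tarfile.PAX_FORMAT, "pax")]:
    try:
        names = member_names(archive_with_name(LONG_NAME, fmt))
        print(label, "stored a name of length", len(names[0]))
    except ValueError as exc:
        print(label, "failed:", exc)
```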
Re: [Python-Dev] 2.5 branch unfrozen
On Sat, Apr 21, 2007 at 04:45:37PM +1000, Anthony Baxter wrote:
> Ok, things seem to be OK. So the release25-maint branch is unfrozen.
> Go crazy. Well, a little bit crazy.

I'm afraid that I went crazy a little too early. Sorry for that. Won't happen again.

--
Lars Gustäbel
[EMAIL PROTECTED]

The truth is rarely pure and never simple. (Oscar Wilde)
Re: [Python-Dev] tarfile and directory traversal vulnerability
On Fri, Aug 24, 2007 at 07:36:41PM +0200, Jan Matejek wrote:
> once upon a time there was a known vulnerability in tar (CVE-2001-1267,
> [1]), and while tar is now long fixed, python's tarfile module is
> affected too.
>
> The vulnerability goes basically like this: If you tar a file named
> "../../../../../etc/passwd" and then make the admin untar it,
> /etc/passwd gets overwritten.
> Another variety of this bug is a symlink one: if tar contains files like:
> ./-directory -> /etc
> ./-directory/passwd
> then the "-directory" symlink would be created first and /etc/passwd
> will be overwritten once again.

tarfile currently contains no sanity checks at all. The easiest way to attack /etc/passwd would be to give tarfile a tar created with `tar -cPf foo.tar /etc/passwd'.

> I was wondering how to fix it.
> The symlink problem obviously applies only to extractall() method and is
> easily fixed by delaying external (or possibly all) symlink creation,
> similar to how directory attributes are delayed now.
> I've attached a draft of the patch, if you like it, i'll polish it.

Suppose we have:

foo -> /etc
foo/passwd

If creation of the foo symlink is delayed, foo/passwd will be extracted in a directory foo which will be created implicitly. If we create the foo symlink afterwards, it will fail because foo already exists. The best way would be to completely ignore members and link targets that are absolute or outside the archive's scope.

> The traversal problem is harder, and it applies to extract() method as well.
> For extractall() alone, i would use something like:
>
> if not tarinfo.name.startswith('../'):
>     self.extract(tarinfo, path)
> else:
>     warnings.warn("non-local file skipped: %s" % tarinfo.name,
>                   RuntimeWarning, stacklevel=1)
>
> For extract(), i am not sure. Maybe it should throw exception when it
> encounters such file, and have a special option to extract such files
> anyway.
[...]

Yes, I think that is the right way to do it.

--
Lars Gustäbel
[EMAIL PROTECTED]

A chicken is an egg's way of producing more eggs. (Anonymous)
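The policy suggested in this reply (refuse members and link targets that are absolute or point outside the archive's scope) could be sketched roughly as below. This is an illustration, not the patch that was eventually written. `is_safe_member`, `safe_extractall` and `make_tar` are hypothetical helpers, paths are resolved with `os.path.realpath`, and the whole archive is refused if any single member fails the check. Python 3.12 later added extraction filters to tarfile itself (for example `extractall(filter="data")`, also backported to some earlier security releases); the sketch turns that filter on when it is available.

```python
import io
import os
import tarfile
import tempfile


def _inside(dest, path):
    """True if *path* resolves to *dest* or somewhere beneath it."""
    return os.path.commonpath([dest, os.path.realpath(path)]) == dest


def is_safe_member(member, dest):
    """Reject absolute names, '..' escapes and links that point outside *dest*."""
    dest = os.path.realpath(dest)
    # os.path.join discards *dest* when the name is absolute, so "/etc/passwd" fails here.
    if not _inside(dest, os.path.join(dest, member.name)):
        return False
    if member.issym():
        # Symlink targets are resolved relative to the directory containing the link.
        link_dir = os.path.dirname(os.path.join(dest, member.name))
        return _inside(dest, os.path.join(link_dir, member.linkname))
    if member.islnk():
        # Hard-link targets name another member, i.e. they are relative to the archive root.
        return _inside(dest, os.path.join(dest, member.linkname))
    return True


def safe_extractall(tar, dest):
    """Extract every member of *tar* into *dest*, or nothing if any member is unsafe."""
    members = tar.getmembers()
    unsafe = [m.name for m in members if not is_safe_member(m, dest)]
    if unsafe:
        raise ValueError(f"refusing to extract non-local members: {unsafe}")
    # Additionally apply tarfile's own "data" filter on versions that provide it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    tar.extractall(dest, members=members, **extra)


def make_tar(names):
    """Build an uncompressed in-memory archive of small regular files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = 5
            tar.addfile(info, io.BytesIO(b"hello"))
    buf.seek(0)
    return buf


with tempfile.TemporaryDirectory() as dest:
    with tarfile.open(fileobj=make_tar(["../../etc/passwd"])) as tar:
        try:
            safe_extractall(tar, dest)
        except ValueError as exc:
            print(exc)
```

Checking every member before extracting anything also sidesteps the "foo -> /etc, foo/passwd" ordering problem. The symlink is rejected up front, so nothing is ever written through it.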
Re: [Python-Dev] tarfile and directory traversal vulnerability
On Mon, Aug 27, 2007 at 07:40:36PM +0200, Jan Matejek wrote:
> Lars Gustäbel wrote:
> > Suppose we have:
> > foo -> /etc
> > foo/passwd
> >
> > If creation of the foo symlink is delayed, foo/passwd will be
> > extracted in a directory foo which will be created implicitly.
> > If we create the foo symlink afterwards it will fail because foo
> > already exists. The best way would be to completely ignore
> > members and link targets that are absolute or outside the
> > archive's scope.
>
> GNU tar doesn't descend into symlinked directories when extracting, such
> archive fails anyway:
>
> # tar xvf foo.tar
> foo
> foo/passwd
> tar: foo/passwd: Cannot open: Not a directory
> tar: Error exit delayed from previous errors
>
> I think that is the simplest solution, but i'm not sure how to best
> implement that in extractall().

GNU tar creates a placeholder file for every hard or symbolic link during the extract process and in a second step replaces them with links. I don't think that this is a good choice for a library. The problem is that it leads to delayed and (from the user's POV) unrelated errors.

I prefer a solution in which archive members with pathnames that start with either "/" or "../" raise an exception by default and can be extracted only by direct request. I am currently working on a patch.

Should we move this discussion over to the bugtracker?

--
Lars Gustäbel
[EMAIL PROTECTED]

Linux is like a wigwam - no Gates, no Windows, Apache inside.
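The behaviour preferred here (names starting with "/" or "../" raise an exception by default and are extracted only on explicit request) might look roughly like the sketch below. `UnsafeMemberError`, `is_local_name` and `extract_checked` are hypothetical names, not tarfile API. The check is purely lexical on the stored name. It uses `posixpath.normpath` so that names such as "a/../../x" are caught too, and it does not inspect link targets. On Python 3.12 and later, tarfile's own extraction filters may still reject a member even when `allow_unsafe=True` is passed.

```python
import posixpath
import tarfile


class UnsafeMemberError(Exception):
    """Raised for members whose stored name would escape the extraction directory."""


def is_local_name(name):
    """True unless *name* is absolute or climbs out of the archive root via '..'."""
    if name.startswith("/"):
        return False
    # Tar member names always use "/" as the separator, so use posixpath on every OS.
    normalized = posixpath.normpath(name)
    return normalized != ".." and not normalized.startswith("../")


def extract_checked(tar, member, path=".", allow_unsafe=False):
    """Like TarFile.extract(), but refuse non-local names unless explicitly allowed."""
    if isinstance(member, str):
        member = tar.getmember(member)
    if not allow_unsafe and not is_local_name(member.name):
        raise UnsafeMemberError(f"refusing to extract {member.name!r}")
    tar.extract(member, path)
```

Raising immediately, rather than creating placeholders and fixing them up later, keeps the error next to the member that caused it. That is the property argued for above.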
Re: [Python-Dev] [Python-3000] Finishing up PEP 3108
Issue 2847 - the aifc module still imports the cl module in 3.0. Problem is that the cl module is gone. =) So it seems silly to have the imports lying about. This can probably be changed to critical.

It shouldn't be a problem to rip everything cl-related out of aifc. The question is how useful aifc will be after that ...

Has someone already used that module? I took a look into it, but I'm a bit confused about the various compression types, case-sensitivity and compatibility issues [1]. Are Apple's "alaw" and SGI's "ALAW" really the same encoding? Can we use the audioop module for ALAW, just like it's already done for ULAW?

There is just one alaw I've ever come across (G.711), and the audioop implementation could be used (audioop's alaw support is younger than the aifc module, BTW).

The capitalisation is confusing, but your document [1] says: "Apple Computer's QuickTime player recognize only the Apple compression types. Although "ALAW" and "ULAW" contain identical sound samples to the "alaw" and "ulaw" formats and were in use long before Apple introduced the new codes, QuickTime does not recognize them."

So this seems to be just a matter of naming in AIFC, not a matter of two different alaw implementations.

- Lars

[1] http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/AIFF.html
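Since both spellings carry the same G.711 A-law samples, an AIFC reader only has to normalise the compression type before picking a codec. Below is a pure-Python sketch modelled on the widely circulated g711.c reference code from Sun Microsystems. `COMPRESSION_CODECS`, `linear2alaw` and `alaw2linear` are illustrative names. audioop offered `lin2alaw`/`alaw2lin` for whole buffers, but both audioop and aifc were removed in Python 3.13 (PEP 594).

```python
# AIFC compression types: the upper-case names predate Apple's lower-case ones,
# but both carry identical G.711 samples, so they map to the same codec.
COMPRESSION_CODECS = {
    b"ALAW": "alaw", b"alaw": "alaw",
    b"ULAW": "ulaw", b"ulaw": "ulaw",
}

# Upper end of each A-law segment, measured on the 13-bit magnitude.
_SEG_END = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)


def linear2alaw(sample):
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 A-law code."""
    pcm = sample >> 3                  # A-law works on 13-bit values
    if pcm >= 0:
        mask = 0xD5                    # sign bit set, plus even-bit inversion (0x55)
    else:
        mask = 0x55                    # sign bit clear, even-bit inversion only
        pcm = -pcm - 1
    # Segment (chord) = index of the first end point that covers the magnitude.
    seg = next((i for i, end in enumerate(_SEG_END) if pcm <= end), 8)
    if seg == 8:
        return 0x7F ^ mask             # out of range: clip to the largest code
    mantissa = pcm >> 1 if seg < 2 else pcm >> seg
    return ((seg << 4) | (mantissa & 0x0F)) ^ mask


def alaw2linear(code):
    """Decode an 8-bit G.711 A-law code back to a signed 16-bit sample."""
    code ^= 0x55                       # undo the even-bit inversion
    t = (code & 0x0F) << 4             # mantissa, scaled
    seg = (code & 0x70) >> 4
    if seg == 0:
        t += 8                         # add half a step to land mid-interval
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if code & 0x80 else -t


print(COMPRESSION_CODECS[b"ALAW"] == COMPRESSION_CODECS[b"alaw"])
print([alaw2linear(linear2alaw(s)) for s in (0, 100, 1000, -1000, 32767)])
```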
[Python-Dev] Forking and pipes
Dear list,

I recently noticed a Python program which uses forks and pipes for communication between the processes not behaving as expected. The minimal example program:

#!/usr/bin/python
import os, sys

r, w = os.pipe()
write = os.fdopen(w, 'w')
print >> write, "foo"
pid = os.fork()
if pid:
    os.waitpid(pid, 0)
else:
    sys.exit(0)
write.close()
read = os.fdopen(r)
print read.read()
read.close()

This prints out "foo" twice although it's only written once to the pipe. It seems that Python doesn't flush the file object's buffer before fork() copies it to the child process, thus resulting in the duplicate message. The equivalent C program behaves as expected:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fds[2];
    pid_t pid;
    char* buf = (char*) calloc(4, sizeof(char));

    pipe(fds);
    write(fds[1], "foo", 3);
    pid = fork();
    if(pid) {
        waitpid(pid, NULL, 0);
    } else {
        return EXIT_SUCCESS;
    }
    close(fds[1]);
    read(fds[0], buf, 3);
    printf("%s\n", buf);
    close(fds[0]);
    free(buf);
    return EXIT_SUCCESS;
}

Is this behaviour intentional? I've tested both Python and C on Linux, OpenBSD and Solaris (Python versions 2.5.2 and 2.3.3); the behaviour was the same everywhere.

Thanks,
Lars
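The example most likely behaves this way for the following reason. `print >> write` only fills the Python file object's userspace buffer. `fork()` duplicates that buffer, and both processes later flush their copy into the pipe: the child when it exits, the parent on `write.close()`. The C program calls `write(2)` directly, so there is nothing buffered to duplicate. Below is a Python 3 sketch of the usual remedy, assuming a POSIX system. It flushes before forking and ends the child with `os._exit()`, which skips interpreter cleanup. The function name `send_then_fork` is hypothetical.

```python
import os


def send_then_fork(message):
    """Write *message* to a pipe, fork, and return what the parent reads back."""
    r, w = os.pipe()
    with os.fdopen(w, "w") as writer:
        print(message, file=writer)   # lands in the file object's buffer, not the pipe
        writer.flush()                # empty the buffer so fork() has nothing to duplicate
        pid = os.fork()
        if pid == 0:
            # Child: os._exit() skips interpreter cleanup, so no inherited
            # file object gets flushed (or closed) a second time.
            os._exit(0)
        os.waitpid(pid, 0)
    # Leaving the with-block closed the parent's write end; the child's copy
    # was closed when it exited, so read() below reaches EOF.
    with os.fdopen(r) as reader:
        return reader.read()


print(repr(send_then_fork("foo")))
```

Either measure alone fixes this particular example. Flushing before `fork()` is the more general habit, because a child that runs other code before exiting could still flush a stale buffer.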
[Python-Dev] counterintuitive behavior (bug?) in Counter with +=
Hello,

[First off, I'm not a member of this list, so please Cc: me in a reply!]

I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:

from collections import Counter

count_total = Counter()
count_per_doc = []
for doc in documents:
    count_current = Counter(analyze(doc))
    count_total += count_current
    count_per_doc.append(count_current)

Because we target Python 2.5+, I implemented a lightweight replacement with just the functionality we need, including __iadd__, but then my co-developer ran the above code on Python 2.7 and performance was horrible. After some digging, I found out that Counter [2] does not have __iadd__, so += falls back to __add__, which builds a new Counter containing a copy of the entire left-hand side! I also figured out that I should use the update method instead, which I will, but I still find that uglier than +=.

I would submit a patch to implement __iadd__, but I first want to know if that's considered the right behavior, since it changes the semantics of +=:

>>> from collections import Counter
>>> a = Counter([1,2,3])
>>> b = a
>>> a += Counter([3,4,5])
>>> a is b
False

would become

# snip
>>> a is b
True

TIA,
Lars

[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
[2] http://hg.python.org/cpython/file/tip/Lib/collections/__init__.py#l399

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
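Here is a minimal sketch of the proposed in-place behaviour. It is written as a hypothetical `Counter` subclass, `InPlaceCounter`, so it runs on any Python 3. `__iadd__` adds the right-hand counts into the existing object and then, like `Counter.__add__`, drops entries whose count is no longer positive. For what it's worth, `collections.Counter` itself gained in-place multiset operators in Python 3.3.

```python
from collections import Counter


class InPlaceCounter(Counter):
    """Counter whose += mutates self instead of building a new object."""

    def __iadd__(self, other):
        for elem, count in other.items():
            self[elem] += count
        # Mirror Counter.__add__, whose result only keeps positive counts.
        for elem in [e for e, c in self.items() if c <= 0]:
            del self[elem]
        return self


a = InPlaceCounter([1, 2, 3])
b = a
a += Counter([3, 4, 5])
print(a is b, a[3])
```

For the term-counting loop above, `count_total.update(count_current)` also updates in place. It differs only in that it never discards zero or negative counts, which cannot arise when every count is positive.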
Re: [Python-Dev] counterintuitive behavior (bug?) in Counter with +=
2011/10/6 Petri Lehtinen:
> Lars Buitinck wrote:
>> >>> from collections import Counter
>> >>> a = Counter([1,2,3])
>> >>> b = a
>> >>> a += Counter([3,4,5])
>> >>> a is b
>> False
>
> Sounds like a good idea to me. You should open an issue in the tracker
> at http://bugs.python.org/.

Done that: http://bugs.python.org/issue13121

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam