Re: [Python-Dev] cpython: Issue #5689: Add support for lzma compression to the tarfile module.

2011-12-12 Thread lars
On Sun, Dec 11, 2011 at 11:45:06PM +0100, Antoine Pitrou wrote:
> On Sat, 10 Dec 2011 20:40:17 +0100
> lars.gustaebel wrote:
> >  
> >  The :mod:`tarfile` module makes it possible to read and write tar
> > -archives, including those using gzip or bz2 compression.
> > +archives, including those using gzip, bz2 and lzma compression.
> >  (:file:`.zip` files can be read and written using the :mod:`zipfile` 
> > module.)
> 
> Perhaps there should be a "versionchanged" directive for lzma support?

This is now fixed.
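
For readers who want to try the feature under discussion, here is a minimal sketch in Python 3. It assumes the "w:xz" and "r:xz" mode strings used by released versions of tarfile and an interpreter built with the lzma module. The member name and payload are made up for the example.

```python
import io
import tarfile

payload = b"hello, xz\n"

# Write an xz-compressed tar archive into an in-memory buffer.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:xz") as tar:
    info = tarfile.TarInfo("hello.txt")   # illustrative member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read the archive back and recover the member's contents.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:xz") as tar:
    restored = tar.extractfile("hello.txt").read()
```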

-- 
Lars Gustäbel
l...@gustaebel.de

There's no present. There's only the immediate future and
the recent past.
(George Carlin)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Request for developer privileges.

2006-12-20 Thread Lars Gustäbel
Hello,

my name is Lars Gustäbel (SF gustaebel). I contributed
tarfile.py to the Python standard library in January 2003 and
have been the maintainer since then. I have provided about 25
patches over the years, most of them fixes, some of them new
features and improvements. As a result, I am pretty familiar
with the Python development process.

If possible I would like to get developer privileges to be able
to work more actively on tarfile.py for a certain time.

I am currently implementing read-write POSIX.1-2001 pax format
support. Development is still in progress, but it is already
clear at this point that it will be a complex change that
will definitely require some maintenance once it is finished and
in day-to-day use. I would like to clean up the tarfile test
suite during this process as well. The introduction of the pax
format is important because it is the first tar specification
that puts an end to those annoying limitations of the "original"
tar format. It will become the default format for GNU tar some
day.

Thank you,
Lars.

-- 
Lars Gustäbel
[EMAIL PROTECTED]


Re: [Python-Dev] 2.5 branch unfrozen

2007-04-21 Thread Lars Gustäbel
On Sat, Apr 21, 2007 at 04:45:37PM +1000, Anthony Baxter wrote:
> Ok, things seem to be OK. So the release25-maint branch is unfrozen. 
> Go crazy. Well, a little bit crazy. 

I'm afraid that I went crazy a little too early. Sorry for that.
Won't happen again.

-- 
Lars Gustäbel
[EMAIL PROTECTED]

The truth is rarely pure and never simple.
(Oscar Wilde)


Re: [Python-Dev] tarfile and directory traversal vulnerability

2007-08-25 Thread Lars Gustäbel
On Fri, Aug 24, 2007 at 07:36:41PM +0200, Jan Matejek wrote:
> once upon a time there was a known vulnerability in tar (CVE-2001-1267,
> [1]), and while tar is now long fixed, python's tarfile module is
> affected too.
> 
> The vulnerability goes basically like this: If you tar a file named
> "../../../../../etc/passwd" and then make the admin untar it,
> /etc/passwd gets overwritten.
> Another variety of this bug is a symlink one: if tar contains files like:
> ./-directory -> /etc
> ./-directory/passwd
> then the "-directory" symlink would be created first and /etc/passwd
> will be overwritten once again.

tarfile currently contains no sanity checks at all. The easiest
way to attack /etc/passwd would be to give tarfile a tar created
with `tar -cPf foo.tar /etc/passwd'.

> I was wondering how to fix it.
> The symlink problem obviously applies only to extractall() method and is
> easily fixed by delaying external (or possibly all) symlink creation,
> similar to how directory attributes are delayed now.
> I've attached a draft of the patch, if you like it, i'll polish it.

Suppose we have:
foo -> /etc
foo/passwd

If creation of the foo symlink is delayed, foo/passwd will be
extracted in a directory foo which will be created implicitly.
If we create the foo symlink afterwards it will fail because foo
already exists. The best way would be to completely ignore
members and link targets that are absolute or outside the
archive's scope.
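
A rough Python 3 sketch of such a check, not the eventual patch: the helper names is_outside_scope, unsafe_members and build_demo_archive are invented for illustration, and the rule for symlink targets (resolve them relative to the link's own directory) is an assumption.

```python
import io
import posixpath
import tarfile


def is_outside_scope(name):
    """True if a path is absolute or climbs above the archive root."""
    if name.startswith("/"):
        return True
    normalized = posixpath.normpath(name)
    return normalized == ".." or normalized.startswith("../")


def unsafe_members(tar):
    """Yield (member, reason) for members that should be ignored."""
    for member in tar.getmembers():
        if is_outside_scope(member.name):
            yield member, "name"
        elif member.issym():
            # Assumption: a symlink target is relative to the link's directory.
            target = posixpath.join(posixpath.dirname(member.name),
                                    member.linkname)
            if is_outside_scope(target):
                yield member, "link target"
        elif member.islnk() and is_outside_scope(member.linkname):
            yield member, "link target"


def build_demo_archive():
    """Build the 'foo -> /etc' archive from the example, in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        link = tarfile.TarInfo("foo")
        link.type = tarfile.SYMTYPE
        link.linkname = "/etc"
        tar.addfile(link)
        for name in ("foo/passwd", "../../etc/passwd",
                     "/etc/passwd", "docs/readme.txt"):
            data = b"x"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_demo_archive()) as tar:
    flagged = [(m.name, why) for m, why in unsafe_members(tar)]
```

With the foo symlink ignored, foo/passwd ends up in an ordinary directory foo below the extraction path, so it is not flagged.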

> The traversal problem is harder, and it applies to extract() method as well.
> For extractall() alone, i would use something like:
> 
> if not tarinfo.name.startswith('../'):
>     self.extract(tarinfo, path)
> else:
>     warnings.warn("non-local file skipped: %s" % tarinfo.name,
>                   RuntimeWarning, stacklevel=1)
> 
> For extract(), i am not sure. Maybe it should throw exception when it
> encounters such file, and have a special option to extract such files
> anyway. [...]

Yes, I think that is the right way to do it.

-- 
Lars Gustäbel
[EMAIL PROTECTED]

A chicken is an egg's way of producing more eggs.
(Anonymous)


Re: [Python-Dev] tarfile and directory traversal vulnerability

2007-08-27 Thread Lars Gustäbel
On Mon, Aug 27, 2007 at 07:40:36PM +0200, Jan Matejek wrote:
> Lars Gustäbel wrote:
> > Suppose we have:
> > foo -> /etc
> > foo/passwd
> > 
> > If creation of the foo symlink is delayed, foo/passwd will be
> > extracted in a directory foo which will be created implicitly.
> > If we create the foo symlink afterwards it will fail because foo
> > already exists. The best way would be to completely ignore
> > members and link targets that are absolute or outside the
> > archive's scope.
> 
> GNU tar doesn't descend into symlinked directories when extracting, such
> archive fails anyway:
> 
> # tar xvf foo.tar
> foo
> foo/passwd
> tar: foo/passwd: Cannot open: Not a directory
> tar: Error exit delayed from previous errors
> 
> I think that is the simplest solution, but i'm not sure how to best
> implement that in extractall().

GNU tar creates a placeholder file for every hard or symbolic
link during the extract process and in a second step replaces
them with links.
I don't think that this is a good choice for a library. The
problem is that it leads to delayed and (from the user's POV)
unrelated errors. I prefer a solution in which archive members
whose pathnames start with either "/" or "../" raise an
exception by default and can be extracted only on explicit
request.
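
To make the idea concrete, a minimal Python 3 sketch follows. It is an illustration only: UnsafeMemberError, checked_extract, the allow_unsafe flag and make_archive are hypothetical names, not part of tarfile. It applies the literal prefix rule described above, so a name such as "a/../../b" would slip through. A stricter check would normalize the path first.

```python
import io
import tarfile


class UnsafeMemberError(Exception):
    """Hypothetical exception for members that point outside the target."""


def checked_extract(tar, member, path=".", allow_unsafe=False):
    """Extract one member, refusing "/" and "../" names unless asked to."""
    name = member.name
    if not allow_unsafe and (name.startswith("/") or name.startswith("../")):
        raise UnsafeMemberError("refusing to extract %r" % name)
    tar.extract(member, path)


def make_archive(names):
    """Build an in-memory archive with one small file per name (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = b"demo"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf)
```

A caller that really wants such a member would pass allow_unsafe=True. Everyone else gets an immediate error that names the offending member.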

I am currently working on a patch. Should we move this
discussion over to the bugtracker?

-- 
Lars Gustäbel
[EMAIL PROTECTED]

Linux is like a wigwam - no Gates, no Windows, Apache inside.


Re: [Python-Dev] [Python-3000] Finishing up PEP 3108

2008-05-29 Thread Lars Immisch



Issue 2847 - the aifc module still imports the cl module in 3.0.
Problem is that the cl module is gone. =) So it seems silly to have
the imports lying about. This can probably be changed to critical.


It shouldn't be a problem to rip everything cl-related out of aifc.
The question is how useful aifc will be after that ...


Has someone already used that module? I took a look into it, but I'm a
bit confused about the various compression types, case-sensitivity and
compatibility issues [1]. Are Apple's "alaw" and SGI's "ALAW" really the
same encoding? Can we use the audioop module for ALAW, just like it's
already done for ULAW?


There is just one alaw I've ever come across (G.711), and the audioop
implementation could be used (audioop's alaw support is younger than the
aifc module, BTW).


The capitalisation is confusing, but your document [1] says: "Apple 
Computer's QuickTime player recognize only the Apple compression types. 
Although "ALAW" and "ULAW" contain identical sound samples to the "alaw" 
and "ulaw" formats and were in use long before Apple introduced the new 
codes,  QuickTime does not recognize them."


So this seems to be just a matter of naming in AIFC, not a matter of
two different alaw implementations.


- Lars

[1] http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/AIFF.html


[Python-Dev] Forking and pipes

2008-12-09 Thread Lars Kotthoff
Dear list,

 I recently noticed a python program which uses forks and pipes for
communication between the processes not behaving as expected. The minimal
example program:


#!/usr/bin/python

import os, sys

r, w = os.pipe()
write = os.fdopen(w, 'w')
print >> write, "foo"
pid = os.fork()
if pid:
    os.waitpid(pid, 0)
else:
    sys.exit(0)
write.close()
read = os.fdopen(r)
print read.read()
read.close()


This prints out "foo" twice although it's only written once to the pipe. It
seems that python doesn't flush file descriptors before copying them to the
child process, thus resulting in the duplicate message. The equivalent C
program behaves as expected:


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fds[2];
    pid_t pid;
    char* buf = (char*) calloc(4, sizeof(char));

    pipe(fds);
    write(fds[1], "foo", 3);

    pid = fork();
    if(pid) {
        waitpid(pid, NULL, 0);
    } else {
        return EXIT_SUCCESS;
    }

    close(fds[1]);

    read(fds[0], buf, 3);
    printf("%s\n", buf);
    close(fds[0]);

    free(buf);

    return EXIT_SUCCESS;
}


Is this behaviour intentional? I've tested both python and C on Linux, OpenBSD
and Solaris (python versions 2.5.2 and 2.3.3); the behaviour was the same
everywhere.
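
One likely explanation, sketched here in Python 3 rather than the 2.x syntax above: the file object returned by os.fdopen() buffers "foo" in user space, fork() copies that buffer into the child, and each process later flushes its own copy. The C program calls write(2) directly, so nothing is buffered. The function name pipe_roundtrip and its flag are made up for this illustration, and the child closes its writer explicitly so that the flush does not depend on interpreter shutdown.

```python
import os


def pipe_roundtrip(flush_before_fork):
    """Write "foo" once via a buffered writer, fork, return what arrives."""
    r, w = os.pipe()
    writer = os.fdopen(w, "w")
    writer.write("foo\n")          # stays in the user-space buffer for now
    if flush_before_fork:
        writer.flush()             # push the bytes into the pipe before forking
    pid = os.fork()
    if pid == 0:
        # Child: it inherited a copy of the buffer; closing flushes that copy.
        os.close(r)
        writer.close()
        os._exit(0)
    os.waitpid(pid, 0)
    writer.close()                 # parent flushes its own copy of the buffer
    with os.fdopen(r) as reader:
        return reader.read()
```

Calling pipe_roundtrip(False) reproduces the duplicate, while pipe_roundtrip(True) delivers the line once, so flushing before the fork avoids the problem.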

Thanks,

Lars


[Python-Dev] counterintuitive behavior (bug?) in Counter with +=

2011-10-03 Thread Lars Buitinck
Hello,

[First off, I'm not a member of this list, so please Cc: me in a reply!]

I've found some counterintuitive behavior in collections.Counter while
hacking on the scikit-learn project [1]. I wanted to use a bunch of
Counters to do some simple term counting in a set of documents,
roughly as follows:

count_total = Counter()
for doc in documents:
    count_current = Counter(analyze(doc))
    count_total += count_current
    count_per_doc.append(count_current)

Because we target Python 2.5+, I implemented a lightweight replacement
with just the functionality we need, including __iadd__, but then my
co-developer ran the above code on Python 2.7 and performance was
horrible. After some digging, I found out that Counter [2] does not
have __iadd__ and += copies the entire left-hand side in __add__!

I also figured out that I should use the update method instead, which
I will, but I still find that uglier than +=. I would submit a patch
to implement __iadd__, but I first want to know if that's considered
the right behavior, since it changes the semantics of +=:

>>> from collections import Counter
>>> a = Counter([1,2,3])
>>> b = a
>>> a += Counter([3,4,5])
>>> a is b
False

would become

# snip
>>> a is b
True
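
For illustration only, the proposed semantics can be sketched in Python 3 with a subclass. InPlaceCounter is a hypothetical name, not part of collections, and analyze() and documents below are stand-ins for the real code. The second half shows the update()-based loop, which avoids the copy without changing Counter itself.

```python
from collections import Counter


class InPlaceCounter(Counter):
    """Hypothetical subclass, only to illustrate the proposed += semantics."""

    def __iadd__(self, other):
        # Mutate the left-hand side instead of building a new Counter.
        # Simplification: unlike Counter.__add__, this does not drop
        # entries whose count ends up zero or negative.
        self.update(other)
        return self


a = InPlaceCounter([1, 2, 3])
b = a
a += Counter([3, 4, 5])
same_object = a is b          # the alias still refers to the same object


def analyze(doc):
    # Stand-in tokenizer; the real analyze() is not shown in the thread.
    return doc.split()


documents = ["spam eggs spam", "eggs ham"]
count_total = Counter()
count_per_doc = []
for doc in documents:
    count_current = Counter(analyze(doc))
    count_total.update(count_current)   # in place, no copy of count_total
    count_per_doc.append(count_current)
```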

TIA,
Lars


[1] 
https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
[2] http://hg.python.org/cpython/file/tip/Lib/collections/__init__.py#l399


-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam


Re: [Python-Dev] counterintuitive behavior (bug?) in Counter with +=

2011-10-07 Thread Lars Buitinck
2011/10/6 Petri Lehtinen:
> Lars Buitinck wrote:
>>     >>> from collections import Counter
>>     >>> a = Counter([1,2,3])
>>     >>> b = a
>>     >>> a += Counter([3,4,5])
>>     >>> a is b
>>     False
>
> Sounds like a good idea to me. You should open an issue in the tracker
> at http://bugs.python.org/.

Done that: http://bugs.python.org/issue13121

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam