[Python-Dev] Information about how cpython is benchmarked

2011-03-29 Thread Tennessee Leeuwenburg
Hi all,

Apologies for emailing this list with such an apparently trivial question.
Is there some source of documentation or information on how Python is
benchmarked? I am aware of the Python regression testing module,
regrtest.py, which I presume, if profiled, would be a good baseline
test.

PyPy maintains http://speed.pypy.org/, which provides very clear information
about the relative performance of PyPy trunk against some version of cpython
(presumably 2.6 or 2.7). I'm not aware of a similar site for cpython, but
that could easily just be my ignorance speaking.

My interest is that I'm looking at building a benchmarking solution at work,
and I can't think of a better way to build something good and general than
to try and write something that could potentially be released as open source
and be useful to others. As such I thought that benchmarking cpython would
be a great use case, but I want to find out as much as I can about how
people currently go about benchmarking Python. Initially I'm just looking at
CPU profiling since it's easiest.

Anyway, if this is the wrong place to send this email, I'm very sorry for
clogging up your inbox.

Thanks very much,
-Tennessee
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Information about how cpython is benchmarked

2011-03-29 Thread Nick Coghlan
On Tue, Mar 29, 2011 at 8:01 PM, Tennessee Leeuwenburg
 wrote:
> PyPy maintains http://speed.pypy.org/, which provides very clear information
> about the relative performance of PyPy trunk against some version of cpython
> (presumably 2.6 or 2.7). I'm not aware of a similar site for cpython, but
> that could easily just be my ignorance speaking.
> My interest is that I'm looking at building a benchmarking solution at work.
> and I can't think of a better way to build something good and general than
> to try and write something that could potentially be released as open source
> and be useful to others. As such I thought that benchmarking cpython would
> be a great use case, but I want to find out as much as I can about how
> people currently go about benchmarking Python. Initially I'm just looking at
> CPU profiling since it's easiest.

One of the points coming out of the VM summit at Pycon is actually
that we want to create a shared benchmarking site for CPython, PyPy,
Jython, IronPython (and possibly Stackless) under the python.org
banner (either speed.python.org, or possibly performance.python.org,
since we want to do memory profiling as well).

speed.pypy.org will be the reference site for this, but Maciej
indicated at the VM summit that the code that runs that site needs
some improvements before it will really be up to the task of
effectively benchmarking multiple targets.

So, according to http://speed.pypy.org/about/, the place to start with
your benchmarking system would probably be
https://github.com/tobami/codespeed.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Information about how cpython is benchmarked

2011-03-29 Thread Jesse Noller
On Tue, Mar 29, 2011 at 7:00 AM, Nick Coghlan  wrote:
> On Tue, Mar 29, 2011 at 8:01 PM, Tennessee Leeuwenburg
>  wrote:
>> PyPy maintains http://speed.pypy.org/, which provides very clear information
>> about the relative performance of PyPy trunk against some version of cpython
>> (presumably 2.6 or 2.7). I'm not aware of a similar site for cpython, but
>> that could easily just be my ignorance speaking.
>> My interest is that I'm looking at building a benchmarking solution at work.
>> and I can't think of a better way to build something good and general than
>> to try and write something that could potentially be released as open source
>> and be useful to others. As such I thought that benchmarking cpython would
>> be a great use case, but I want to find out as much as I can about how
>> people currently go about benchmarking Python. Initially I'm just looking at
>> CPU profiling since it's easiest.
>
> One of the points coming out of the VM summit at Pycon is actually
> that we want to create a shared benchmarking site for CPython, PyPy,
> Jython, IronPython (and possibly Stackless) under the python.org
> banner (either speed.python.org, or possibly performance.python.org,
> since we want to do memory profiling as well).
>
> speed.pypy.org will be the reference site for this, but Maciej
> indicated at the VM summit that the code that runs that site needs
> some improvements before it will really be up to the task of
> effectively benchmarking multiple targets.
>
> So, according to http://speed.pypy.org/about/, the place to start with
> your benchmarking system would probably be
> https://github.com/tobami/codespeed.
>
> Cheers,
> Nick.

Essentially echoing what Nick said. I'm currently working on getting
the hardware for this together.


Re: [Python-Dev] utf-8 encoding in checkins?

2011-03-29 Thread skip

>> I guess I have my work cut out for me.  It appears my preferred mail
>> reader, VM, is not supported out-of-the-box by GNU Emacs (they still
>> use Rmail and Babyl for some reason), and I'm not sure the investment
>> trying to get XEmacs built with MULE is worth the effort.

Anders> Use a 21.5 beta of XEmacs instead of 21.4, 21.5 deals with utf-8
Anders> quite well.

Thanks for the various responses, both public and private.  In part because
Barry made the leap back from XEmacs to GNU Emacs (and I trust Barry in all
things Emacs), I decided to dip my toe back into the GNU water.  I needed to
install a recent version of VM, but it does do utf-8, so my original problem
is solved.  In response to Anders, I had tried 21.5b28 a while ago but backed
off from it.  I no longer recall why.

My only issues now are:

 * make sure the ediff and vc packages recognize version-controlled files
   (It seems they do, but I haven't put them through their paces)

 * replace the GNU python.el with python-mode.el from the python-mode
   project (formerly distributed with Python, but now all grown up and moved
   away).

Skip


Re: [Python-Dev] utf-8 encoding in checkins?

2011-03-29 Thread Anders J. Munch

s...@pobox.com wrote:

> I guess I have my work cut out for me.  It appears my preferred mail reader,
> VM, is not supported out-of-the-box by GNU Emacs (they still use Rmail and
> Babyl for some reason), and I'm not sure the investment trying to get XEmacs
> built with MULE is worth the effort.


Use a 21.5 beta of XEmacs instead of 21.4, 21.5 deals with utf-8 quite well.

- Anders



[Python-Dev] Proposed change to logging.basicConfig

2011-03-29 Thread Vinay Sajip
I'm planning a change to logging.basicConfig to add an optional "handlers"
keyword argument which defaults to None.

If specified, this should be an iterable of already created handlers, which will
be added to the root logger (if it doesn't already have any handlers). Any
handler in the iterable which does not have a formatter assigned will be
assigned the formatter created by basicConfig.

If "handlers" is specified, the "stream", "filename" and "filemode" arguments
will be ignored.
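
For illustration, here is a minimal sketch of how the proposed keyword might be used, assuming it is implemented as described above. The handler names, format strings and the in-memory stream are made up for the example.

```python
import io
import logging

# Two pre-built handlers writing to the same in-memory stream:
# "plain" has no formatter, "custom" already carries its own.
stream = io.StringIO()
plain = logging.StreamHandler(stream)
custom = logging.StreamHandler(stream)
custom.setFormatter(logging.Formatter("CUSTOM %(message)s"))

# basicConfig only acts when the root logger has no handlers yet,
# so start from a clean root logger for this demonstration.
root = logging.getLogger()
for handler in list(root.handlers):
    root.removeHandler(handler)

# Assumed usage of the proposed "handlers" keyword.
logging.basicConfig(
    format="%(levelname)s:%(message)s",
    level=logging.INFO,
    handlers=[plain, custom],
)

logging.info("hello")
output = stream.getvalue()
```

In this sketch `plain` would pick up the formatter created by basicConfig, while `custom` keeps the one it was given.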

If any of you can see any problems with this change, or can suggest any
improvement to the approach, please respond. I expect to check this change in
within the next few days.

Regards,

Vinay Sajip




Re: [Python-Dev] Proposed change to logging.basicConfig

2011-03-29 Thread Antoine Pitrou
On Tue, 29 Mar 2011 16:35:08 + (UTC)
Vinay Sajip  wrote:
> I'm planning a change to logging.basicConfig to add an optional "handlers"
> keyword argument which defaults to None.
> 
> If specified, this should be an iterable of already created handlers, which will
> be added to the root logger (if it doesn't already have any handlers). Any
> handler in the iterable which does not have a formatter assigned will be
> assigned the formatter created by basicConfig.
> 
> If "handlers" is specified, the "stream", "filename" and "filemode" arguments
> will be ignored.
> 
> If any of you can see any problems with this change, or can suggest any
> improvement to the approach, please respond.

I'm not a logging expert, but the fact that your description above
mentions at least two instances of special-casing makes it sound like
the API has a usability (or learnability) problem.

Regards

Antoine.




[Python-Dev] Security implications of pep 383

2011-03-29 Thread Michael Foord

Hey all,

Not sure how real the security risk is here:

http://blog.omega-prime.co.uk/?p=107

Basically, he is saying that if you store a list of blacklisted files
with names encoded in Big5 (or some other encoding that is not compatible
with UTF-8), and those names are passed on the command line, or otherwise
read in and decoded from an assumed-UTF-8 source with surrogate escaping,
then the surrogate-escape-decoded names will not match the properly
decoded blacklisted names.


All the best,

Michael Foord

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html



Re: [Python-Dev] Proposed change to logging.basicConfig

2011-03-29 Thread Terry Reedy

On 3/29/2011 12:35 PM, Vinay Sajip wrote:

> I'm planning a change to logging.basicConfig to add an optional "handlers"
> keyword argument which defaults to None.
>
> If specified, this should be an iterable of already created handlers, which will
> be added to the root logger (if it doesn't already have any handlers). Any
> handler in the iterable which does not have a formatter assigned will be
> assigned the formatter created by basicConfig.
>
> If "handlers" is specified, the "stream", "filename" and "filemode" arguments
> will be ignored.
>
> If any of you can see any problems with this change, or can suggest any
> improvement to the approach, please respond. I expect to check this change in
> within the next few days.


I am bothered by mutually exclusive parameters. This is one reason I was 
glad to see cmp eliminated from list.sort. Quick: what happens if one 
passes both cmp and key to list.sort? There are three reasonable 
possibilities. As far as I can read, the answer is not documented.#


I am not familiar with logging, but I wonder if you should have two 
functions for the two quite different signatures. If not, I think the 
result of passing conflicting parameters should be something like 
TypeError: conflicting parameters passed. "In the face of ambiguity, 
refuse to guess."
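
A minimal sketch of the "refuse to guess" behaviour suggested above. `pick_destination` is a hypothetical helper written for this example, not part of the logging module, and the parameter names simply mirror those in the proposal.

```python
def pick_destination(stream=None, filename=None, handlers=None):
    """Hypothetical helper: report which mutually exclusive option was given."""
    options = (("stream", stream), ("filename", filename), ("handlers", handlers))
    given = [name for name, value in options if value is not None]
    if len(given) > 1:
        # More than one destination was requested: refuse to guess.
        raise TypeError("conflicting parameters passed: " + ", ".join(given))
    # Fall back to the stream (console) case when nothing was specified.
    return given[0] if given else "stream"
```

With this shape, a call that passes both `stream` and `handlers` fails immediately instead of silently ignoring one of them.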


# Experiment with 2.7 shows that cmp wins. Though too late to change, I 
consider this the worst choice of three. I think an exception should be 
raised. Failing that, I think key should win on the basis that if one 
adds a 'new-fangled' key func to an existing call with cmp (and forgets 
to remove cmp), the key func is the one intended. Also, the doc clearly 
indicates that key is considered superior to cmp.


--
Terry Jan Reedy



Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Antoine Pitrou
On Tue, 29 Mar 2011 19:23:25 +0100
Michael Foord  wrote:
> Hey all,
> 
> Not sure how real the security risk is here:
> 
>  http://blog.omega-prime.co.uk/?p=107
> 
> Basically  he is saying that if you store a list of blacklisted files 
> with names encoded in big-5 (or some other non-utf8 compatible encoding) 
> if those names are passed at the command line, or otherwise read in and 
> decoded from an assumed-utf8 source with surrogate escaping, the 
> surrogate escape decoded names will not match the properly decoded 
> blacklisted names.

This has nothing to do specifically with PEP 383. The same issues can
arise without PEP 383 if you replace utf-8 with, say, latin-1 in the
above example.

Basically, what this says is if you are decoding the same bytestring
using two different encodings, you get two different unicode strings
(which therefore compare unequal).

Another observation is that, in the script which is presented, if the
user were to extract a filename from the blacklist and call open() on
it, they wouldn't actually open one of the blacklisted files, since the
encoded representation using the filesystem encoding (e.g. utf-8 or
latin-1) would be different from the Big-5 representation.

A solution would be to open the blacklist file in binary mode and call
os.fsdecode() on the result.
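
A rough sketch of that approach, assuming a POSIX system, where os.fsdecode() uses the filesystem encoding with the surrogateescape handler. The in-memory bytes stand in for the contents of the blacklist file, and the sample name is the one used elsewhere in this thread.

```python
import os

# Bytes as they would be read from the blacklist opened in binary mode;
# here one Big5-encoded name followed by a newline.
blacklist_bytes = "\u4f60\u597d\n".encode("big5")

# Decode every entry the same way Python decodes file names and sys.argv.
blacklist = {os.fsdecode(entry) for entry in blacklist_bytes.split()}

# Simulates a Big5-encoded file name arriving through sys.argv.
argument = os.fsdecode("\u4f60\u597d".encode("big5"))

is_blocked = argument in blacklist
```

Both sides go through the same decoding, so they compare equal whatever the locale encoding happens to be.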

Regards

Antoine.




Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Martin v. Löwis
> Not sure how real the security risk is here:
> 
> http://blog.omega-prime.co.uk/?p=107
> 
> Basically  he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible encoding)
> if those names are passed at the command line, or otherwise read in and
> decoded from an assumed-utf8 source with surrogate escaping, the
> surrogate escape decoded names will not match the properly decoded
> blacklisted names.

As described, I find the problem a little bit artificial: supposedly,
he was passing the file name on the command line. However, since his
terminal is in UTF-8 and the file name in Big5, the console didn't
display the file name in a meaningful way when he ran the program. So
whoever ran the program ignored the mojibake, and didn't wonder whether
it could have any effect on proper functioning of the program. In
addition, if he did ls(1) on the directory, it would have displayed
question marks throughout. This should alert the user that something bad
is going on.

Notice that this isn't really PEP-383's fault. If the file system
encoding was UTF-8, and the blacklist was UTF-8, and the program
ran in a Latin-1 locale, it would have decoded the file name nicely
(without surrogates), but the blacklist check would still have failed.

He should have opened the file in the locale's encoding (i.e. giving no
encoding), using the surrogate escape handler.

Regards,
Martin


Re: [Python-Dev] [Python-checkins] cpython (2.6): Issue #11639: Configuration function documentation referred to logging.XXX

2011-03-29 Thread Éric Araujo
Le 29/03/2011 02:16, vinay.sajip a écrit :
> http://hg.python.org/cpython/rev/bfa2a8d91859
> changeset:   69034:bfa2a8d91859
> branch:      2.6
> parent:      68802:b99c94261225
> user:        Vinay Sajip
> date:        Tue Mar 29 01:07:50 2011 +0100
> summary:
>   Issue #11639: Configuration function documentation referred to logging.XXX
>   rather than logging.config.XXX.

Only security fixes should go into 2.5 and 2.6.  Could you revert (hg
backout) this changeset?

Regards


Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Laura Creighton
In a message of Tue, 29 Mar 2011 19:23:25 BST, Michael Foord writes:
>Hey all,
>
>Not sure how real the security risk is here:
>
> http://blog.omega-prime.co.uk/?p=107
>
>Basically  he is saying that if you store a list of blacklisted files 
>with names encoded in big-5 (or some other non-utf8 compatible encoding) 
>if those names are passed at the command line, or otherwise read in and 
>decoded from an assumed-utf8 source with surrogate escaping, the 
>surrogate escape decoded names will not match the properly decoded 
>blacklisted names.

>All the best,
>
>Michael Foord
>

I am not sure there are any security related gotchas here.  All he is
saying is that if you decode the same bytestring using two different
encodings, you will get two different unicode strings (which therefore
will compare unequal).  Where's the problem, except in that you might
have unrealistic expectations?

Laura


Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Toshio Kuratomi
On Tue, Mar 29, 2011 at 07:23:25PM +0100, Michael Foord wrote:
> Hey all,
> 
> Not sure how real the security risk is here:
> 
> http://blog.omega-prime.co.uk/?p=107
> 
> Basically  he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible
> encoding) if those names are passed at the command line, or otherwise
> read in and decoded from an assumed-utf8 source with surrogate
> escaping, the surrogate escape decoded names will not match the
> properly decoded blacklisted names.
> 
The example is correct.  The security risk is real.  However, there's a flaw
in the program, and whether there's also a flaw in Python is not so
certain.

Here's the line I'd say is contentious::
  blacklist = open("blacklist.big5", encoding='big5').read().split()

The blacklist file contains a list of filenames.  However, this code treats
it as a list of strings.  This is a logic error in the program, and he should
really be doing this::
  blacklist = open("blacklist.big5", 'rb').read().split()

Then, when comparing it against the values of sys.argv, either sys.argv gets
converted into bytes (using the system locale since that's what was used to
encode to unicode) or the items in blacklist get converted to unicode with
surrogateescape.
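
A small sketch of the first option (comparing as bytes), assuming a POSIX system. `is_blacklisted` is a made-up helper name, and the literal bytes stand in for what open("blacklist.big5", 'rb').read() would return.

```python
import os

# Stand-in for open("blacklist.big5", 'rb').read(): one Big5-encoded name.
blacklist = set(b"\xa7A\xa6n\n".split())

def is_blacklisted(name):
    """Hypothetical check: turn the str file name back into its original bytes."""
    return os.fsencode(name) in blacklist

# Simulates how a Big5-encoded file name would show up in sys.argv.
argument = os.fsdecode(b"\xa7A\xa6n")
```

os.fsencode() reverses the decoding that was applied to sys.argv, so the comparison is made on the bytes the filesystem actually stores.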

The possible flaw in python is this:  Code like the blog poster wrote passes
python3 without an error or a warning.  This gives the programmer no
feedback that they're doing something wrong until it actually bites them in
the foot in deployed code.

-Toshio




Re: [Python-Dev] Proposed change to logging.basicConfig

2011-03-29 Thread Matthew Woodcraft
Terry Reedy   wrote:
> I am bothered by mutually exclusive parameters. This is one reason I was
> glad to see cmp eliminated from list.sort. Quick: what happens if one
> passes both cmp and key to list.sort? There are three reasonable
> possibilities. As far as I can read, the answer is not documented.#

> # Experiment with 2.7 shows that cmp wins. Though too late to change, I
> consider this the worst choice of three. I think an exception should be
> raised. Failing that, I think key should win on the basis that if one
> adds a 'new-fangled' key func to an existing call with cmp (and forgets
> to remove cmp), the key func is the one intended. Also, the doc clearly
> indicates that key is considered superior to cmp.

Neither 'wins': cmp is applied to the output of key.

I agree that it would have been worth documenting this explicitly.

-M-



[Python-Dev] Failed issue tracker submission

2011-03-29 Thread Python tracker


The node specified by the designator in the subject of your message
("22663") does not exist.

Subject was: "[issue22663]"



Mail Gateway Help
=================
Incoming messages are examined for multiple parts:
 . In a multipart/mixed message or part, each subpart is extracted and
   examined. The text/plain subparts are assembled to form the textual
   body of the message, to be stored in the file associated with a "msg"
   class node. Any parts of other types are each stored in separate files
   and given "file" class nodes that are linked to the "msg" node.
 . In a multipart/alternative message or part, we look for a text/plain
   subpart and ignore the other parts.

Summary
-------
The "summary" property on message nodes is taken from the first non-quoting
section in the message body. The message body is divided into sections by
blank lines. Sections where the second and all subsequent lines begin with
a ">" or "|" character are considered "quoting sections". The first line of
the first non-quoting section becomes the summary of the message.

Addresses
---------
All of the addresses in the To: and Cc: headers of the incoming message are
looked up among the user nodes, and the corresponding users are placed in
the "recipients" property on the new "msg" node. The address in the From:
header similarly determines the "author" property of the new "msg"
node. The default handling for addresses that don't have corresponding
users is to create new users with no passwords and a username equal to the
address. (The web interface does not permit logins for users with no
passwords.) If we prefer to reject mail from outside sources, we can simply
register an auditor on the "user" class that prevents the creation of user
nodes with no passwords.

Actions
-------
The subject line of the incoming message is examined to determine whether
the message is an attempt to create a new item or to discuss an existing
item. A designator enclosed in square brackets is sought as the first thing
on the subject line (after skipping any "Fwd:" or "Re:" prefixes).

If an item designator (class name and id number) is found there, the newly
created "msg" node is added to the "messages" property for that item, and
any new "file" nodes are added to the "files" property for the item.

If just an item class name is found there, we attempt to create a new item
of that class with its "messages" property initialized to contain the new
"msg" node and its "files" property initialized to contain any new "file"
nodes.
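
The subject-line rules above can be sketched roughly as follows. This is a hypothetical pattern written for illustration, not the mail gateway's actual code; `DESIGNATOR` and `parse_subject` are made-up names.

```python
import re

# Hypothetical pattern: skip any "Re:" / "Fwd:" prefixes, then expect
# "[classname]" or "[classnameNNN]" as the first thing on the subject line.
DESIGNATOR = re.compile(
    r"^(?:\s*(?:re|fwd)\s*:\s*)*\[(?P<cls>[A-Za-z_]+)(?P<id>\d+)?\]",
    re.IGNORECASE,
)

def parse_subject(subject):
    """Return (class_name, item_id or None), or None if there is no designator."""
    match = DESIGNATOR.match(subject)
    if match is None:
        return None
    item_id = match.group("id")
    return match.group("cls"), (int(item_id) if item_id else None)
```

A subject with a class name and id refers to an existing item, while a bare class name asks for a new item to be created.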

Triggers
--------
Both cases may trigger detectors (in the first case we are calling the
set() method to add the message to the item's spool; in the second case we
are calling the create() method to create a new node). If an auditor raises
an exception, the original message is bounced back to the sender with the
explanatory message given in the exception.

$Id: mailgw.py,v 1.196 2008-07-23 03:04:44 richard Exp $
Return-Path: 
X-Original-To: rep...@bugs.python.org
Delivered-To: roundup+trac...@psf.upfronthosting.co.za
Received: from mail.python.org (mail.python.org [82.94.164.166])
by psf.upfronthosting.co.za (Postfix) with ESMTPS id 7DCEE1DEB0
for ; Tue, 29 Mar 2011 22:10:55 +0200 (CEST)
Received: from albatross.python.org (localhost [127.0.0.1])
by mail.python.org (Postfix) with ESMTP id 3PzjsW1q1lz7Lmy
for ; Tue, 29 Mar 2011 22:10:55 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=python.org; s=200901;
t=1301429455; bh=WYL3NF6gQtbDZ+R9KxXHGS2PSlCAxyY+EQEgb/Yw5jI=;
h=Date:Message-Id:Content-Type:MIME-Version:
 Content-Transfer-Encoding:From:To:Subject;
b=RiMAivS4Shae7bPg7E7SocheqYB9pzk/Svimv+qumX5arnUaaC8h9iIJo8MFDcDdi
 +Wk0XzTjTjKsbobrKgZnfZf9a8j6Fv4Ym1nTyTcPcyjCMritjq9xNUluVQvHv/Vn2e
 RhpV2FUWOdCtBx83eUopMPGEEEwABnbG5ZwgsDzM=
Received: from localhost (HELO mail.python.org) (127.0.0.1)
  by albatross.python.org with SMTP; 29 Mar 2011 22:10:55 +0200
Received: from dinsdale.python.org (svn.python.org [IPv6:2001:888:2000:d::a4])
(using TLSv1 with cipher AES256-SHA (256/256 bits))
(No client certificate requested)
by mail.python.org (Postfix) with ESMTPS
for ; Tue, 29 Mar 2011 22:10:55 +0200 (CEST)
Received: from localhost
([127.0.0.1] helo=dinsdale.python.org ident=hg)
by dinsdale.python.org with esmtp (Exim 4.72)
(envelope-from )
id 1Q4fFf-00023G-4C
for rep...@bugs.python.org; Tue, 29 Mar 2011 22:10:55 +0200
Date: Tue, 29 Mar 2011 22:10:55 +0200
Message-Id: 
Content-Type: text/plain; charset="utf8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
From: python-dev@python.org
To: rep...@bugs.python.org
Subject: [issue22663]

TmV3IGNoYW5nZXNldCBkZDg1MmEwZjkyZDYgYnkgZ3VpZG8gaW4gYnJhbmNoICcyLjUnOgpJc3N1
ZSAyMjY2MzogZml4IHJlZGlyZWN0IHZ1bG5lcmFiaWxpdHkgaW4gdXJsbGliL3VybGxpYjIuCmh0
dHA6Ly9oZy5weXRob24ub3JnL2NweXRob24vcmV2L2RkODUyYTBmOTJkNgo=
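
The body above is base64-encoded, as its Content-Transfer-Encoding header says. A minimal sketch of decoding it with the standard library (the variable names are arbitrary):

```python
import base64

# The three base64 lines of the message body, joined together.
body = (
    "TmV3IGNoYW5nZXNldCBkZDg1MmEwZjkyZDYgYnkgZ3VpZG8gaW4gYnJhbmNoICcyLjUnOgpJc3N1"
    "ZSAyMjY2MzogZml4IHJlZGlyZWN0IHZ1bG5lcmFiaWxpdHkgaW4gdXJsbGliL3VybGxpYjIuCmh0"
    "dHA6Ly9oZy5weXRob24ub3JnL2NweXRob24vcmV2L2RkODUyYTBmOTJkNgo="
)

# The Content-Type header declares the utf8 charset.
decoded = base64.b64decode(body).decode("utf-8")
```

The decoded text is a changeset notification that mentions issue 22663, the designator the tracker could not find.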

Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Victor Stinner
Le mardi 29 mars 2011 à 19:23 +0100, Michael Foord a écrit :
> Hey all,
> 
> Not sure how real the security risk is here:
> 
>  http://blog.omega-prime.co.uk/?p=107
> 
> Basically  he is saying that if you store a list of blacklisted files 
> with names encoded in big-5 (or some other non-utf8 compatible encoding) 
> if those names are passed at the command line, or otherwise read in and 
> decoded from an assumed-utf8 source with surrogate escaping, the 
> surrogate escape decoded names will not match the properly decoded 
> blacklisted names.

Yes, if you decode two byte strings from two different encodings, you
get different unicode strings. It's not related to surrogateescape (PEP
383).

Sorry, '\u4f60\u597d'.encode('big5').decode('latin1') doesn't give you
'\u4f60\u597d' but '§A¦n', and it doesn't warn you that latin1 is not
big5 (there is no UnicodeDecodeError, even if the error handler is
strict).

I think that the example has two issues:

 - security using blacklists doesn't work (it is better to use 
   a whitelist)
 - if filenames are stored as Big5, they must be decoded from Big5,
   and so the locale encoding must be Big5

I don't understand the last paragraph:

"P.P.S I will further note that you get the same issue even if the
blacklist and filename had been in UTF-8, but this time it gets broken
from a terminal in the Big5 locale. I didn’t show it this way around
because I understand that Python 3 may only have just recently started
using the locale to decode argv, rather than being hardcoded to UTF-8."

Python's filesystem encoding is only hardcoded to UTF-8 on Mac OS X; on
other operating systems, it is the locale encoding.

Victor



Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Lennart Regebro
On Tue, Mar 29, 2011 at 22:40, Lennart Regebro  wrote:
> The lesson here seems to be "if you have to use blacklists, and you
> use unicode strings for those blacklists, also make sure the string
> you compare with doesn't have surrogates".
>

For that matter, what happens with combining characters?

'\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL
LETTER O WITH DIAERESIS}'

I guess the filesystem shouldn't treat these as the same (even though
they are), but what if some webservice does? I suspect you should
normalize both strings before comparing them in any blacklist, and
what happens with surrogates when you normalize?
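
A minimal sketch of the normalization step. The choice of NFC is an assumption; any single form applied to both sides would serve. `same_name` is a made-up helper.

```python
import unicodedata

decomposed = "\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}"
precomposed = "\N{LATIN SMALL LETTER O WITH DIAERESIS}"

def same_name(a, b):
    """Compare two names after normalizing both to the same form (NFC here)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The raw strings compare unequal, while `same_name(decomposed, precomposed)` treats them as the same name.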

//Lennart


Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Lennart Regebro
The lesson here seems to be "if you have to use blacklists, and you
use unicode strings for those blacklists, also make sure the string
you compare with doesn't have surrogates".

//Lennart


Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Victor Stinner
Le mardi 29 mars 2011 à 22:40 +0200, Lennart Regebro a écrit :
> The lesson here seems to be "if you have to use blacklists, and you
> use unicode strings for those blacklists, also make sure the string
> you compare with doesn't have surrogates".

No. '\u4f60\u597d'.encode('big5').decode('latin1') gives '§A¦n' which
doesn't contain any surrogate character.

The lesson is: if you compare Unicode filenames on UNIX, make sure that
your system is correctly configured (the locale encoding must be the
filesystem encoding).

Victor



Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Antoine Pitrou
On Tue, 29 Mar 2011 22:40:01 +0200
Lennart Regebro  wrote:
> The lesson here seems to be "if you have to use blacklists, and you
> use unicode strings for those blacklists, also make sure the string
> you compare with doesn't have surrogates".

Not really. As everyone said, this can happen even without surrogates.

Regards

Antoine.




Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Victor Stinner
Le mardi 29 mars 2011 à 22:45 +0200, Lennart Regebro a écrit :
> On Tue, Mar 29, 2011 at 22:40, Lennart Regebro  wrote:
> > The lesson here seems to be "if you have to use blacklists, and you
> > use unicode strings for those blacklists, also make sure the string
> > you compare with doesn't have surrogates".
> >
> 
> For that matter, what happens with combining characters?
> 
> '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL
> LETTER O WITH DIAERESIS}'
> 
> I guess the filesystem shouldn't treat these as the same (even though
> they are), but what if some webservice does?

Mac OS X does normalize filenames to a variant of the D (decomposed)
form.
http://www.haypocalc.com/tmp/unicode-2011-03-25/html/operating_systems.html#mac-os-x

> I suspect you should normalize both strings before comparing them in any 
> blacklist,

Yes, but a blacklist is not safe: use a whitelist.

> and what happens with surrogates when you normalize?

Lone surrogates are left unchanged by all four normalization forms (NFC,
NFD, NFKC and NFKD):

>>> import unicodedata
>>> (unicodedata.normalize('NFC', '\uDC80')
...  == unicodedata.normalize('NFD', '\uDC80')
...  == unicodedata.normalize('NFKC', '\uDC80')
...  == unicodedata.normalize('NFKD', '\uDC80')
...  == '\uDC80')
True

Victor



Re: [Python-Dev] Proposed change to logging.basicConfig

2011-03-29 Thread Vinay Sajip
Antoine Pitrou writes:

> I'm not a logging expert, but the fact that your description above
> mentions at least two instances of special-casing makes it sound like
> the API has a usability (or learnability) problem.

Well, basicConfig() was provided to make it as easy as possible to configure
logging for the common use cases - logging to console and logging to file. To do
this, keyword parameters are passed: stream= (defaulting to sys.stderr, for
logging to console) or filename= and filemode= which specify a file to log to.
These sets are not compatible. Of course I could have provided two different
functions with different signatures, but it seemed simpler to have a single
function, which is typically used in just one place in an application script.
This is unlike the cmp/key case Terry mentions, and in any case the effect of
passing incompatible parameters is documented in the case of basicConfig().

If by special casing you're referring to "(if it doesn't already have any
handlers)", that's existing behaviour, and not a change. If you're referring to
"the arguments will be ignored" sentence - Terry makes a valid point about
raising an exception if incompatible parameters are passed, and I will do this.
If you're referring to the sentence about formatters, I don't see it as a
special case, it's about convenience. Each logging handler can have a formatter,
and the proposed API allows both bespoke formatters to be set for individual
handlers and for a common formatter to be set for multiple handlers, with ISTM
minimal effort for the API user.

If you're referring to something else entirely, I'm not sure what that might be.

Regards,

Vinay Sajip
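
A rough sketch of the behaviour described above, assuming the proposal
corresponds to the handlers= keyword of logging.basicConfig() as found in
current Python 3 releases. The function names and format strings are
illustrative only, and force=True needs Python 3.8 or later.

```python
import io
import logging


def configure(stream):
    """Attach two handlers to the root logger via basicConfig()."""
    plain = logging.StreamHandler(stream)    # no formatter of its own
    bespoke = logging.StreamHandler(stream)  # carries its own formatter
    bespoke.setFormatter(logging.Formatter("BESPOKE %(message)s"))
    # force=True drops handlers left over from earlier setup, since
    # basicConfig() otherwise does nothing once the root logger has any.
    logging.basicConfig(
        handlers=[plain, bespoke],
        format="COMMON %(message)s",  # applied only to handlers lacking one
        level=logging.INFO,
        force=True,
    )


def rejects_incompatible_arguments():
    """Return True if stream= and filename= together raise ValueError."""
    try:
        logging.basicConfig(stream=io.StringIO(), filename="app.log",
                            force=True)
    except ValueError:
        return True
    return False
```

Calling configure(buf) and then logging one message should write it twice to
buf, once with the common format and once with the bespoke one.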




Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Martin v. Löwis
> '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL
> LETTER O WITH DIAERESIS}'
> 
> I guess the filesystem shouldn't treat these as the same (even though
> they are), but what if some webservice does? I suspect you should
> normalize both strings before comparing them in any blacklist, and
> what happens with surrogates when you normalize?

I think the whole blacklist example is artificial. The string in the
blacklist is actually a Chinese "hello" greeting, so it surely isn't
the string being blacklisted. For proper blacklisting, you would likely
use substring searches, case-insensitivity, transliterations, and
perhaps even regular expressions and word stemming. If you consider all
these things, proper or alternative encodings of the same text are just
another issue to consider.

Regards,
Martin




[Python-Dev] Differences among Emacsen (was: utf-8 encoding in checkins?)

2011-03-29 Thread Ben Finney
s...@pobox.com writes:

> My only issues now are:
>
>  * make sure the ediff and vc packages recognize version-controlled files
>(It seems they do, but I haven't put them through their paces)

The ‘vc’ package (I'm using Debian's GNU Emacs 23.2.1) now recognises
DVCS-controlled *files*, and works well with them. It's still unaware
that modern VCS deals with project *trees*, so works only at an
individual file level. Still quite useful (e.g. ‘vc-diff’ and the like).

The ‘dvc’ project <http://download.gna.org/dvc/> shows promise, but
isn't yet in Debian so I haven't tried it.

I just use ‘ediff’ between different working trees on the filesystem, so
I don't know how well it works with files that don't exist.

>  * replace the GNU python.el with python-mode.el from the python-mode
>project (formerly distributed with Python, but now all grown up and moved
>away).

What's the current thinking on that? The native GNU Emacs Python mode
seems fine to me, but I'm not a particularly clever Emacs user so am
probably missing a whole lot.

-- 
 \“Science doesn't work by vote and it doesn't work by |
  `\authority.” —Richard Dawkins, _Big Mistake_ (The Guardian, |
_o__)  2006-12-27) |
Ben Finney



Re: [Python-Dev] Differences among Emacsen (was: utf-8 encoding in checkins?)

2011-03-29 Thread Barry Warsaw
On Mar 30, 2011, at 09:20 AM, Ben Finney wrote:

>The ‘vc’ package (I'm using Debian's GNU Emacs 23.2.1) now recognises
>DVCS-controlled *files*, and works well with them. It's still unaware
>that modern VCS deals with project *trees*, so works only at an
>individual file level. Still quite useful (e.g. ‘vc-diff’ and the like).
>
>The ‘dvc’ project <http://download.gna.org/dvc/> shows promise, but
>isn't yet in Debian so I haven't tried it.

That'll be interesting to track.  I do something probably similar to you: I
diff inside Emacs but always $vcs commit from a shell.  I vaguely remember
some ancient way of doing tree-based commits using svn in Emacs, but those
brain cells have long been recycled.

>I just use ‘ediff’ between different working trees on the filesystem, so
>I don't know how well it works with files that don't exist.

One thing I miss so far with Mercurial is 'smerge' mode.  When I have a
conflict in a Bazaar branch, I can just visit the conflicting file and Emacs
dumps me in a minor mode called smerge.  This gives me key bindings to hop
around between conflict areas, select which part I want (or both) and then
automatically calls $vcs resolve when all the conflict areas are taken care
of.  Makes it *very* nice for dealing with such things, but in the one case I
tried it with Mercurial conflicts, it didn't work.  I'll have to investigate
further, but I'm guessing it's caused by some incompatibility with hg's
conflict markers.

>>  * replace the GNU python.el with python-mode.el from the python-mode
>>project (formerly distributed with Python, but now all grown up and moved
>>away).
>
>What's the current thinking on that? The native GNU Emacs Python mode
>seems fine to me, but I'm not a particularly clever Emacs user so am
>probably missing a whole lot.

In case you missed it, there are now *three* Python modes: Tim Peters'
original and best (in my completely unbiased opinion) python-mode.el,
which is still being developed; the older python.el, apparently since
removed from Emacs; and the 'new' (so I've heard) python.el.

Since Skip and I work on python-mode.el, you can tell what our preference is.
The fact that it hasn't been pulled into Emacs is a long and dark political
tale full of intrigue, subterfuge, fast cars, Matt Damon, sharks with frickin'
laser beams attached to their heads, and downright redonkulousness.  If you
want the full gory details (or just want to help make the most awesome Python
editing mode even awesomer), come join us on python-m...@python.org.

-Barry

P.S. pdbtrack




[Python-Dev] cmp= & key= (Re: Proposed change to logging.basicConfig)

2011-03-29 Thread Terry Reedy

On 3/29/2011 4:02 PM, Matthew Woodcraft wrote:
> Terry Reedy wrote:
> > # Experiment with 2.7 shows that cmp wins. Though too late to change, I
> > consider this the worst choice of three. I think an exception should be
> > raised. Failing that, I think key should win on the basis that if one
> > adds a 'new-fangled' key func to an existing call with cmp (and forgets
> > to remove cmp), the key func is the one intended. Also, the doc clearly
> > indicates that key is considered superior to cmp.
>
> Neither 'wins': cmp is applied to the output of key.


Added to http://bugs.python.org/issue11712 (for 2.7 only ;-)


I agree that it would have been worth documenting this explicitly.


--
Terry Jan Reedy
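
Python 3 no longer accepts cmp=, so the 2.7 behaviour described above ("cmp
is applied to the output of key") can only be emulated there. A small sketch
using functools.cmp_to_key; the helper names and sample data are made up for
illustration.

```python
from functools import cmp_to_key


def reverse_cmp(a, b):
    """Old-style comparison function that sorts in descending order."""
    return (b > a) - (b < a)


def sort_with_cmp_and_key(items, cmp, key):
    """Emulate 2.7's sorted(items, cmp=cmp, key=key): cmp sees key(item)."""
    return sorted(items, key=lambda item: cmp_to_key(cmp)(key(item)))


words = ['b', 'A', 'C']
# cmp compares the lowercased keys 'b', 'a', 'c', not the raw strings.
print(sort_with_cmp_and_key(words, reverse_cmp, str.lower))  # ['C', 'b', 'A']
```

Had cmp been applied to the raw strings instead, the result would have been
['b', 'C', 'A'], because uppercase letters sort before lowercase ones.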



Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Terry Reedy

On 3/29/2011 2:23 PM, Michael Foord wrote:
> Not sure how real the security risk is here:
>
> http://blog.omega-prime.co.uk/?p=107
>
> Basically he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible encoding),
> and those names are passed at the command line, or otherwise read in and
> decoded from an assumed-utf8 source with surrogate escaping, the
> surrogate escape decoded names will not match the properly decoded
> blacklisted names.

I posted a link to this as a comment, with my summary of the thread.

--
Terry Jan Reedy
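
The mismatch summarized above can be reproduced in a few lines. This is only
a sketch of the scenario: the Chinese greeting is the example string used in
this thread, and the variable names are invented.

```python
# The name as the blacklist author typed it, and as stored on disk in Big5.
blacklisted_name = '\u4f60\u597d'
raw_bytes = blacklisted_name.encode('big5')

# What a UTF-8 locale produces when those bytes arrive via argv (PEP 383):
# undecodable bytes become lone surrogates in the range U+DC80..U+DCFF.
escaped_name = raw_bytes.decode('utf-8', 'surrogateescape')

print(ascii(escaped_name))               # '\udca7A\udca6n'
print(escaped_name == blacklisted_name)  # False: the blacklist check misses
# The original bytes remain recoverable, which is the point of PEP 383.
print(escaped_name.encode('utf-8', 'surrogateescape') == raw_bytes)  # True
```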



[Python-Dev] .hgignore including site-packages and scripts directories?

2011-03-29 Thread Mark Hammond
I'm wondering if it is a reasonable idea to have .hgignore exclude all 
files from 'Lib/site-packages' and 'Scripts'?  As I install packages 
into my source builds, a 'hg status' lists *many* files in both those 
directories forcing me to scroll up a number of pages to see files which 
have actually changed.


IIUC, listing a directory in .hgignore doesn't preclude files from that
directory being added to hg, and doesn't prevent files in those
directories already under hg from being detected as changed.  The only
downside I can see is that if new files which should be added to hg are
added to those directories, a simple "hg st" will not show them - someone
must remember and explicitly add them.  However, ISTM those files are
already likely to be missed given the large amount of noise 'hg st'
shows in that directory - the files are likely to be in the middle of a
very long list which my brain will be trained to habitually skip over.
The number of new files which legitimately need to be added to those
directories is so small that this risk seems worthwhile.


Any thoughts?

Mark


Re: [Python-Dev] Information about how cpython in benchmarked

2011-03-29 Thread Tennessee Leeuwenburg
Hi Nick, Jesse,

Thanks both for your responses, it's much appreciated! It's very useful to
have a clear pointer to the right place to begin looking.

Regards,
-Tennessee

On Tue, Mar 29, 2011 at 10:47 PM, Jesse Noller  wrote:

> On Tue, Mar 29, 2011 at 7:00 AM, Nick Coghlan  wrote:
> > On Tue, Mar 29, 2011 at 8:01 PM, Tennessee Leeuwenburg
> >  wrote:
> >> PyPy maintains http://speed.pypy.org/, which provides very clear
> >> information about the relative performance of PyPy trunk against some
> >> version of cpython (presumably 2.6 or 2.7). I'm not aware of a similar
> >> site for cpython, but that could easily just be my ignorance speaking.
> >> My interest is that I'm looking at building a benchmarking solution at
> >> work, and I can't think of a better way to build something good and
> >> general than to try and write something that could potentially be
> >> released as open source and be useful to others. As such I thought that
> >> benchmarking cpython would be a great use case, but I want to find out
> >> as much as I can about how people currently go about benchmarking
> >> Python. Initially I'm just looking at CPU profiling since it's easiest.
> >
> > One of the points coming out of the VM summit at Pycon is actually
> > that we want to create a shared benchmarking site for CPython, PyPy,
> > Jython, IronPython (and possibly Stackless) under the python.org
> > banner (either speed.python.org, or possibly performance.python.org,
> > since we want to do memory profiling as well).
> >
> > speed.pypy.org will be the reference site for this, but Maciej
> > indicated at the VM summit that the code that runs that site needs
> > some improvements before it will really be up to the task of
> > effectively benchmarking multiple targets.
> >
> > So, according to http://speed.pypy.org/about/, the place to start with
> > your benchmarking system would probably be
> > https://github.com/tobami/codespeed.
> >
> > Cheers,
> > Nick.
>
> Essentially echoing what Nick said. I'm currently working on getting
> the HW for this together.
>



-- 
--
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"


Re: [Python-Dev] .hgignore including site-packages and scripts directories?

2011-03-29 Thread R. David Murray
On Wed, 30 Mar 2011 11:11:45 +1100, Mark Hammond  
wrote:
> I'm wondering if it is a reasonable idea to have .hgignore exclude all 
> files from 'Lib/site-packages' and 'Scripts'?  As I install packages 
> into my source builds, a 'hg status' lists *many* files in both those 
> directories forcing me to scroll up a number of pages to see files which 
> have actually changed.

I hardly ever install things into my source build.  The first time I've
done that, in fact, was to run coverage.  The solution is to add such
directories and/or files to your personal ignore list.  See the 'ignore'
entry under 'ui' in the hgrc documentation.

--
R. David Murray   http://www.bitdance.com
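
To make the two options concrete: the same patterns could go either into the
repository's .hgignore or into a personal file named by the 'ignore' entry
under 'ui'. The sketch below approximates their effect with Python's re
module. The patterns themselves are assumptions (nobody in the thread spelled
them out), and Mercurial's own matching has details this does not model.

```python
import re

# Hypothetical ignore patterns in Mercurial's default regexp syntax,
# rooted at the top of the checkout with '^'.
PROPOSED_PATTERNS = [r'^Lib/site-packages/', r'^Scripts/']


def would_be_ignored(path, patterns=PROPOSED_PATTERNS):
    """Return True if an untracked path matches any ignore pattern."""
    return any(re.search(pattern, path) for pattern in patterns)


for path in ('Lib/site-packages/coverage/__init__.py',
             'Scripts/easy_install.exe',
             'Tools/scripts/reindent.py'):
    print(path, would_be_ignored(path))
```

Because the patterns are rooted, Tools/scripts is left alone, which matters
for the concern raised later in this thread.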


Re: [Python-Dev] .hgignore including site-packages and scripts directories?

2011-03-29 Thread Mark Hammond

On 30/03/2011 12:09 PM, R. David Murray wrote:
> On Wed, 30 Mar 2011 11:11:45 +1100, Mark Hammond wrote:
> > I'm wondering if it is a reasonable idea to have .hgignore exclude all
> > files from 'Lib/site-packages' and 'Scripts'?  As I install packages
> > into my source builds, a 'hg status' lists *many* files in both those
> > directories forcing me to scroll up a number of pages to see files which
> > have actually changed.
>
> I hardly ever install things into my source build.  The first time I've
> done that, in fact, was to run coverage.

Windows doesn't really have an install process integrated into the
build, so it is probably fairly common there.

> The solution is to add such
> directories and/or files to your personal ignore list.  See the 'ignore'
> entry under 'ui' in the hgrc documentation.

Yeah - but I was wondering if it could be made more convenient by
default given the downside seems quite small...


Mark


Re: [Python-Dev] .hgignore including site-packages and scripts directories?

2011-03-29 Thread R. David Murray
On Wed, 30 Mar 2011 12:17:05 +1100, Mark Hammond  
wrote:
> On 30/03/2011 12:09 PM, R. David Murray wrote:
> > The solution is to add such
> > directories and/or files to your personal ignore list.  See the 'ignore'
> > entry under 'ui' in the hgrc documentation.
> 
> Yeah - but I was wondering if it could be made more convenient by 
> default given the downside seems quite small...

I suppose I wouldn't care about site-packages.  Nothing except the
existing README should ever get checked in there, I think.  And I don't
seem to have a 'Scripts' directory, just Tools/scripts, which shouldn't
be ignored.  Is Scripts windows specific?  (I also have a build/scripts,
but build is ignored.)

--
R. David Murray   http://www.bitdance.com


Re: [Python-Dev] .hgignore including site-packages and scripts directories?

2011-03-29 Thread Mark Hammond

On 30/03/2011 1:37 PM, R. David Murray wrote:
> On Wed, 30 Mar 2011 12:17:05 +1100, Mark Hammond wrote:
> > On 30/03/2011 12:09 PM, R. David Murray wrote:
> > > The solution is to add such
> > > directories and/or files to your personal ignore list.  See the 'ignore'
> > > entry under 'ui' in the hgrc documentation.
> >
> > Yeah - but I was wondering if it could be made more convenient by
> > default given the downside seems quite small...
>
> I suppose I wouldn't care about site-packages.  Nothing except the
> existing README should ever get checked in there, I think.  And I don't
> seem to have a 'Scripts' directory, just Tools/scripts, which shouldn't
> be ignored.  Is Scripts windows specific?  (I also have a build/scripts,
> but build is ignored.)

Yeah, "Scripts" is indeed Windows specific - which I admit I had
forgotten until a couple of hours ago when debugging why a script using
virtualenv failed on Windows due to assuming stuff went into a 'bin'
directory and not the 'Scripts' directory.  The directory is normally
populated by the distutils 'install' command, easy_install, etc.


Mark


Re: [Python-Dev] Information about how cpython in benchmarked

2011-03-29 Thread Nick Stinemates
This is really great to hear and something I would be hugely interested in
contributing to.

Lurking has paid off :)

Nick

On Tue, Mar 29, 2011 at 4:00 AM, Nick Coghlan  wrote:

> On Tue, Mar 29, 2011 at 8:01 PM, Tennessee Leeuwenburg
>  wrote:
> > PyPy maintains http://speed.pypy.org/, which provides very clear
> > information about the relative performance of PyPy trunk against some
> > version of cpython (presumably 2.6 or 2.7). I'm not aware of a similar
> > site for cpython, but that could easily just be my ignorance speaking.
> > My interest is that I'm looking at building a benchmarking solution at
> > work, and I can't think of a better way to build something good and
> > general than to try and write something that could potentially be
> > released as open source and be useful to others. As such I thought that
> > benchmarking cpython would be a great use case, but I want to find out
> > as much as I can about how people currently go about benchmarking
> > Python. Initially I'm just looking at CPU profiling since it's easiest.
>
> One of the points coming out of the VM summit at Pycon is actually
> that we want to create a shared benchmarking site for CPython, PyPy,
> Jython, IronPython (and possibly Stackless) under the python.org
> banner (either speed.python.org, or possibly performance.python.org,
> since we want to do memory profiling as well).
>
> speed.pypy.org will be the reference site for this, but Maciej
> indicated at the VM summit that the code that runs that site needs
> some improvements before it will really be up to the task of
> effectively benchmarking multiple targets.
>
> So, according to http://speed.pypy.org/about/, the place to start with
> your benchmarking system would probably be
> https://github.com/tobami/codespeed.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Toshio Kuratomi
On Tue, Mar 29, 2011 at 10:55:47PM +0200, Victor Stinner wrote:
> On Tuesday, 29 March 2011 at 22:40 +0200, Lennart Regebro wrote:
> > The lesson here seems to be "if you have to use blacklists, and you
> > use unicode strings for those blacklists, also make sure the string
> > you compare with doesn't have surrogates".
> 
> No. '\u4f60\u597d'.encode('big5').decode('latin1') gives '§A¦n' which
> doesn't contain any surrogate character.
> 
> The lesson is: if you compare Unicode filenames on UNIX, make sure that
> your system is correctly configured (the locale encoding must be the
> filesystem encoding).
>
You're both wrong :-)

Lennart is missing that you just need to use the same encoding
+ surrogateescape (or stick with bytes) for decoding the byte strings that
you are comparing.

You're missing that on UNIX there is no filesystem encoding so the idea of
locale and filesystem encoding matching is false (and unnecessary -- the
encodings that you use within python just need to be the same.  They don't
even need to match up to the reality of what's used on the filesystem or the
user's locale.)

-Toshio
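
A sketch of the point about using the same encoding plus surrogateescape on
both sides of the comparison. The function names are hypothetical, and the
blacklist is assumed to be kept as raw bytes so that it can be decoded the
same way as the incoming name.

```python
def decode_name(raw, encoding='utf-8'):
    """Decode filename bytes the same way for every input."""
    return raw.decode(encoding, 'surrogateescape')


def is_blacklisted(raw_name, raw_blacklist, encoding='utf-8'):
    """Compare names after decoding both sides identically."""
    blacklist = {decode_name(entry, encoding) for entry in raw_blacklist}
    return decode_name(raw_name, encoding) in blacklist


big5_name = '\u4f60\u597d'.encode('big5')
# The chosen encoding need not match what is really on disk, only be the
# same for both sides.
print(is_blacklisted(big5_name, [big5_name]))            # True
print(is_blacklisted(big5_name, [big5_name], 'latin1'))  # True
print(is_blacklisted(b'other', [big5_name]))             # False
```

Comparing the raw bytes directly would work equally well and avoids the
decoding step altogether.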




Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Lennart Regebro
On Wed, Mar 30, 2011 at 07:54, Toshio Kuratomi  wrote:
> Lennart is missing that you just need to use the same encoding
> + surrogateescape (or stick with bytes) for decoding the byte strings that
> you are comparing.

You lost me here. I need to do this for what?

//Lennart


Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Lennart Regebro
On Tue, Mar 29, 2011 at 23:17, "Martin v. Löwis"  wrote:
> I think the whole blacklist example is artificial. The string in the
> blacklist is actually a Chinese "hello" greeting, so it surely isn't
> the string being blacklisted. For proper blacklisting, you would likely
> use substring searches, case-insensitivity, transliterations, and
> perhaps even regular expressions and word stemming.

Good point.

//Lennart