[Python-Dev] Information about how cpython in benchmarked
Hi all,

Apologies for emailing this list with such an apparently trivial question. Is there some source of documentation or information on how Python is benchmarked? I am aware of the Python regression testing module, regrtest.py, which I presume, if profiled, would be a good baseline test.

PyPy maintains http://speed.pypy.org/, which provides very clear information about the relative performance of PyPy trunk against some version of cpython (presumably 2.6 or 2.7). I'm not aware of a similar site for cpython, but that could easily just be my ignorance speaking.

My interest is that I'm looking at building a benchmarking solution at work, and I can't think of a better way to build something good and general than to try and write something that could potentially be released as open source and be useful to others. As such I thought that benchmarking cpython would be a great use case, but I want to find out as much as I can about how people currently go about benchmarking Python. Initially I'm just looking at CPU profiling since it's easiest.

Anyway, if this is the wrong place to send this email, I'm very sorry for clogging up your inbox.

Thanks very much,
-Tennessee
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Information about how cpython in benchmarked
On Tue, Mar 29, 2011 at 8:01 PM, Tennessee Leeuwenburg wrote:
> PyPy maintains http://speed.pypy.org/, which provides very clear information
> about the relative performance of PyPy trunk against some version of cpython
> (presumably 2.6 or 2.7). I'm not aware of a similar site for cpython, but
> that could easily just be my ignorance speaking.
> My interest is that I'm looking at building a benchmarking solution at work.
> and I can't think of a better way to build something good and general than
> to try and write something that could potentially be released as open source
> and be useful to others. As such I thought that benchmarking cpython would
> be a great use case, but I want to find out as much as I can about how
> people currently go about benchmarking Python. Initially I'm just looking at
> CPU profiling since it's easiest.

One of the points coming out of the VM summit at Pycon is actually that we want to create a shared benchmarking site for CPython, PyPy, Jython, IronPython (and possibly Stackless) under the python.org banner (either speed.python.org, or possibly performance.python.org, since we want to do memory profiling as well).

speed.pypy.org will be the reference site for this, but Maciej indicated at the VM summit that the code that runs that site needs some improvements before it will really be up to the task of effectively benchmarking multiple targets.

So, according to http://speed.pypy.org/about/, the place to start with your benchmarking system would probably be https://github.com/tobami/codespeed.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Information about how cpython in benchmarked
On Tue, Mar 29, 2011 at 7:00 AM, Nick Coghlan wrote:
> On Tue, Mar 29, 2011 at 8:01 PM, Tennessee Leeuwenburg wrote:
>> PyPy maintains http://speed.pypy.org/, which provides very clear information
>> about the relative performance of PyPy trunk against some version of cpython
>> (presumably 2.6 or 2.7). I'm not aware of a similar site for cpython, but
>> that could easily just be my ignorance speaking.
>> My interest is that I'm looking at building a benchmarking solution at work.
>> and I can't think of a better way to build something good and general than
>> to try and write something that could potentially be released as open source
>> and be useful to others. As such I thought that benchmarking cpython would
>> be a great use case, but I want to find out as much as I can about how
>> people currently go about benchmarking Python. Initially I'm just looking at
>> CPU profiling since it's easiest.
>
> One of the points coming out of the VM summit at Pycon is actually
> that we want to create a shared benchmarking site for CPython, PyPy,
> Jython, IronPython (and possibly Stackless) under the python.org
> banner (either speed.python.org, or possibly performance.python.org,
> since we want to do memory profiling as well).
>
> speed.pypy.org will be the reference site for this, but Maciej
> indicated at the VM summit that the code that runs that site needs
> some improvements before it will really be up to the task of
> effectively benchmarking multiple targets.
>
> So, according to http://speed.pypy.org/about/, the place to start with
> your benchmarking system would probably be
> https://github.com/tobami/codespeed.
>
> Cheers,
> Nick.

Essentially echoing what Nick said. I'm currently working on getting the hardware for this together.
Re: [Python-Dev] utf-8 encoding in checkins?
>> I guess I have my work cut out for me. It appears my preferred mail
>> reader, VM, is not supported out-of-the-box by GNU Emacs (they still
>> use Rmail and Babyl for some reason), and I'm not sure the investment
>> trying to get XEmacs built with MULE is worth the effort.

Anders> Use a 21.5 beta of XEmacs instead of 21.4, 21.5 deals with utf-8
Anders> quite well.

Thanks for the various responses, both public and private. In part because Barry made the leap back from XEmacs to GNU Emacs (and I trust Barry in all things Emacs), I decided to dip my toe back into the GNU water. I needed to install a recent version of VM, but it does do utf-8, so my original problem is solved.

In response to Anders, I had tried 21.5b28 a while ago but backed off from it. I no longer recall why.

My only issues now are:

* make sure the ediff and vc packages recognize version-controlled files
  (It seems they do, but I haven't put them through their paces)

* replace the GNU python.el with python-mode.el from the python-mode
  project (formerly distributed with Python, but now all grown up and moved
  away).

Skip
Re: [Python-Dev] utf-8 encoding in checkins?
s...@pobox.com wrote:
> I guess I have my work cut out for me. It appears my preferred mail
> reader, VM, is not supported out-of-the-box by GNU Emacs (they still
> use Rmail and Babyl for some reason), and I'm not sure the investment
> trying to get XEmacs built with MULE is worth the effort.

Use a 21.5 beta of XEmacs instead of 21.4, 21.5 deals with utf-8 quite well.

- Anders
[Python-Dev] Proposed change to logging.basicConfig
I'm planning a change to logging.basicConfig to add an optional "handlers" keyword argument which defaults to None.

If specified, this should be an iterable of already created handlers, which will be added to the root logger (if it doesn't already have any handlers). Any handler in the iterable which does not have a formatter assigned will be assigned the formatter created by basicConfig.

If "handlers" is specified, the "stream", "filename" and "filemode" arguments will be ignored.

If any of you can see any problems with this change, or can suggest any improvement to the approach, please respond. I expect to check this change in within the next few days.

Regards,

Vinay Sajip
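For readers on a current interpreter: a `handlers` keyword along these lines is available in `logging.basicConfig` from Python 3.3 onward. The following is a minimal sketch of the behaviour described in the proposal, not the author's implementation; the in-memory streams, logger name and format strings are illustrative assumptions.

```python
import io
import logging

# Two in-memory streams stand in for real log destinations (illustrative only).
plain_stream = io.StringIO()
custom_stream = io.StringIO()

plain_handler = logging.StreamHandler(plain_stream)    # no formatter assigned
custom_handler = logging.StreamHandler(custom_stream)
custom_handler.setFormatter(logging.Formatter("custom: %(message)s"))

root = logging.getLogger()
# basicConfig() does nothing when the root logger already has handlers,
# so clear any that the host environment may have installed.
for existing in list(root.handlers):
    root.removeHandler(existing)

logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s %(message)s",   # used only by handlers lacking a formatter
    handlers=[plain_handler, custom_handler],
)

logging.getLogger("demo").info("hello")

plain_output = plain_stream.getvalue()    # formatted by basicConfig's formatter
custom_output = custom_stream.getvalue()  # keeps its own formatter
```

The handler without a formatter picks up the one created by `basicConfig`, while the handler that already had one keeps it.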
Re: [Python-Dev] Proposed change to logging.basicConfig
On Tue, 29 Mar 2011 16:35:08 + (UTC) Vinay Sajip wrote:
> I'm planning a change to logging.basicConfig to add an optional "handlers"
> keyword argument which defaults to None.
>
> If specified, this should be an iterable of already created handlers, which
> will be added to the root logger (if it doesn't already have any handlers). Any
> handler in the iterable which does not have a formatter assigned will be
> assigned the formatter created by basicConfig.
>
> If "handlers" is specified, the "stream", "filename" and "filemode" arguments
> will be ignored.
>
> If any of you can see any problems with this change, or can suggest any
> improvement to the approach, please respond.

I'm not a logging expert, but the fact that your description above mentions at least two instances of special-casing makes it sound like the API has a usability (or learnability) problem.

Regards

Antoine.
[Python-Dev] Security implications of pep 383
Hey all,

Not sure how real the security risk is here:

http://blog.omega-prime.co.uk/?p=107

Basically he is saying that if you store a list of blacklisted files with names encoded in big-5 (or some other non-utf8 compatible encoding) if those names are passed at the command line, or otherwise read in and decoded from an assumed-utf8 source with surrogate escaping, the surrogate escape decoded names will not match the properly decoded blacklisted names.

All the best,

Michael Foord

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
Re: [Python-Dev] Proposed change to logging.basicConfig
On 3/29/2011 12:35 PM, Vinay Sajip wrote:
> I'm planning a change to logging.basicConfig to add an optional "handlers"
> keyword argument which defaults to None.
>
> If specified, this should be an iterable of already created handlers, which
> will be added to the root logger (if it doesn't already have any handlers). Any
> handler in the iterable which does not have a formatter assigned will be
> assigned the formatter created by basicConfig.
>
> If "handlers" is specified, the "stream", "filename" and "filemode" arguments
> will be ignored.
>
> If any of you can see any problems with this change, or can suggest any
> improvement to the approach, please respond. I expect to check this change in
> within the next few days.

I am bothered by mutually exclusive parameters. This is one reason I was glad to see cmp eliminated from list.sort. Quick: what happens if one passes both cmp and key to list.sort? There are three reasonable possibilities. As far as I can read, the answer is not documented.#

I am not familiar with logging, but I wonder if you should have two functions for the two quite different signatures. If not, I think the result of passing conflicting parameters should be something like

TypeError: conflicting parameters passed.

"In the face of ambiguity, refuse to guess."

# Experiment with 2.7 shows that cmp wins. Though too late to change, I consider this the worst choice of three. I think an exception should be raised. Failing that, I think key should win on the basis that if one adds a 'new-fangled' key func to an existing call with cmp (and forgets to remove cmp), the key func is the one intended. Also, the doc clearly indicates that key is considered superior to cmp.

--
Terry Jan Reedy
Re: [Python-Dev] Security implications of pep 383
On Tue, 29 Mar 2011 19:23:25 +0100 Michael Foord wrote:
> Hey all,
>
> Not sure how real the security risk is here:
>
> http://blog.omega-prime.co.uk/?p=107
>
> Basically he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible encoding)
> if those names are passed at the command line, or otherwise read in and
> decoded from an assumed-utf8 source with surrogate escaping, the
> surrogate escape decoded names will not match the properly decoded
> blacklisted names.

This has nothing to do specifically with PEP 383. The same issues can arise without PEP 383 if you replace utf-8 with, say, latin-1 in the above example.

Basically, what this says is if you are decoding the same bytestring using two different encodings, you get two different unicode strings (which therefore compare unequal).

Another observation is that, in the script which is presented, if the user were to extract a filename from the blacklist and call open() on it, they wouldn't actually open one of the blacklisted files, since the encoded representation using the filesystem encoding (e.g. utf-8 or latin-1) would be different from the Big-5 representation.

A solution would be to open the blacklist file in binary mode and call os.fsdecode() on the result.

Regards

Antoine.
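The point about decoding one bytestring with different encodings can be checked directly. This is a small sketch using the Chinese greeting from the blog post; the variable names are illustrative, and the explicit `"utf-8"` plus `surrogateescape` decode is an assumption standing in for what `os.fsdecode()` does on a POSIX system with a UTF-8 locale.

```python
# Bytes as they would sit on disk for a Big5-encoded file name.
raw_name = "\u4f60\u597d".encode("big5")

as_big5 = raw_name.decode("big5")        # the "proper" decoding
as_latin1 = raw_name.decode("latin-1")   # no surrogates involved, still different
as_utf8_escaped = raw_name.decode("utf-8", errors="surrogateescape")  # PEP 383 style

# Three decodings of the same bytes give three unequal strings.
all_differ = len({as_big5, as_latin1, as_utf8_escaped}) == 3

# Only the surrogateescape form is guaranteed to round-trip back to the bytes.
round_trip = as_utf8_escaped.encode("utf-8", errors="surrogateescape")
```

The Latin-1 case shows the mismatch without any surrogate characters, which is why the problem is not specific to PEP 383.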
Re: [Python-Dev] Security implications of pep 383
> Not sure how real the security risk is here:
>
> http://blog.omega-prime.co.uk/?p=107
>
> Basically he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible encoding)
> if those names are passed at the command line, or otherwise read in and
> decoded from an assumed-utf8 source with surrogate escaping, the
> surrogate escape decoded names will not match the properly decoded
> blacklisted names.

As described, I find the problem a little bit artificial: supposedly, he was passing the file name on the command line. However, since his terminal is in UTF-8 and the file name in Big5, the console didn't display the file name in a meaningful way when he ran the program. So whoever ran the program ignored the mojibake, and didn't wonder whether it could have any effect on proper functioning of the program.

In addition, if he did ls(1) on the directory, it would have displayed question marks throughout. This should alert the user that something bad is going on.

Notice that this isn't really PEP-383's fault. If the file system encoding was UTF-8, and the blacklist was UTF-8, and the program ran in a Latin-1 locale, it would have decoded the file name nicely (without surrogates), but the blacklist check would still have failed.

He should have opened the file in the locale's encoding (i.e. giving no encoding), using the surrogate escape handler.

Regards,
Martin
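The suggestion in the last paragraph can be sketched as follows. To keep the result independent of the machine's locale, this sketch passes `encoding="utf-8"` explicitly as a stand-in for "the locale's encoding"; that constant, the temporary file and the variable names are assumptions for illustration only.

```python
import os
import tempfile

ENCODING = "utf-8"  # assumption: stands in for the locale encoding used for argv

raw_name = "\u4f60\u597d".encode("big5")  # Big5 bytes of the blacklisted name

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blacklist.big5")
    with open(path, "wb") as f:
        f.write(raw_name + b"\n")

    # Read the blacklist the same way the interpreter decodes command-line
    # arguments: same encoding, surrogateescape error handler.
    with open(path, encoding=ENCODING, errors="surrogateescape") as f:
        blacklist = f.read().split()

# What sys.argv would hold for this file name under the assumed encoding.
argv_name = raw_name.decode(ENCODING, errors="surrogateescape")

is_blocked = argv_name in blacklist
```

Because both sides go through the same decoding, the undecodable bytes map to the same surrogate code points and the comparison succeeds.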
Re: [Python-Dev] [Python-checkins] cpython (2.6): Issue #11639: Configuration function documentation referred to logging.XXX
On 29/03/2011 02:16, vinay.sajip wrote:
> http://hg.python.org/cpython/rev/bfa2a8d91859
> changeset:   69034:bfa2a8d91859
> branch:      2.6
> parent:      68802:b99c94261225
> user:        Vinay Sajip
> date:        Tue Mar 29 01:07:50 2011 +0100
> summary:
>   Issue #11639: Configuration function documentation referred to logging.XXX
>   rather than logging.config.XXX.

Only security fixes should go into 2.5 and 2.6. Could you revert (hg backout) this changeset?

Regards
Re: [Python-Dev] Security implications of pep 383
In a message of Tue, 29 Mar 2011 19:23:25 BST, Michael Foord writes:
>Hey all,
>
>Not sure how real the security risk is here:
>
> http://blog.omega-prime.co.uk/?p=107
>
>Basically he is saying that if you store a list of blacklisted files
>with names encoded in big-5 (or some other non-utf8 compatible encoding)
>if those names are passed at the command line, or otherwise read in and
>decoded from an assumed-utf8 source with surrogate escaping, the
>surrogate escape decoded names will not match the properly decoded
>blacklisted names.
>All the best,
>
>Michael Foord
>

I am not sure there are any security related gotchas here. All he is saying is that if you decode the same bytestring using two different encodings, you will get two different unicode strings (which therefore will compare unequal). Where's the problem, except in that you might have unrealistic expectations?

Laura
Re: [Python-Dev] Security implications of pep 383
On Tue, Mar 29, 2011 at 07:23:25PM +0100, Michael Foord wrote:
> Hey all,
>
> Not sure how real the security risk is here:
>
> http://blog.omega-prime.co.uk/?p=107
>
> Basically he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible
> encoding) if those names are passed at the command line, or otherwise
> read in and decoded from an assumed-utf8 source with surrogate
> escaping, the surrogate escape decoded names will not match the
> properly decoded blacklisted names.
>

The example is correct. The security risk is real. However, there's a flaw in the program, and whether there's also a flaw in python is not so certain. Here's the line I'd say is contentious::

    blacklist = open("blacklist.big5", encoding='big5').read().split()

The blacklist file contains a list of filenames. However, this code treats it as a list of strings. This is a logic error in the program, and he should really be doing this::

    blacklist = open("blacklist.big5", 'rb').read().split()

Then, when comparing it against the values of sys.argv, either sys.argv gets converted into bytes (using the system locale since that's what was used to encode to unicode) or the items in blacklist get converted to unicode with surrogateescape.

The possible flaw in python is this: Code like the blog poster wrote passes python3 without an error or a warning. This gives the programmer no feedback that they're doing something wrong until it actually bites them in the foot in deployed code.

-Toshio
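A sketch of the bytes-based comparison suggested above, under stated assumptions: `load_blacklist` and `is_blacklisted` are hypothetical helper names, and the blacklist contents are built in memory instead of being read from `blacklist.big5`. On POSIX, `os.fsencode()` performs the same conversion with the real filesystem encoding; the encoding is passed explicitly here so the example behaves the same everywhere.

```python
import sys

def load_blacklist(data):
    """Split raw blacklist contents into a set of byte-string file names."""
    return set(data.split())

def is_blacklisted(arg, blacklist, encoding=None):
    """Compare a decoded command-line argument against byte-string names.

    The argument is converted back to the bytes it was decoded from, using
    the surrogateescape handler so undecodable bytes are restored exactly.
    """
    encoding = encoding or sys.getfilesystemencoding()
    return arg.encode(encoding, errors="surrogateescape") in blacklist

# Stand-in for open("blacklist.big5", "rb").read()
blacklist = load_blacklist("\u4f60\u597d".encode("big5") + b"\nother.txt\n")

# What sys.argv would hold in a UTF-8 locale for the Big5-named file.
arg = b"\xa7A\xa6n".decode("utf-8", errors="surrogateescape")

hit = is_blacklisted(arg, blacklist, encoding="utf-8")
miss = is_blacklisted("harmless.txt", blacklist, encoding="utf-8")
```

Keeping the blacklist as bytes means no guess about its encoding is ever made; only the argument is converted, and that conversion is lossless.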
Re: [Python-Dev] Proposed change to logging.basicConfig
Terry Reedy wrote:
> I am bothered by mutually exclusive parameters. This is one reason I was
> glad to see cmp eliminated from list.sort. Quick: what happens if one
> passes both cmp and key to list.sort? There are three reasonable
> possibilities. As far as I can read, the answer is not documented.#

> # Experiment with 2.7 shows that cmp wins. Though too late to change, I
> consider this the worst choice of three. I think an exception should be
> raised. Failing that, I think key should win on the basis that if one
> adds a 'new-fangled' key func to an existing call with cmp (and forgets
> to remove cmp), the key func is the one intended. Also, the doc clearly
> indicates that key is considered superior to cmp.

Neither 'wins': cmp is applied to the output of key. I agree that it would have been worth documenting this explicitly.

-M-
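Python 3's `list.sort` no longer accepts `cmp`, so the 2.7 behaviour cannot be run directly on a current interpreter. The following is an emulation of the rule stated above (cmp is applied to the output of key), built on `functools.cmp_to_key`; the helper name `sort_with_cmp_and_key` and the sample data are illustrative assumptions.

```python
from functools import cmp_to_key

def sort_with_cmp_and_key(items, cmp, key):
    """Emulate sorting where the comparison function sees key(item),
    not the item itself."""
    wrap = cmp_to_key(cmp)
    return sorted(items, key=lambda item: wrap(key(item)))

def reverse_cmp(a, b):
    # Three-way comparison, reversed, so the result is in descending order.
    return (a < b) - (a > b)

words = ["banana", "Zebra", "apple"]

# cmp applied to the lowercased keys: descending, case-insensitive.
combined = sort_with_cmp_and_key(words, reverse_cmp, str.lower)

# For contrast: cmp alone on the raw strings, where "Zebra" sorts before
# any lowercase word and therefore ends up last in descending order.
cmp_only = sorted(words, key=cmp_to_key(reverse_cmp))
```

The two results differ, which makes it visible that neither argument simply overrides the other under this rule.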
[Python-Dev] Failed issue tracker submission
The node specified by the designator in the subject of your message ("22663") does not exist.

Subject was: "[issue22663]"

Mail Gateway Help
=================
Incoming messages are examined for multiple parts:
 . In a multipart/mixed message or part, each subpart is extracted and
   examined. The text/plain subparts are assembled to form the textual
   body of the message, to be stored in the file associated with a "msg"
   class node. Any parts of other types are each stored in separate files
   and given "file" class nodes that are linked to the "msg" node.
 . In a multipart/alternative message or part, we look for a text/plain
   subpart and ignore the other parts.

Summary
-------
The "summary" property on message nodes is taken from the first non-quoting section in the message body. The message body is divided into sections by blank lines. Sections where the second and all subsequent lines begin with a ">" or "|" character are considered "quoting sections". The first line of the first non-quoting section becomes the summary of the message.

Addresses
---------
All of the addresses in the To: and Cc: headers of the incoming message are looked up among the user nodes, and the corresponding users are placed in the "recipients" property on the new "msg" node. The address in the From: header similarly determines the "author" property of the new "msg" node. The default handling for addresses that don't have corresponding users is to create new users with no passwords and a username equal to the address. (The web interface does not permit logins for users with no passwords.) If we prefer to reject mail from outside sources, we can simply register an auditor on the "user" class that prevents the creation of user nodes with no passwords.

Actions
-------
The subject line of the incoming message is examined to determine whether the message is an attempt to create a new item or to discuss an existing item. A designator enclosed in square brackets is sought as the first thing on the subject line (after skipping any "Fwd:" or "Re:" prefixes).

If an item designator (class name and id number) is found there, the newly created "msg" node is added to the "messages" property for that item, and any new "file" nodes are added to the "files" property for the item.

If just an item class name is found there, we attempt to create a new item of that class with its "messages" property initialized to contain the new "msg" node and its "files" property initialized to contain any new "file" nodes.

Triggers
--------
Both cases may trigger detectors (in the first case we are calling the set() method to add the message to the item's spool; in the second case we are calling the create() method to create a new node). If an auditor raises an exception, the original message is bounced back to the sender with the explanatory message given in the exception.

$Id: mailgw.py,v 1.196 2008-07-23 03:04:44 richard Exp $

Return-Path:
X-Original-To: rep...@bugs.python.org
Delivered-To: roundup+trac...@psf.upfronthosting.co.za
Received: from mail.python.org (mail.python.org [82.94.164.166])
    by psf.upfronthosting.co.za (Postfix) with ESMTPS id 7DCEE1DEB0
    for ; Tue, 29 Mar 2011 22:10:55 +0200 (CEST)
Received: from albatross.python.org (localhost [127.0.0.1])
    by mail.python.org (Postfix) with ESMTP id 3PzjsW1q1lz7Lmy
    for ; Tue, 29 Mar 2011 22:10:55 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=python.org; s=200901;
    t=1301429455; bh=WYL3NF6gQtbDZ+R9KxXHGS2PSlCAxyY+EQEgb/Yw5jI=;
    h=Date:Message-Id:Content-Type:MIME-Version:
    Content-Transfer-Encoding:From:To:Subject;
    b=RiMAivS4Shae7bPg7E7SocheqYB9pzk/Svimv+qumX5arnUaaC8h9iIJo8MFDcDdi
    +Wk0XzTjTjKsbobrKgZnfZf9a8j6Fv4Ym1nTyTcPcyjCMritjq9xNUluVQvHv/Vn2e
    RhpV2FUWOdCtBx83eUopMPGEEEwABnbG5ZwgsDzM=
Received: from localhost (HELO mail.python.org) (127.0.0.1)
    by albatross.python.org with SMTP; 29 Mar 2011 22:10:55 +0200
Received: from dinsdale.python.org (svn.python.org [IPv6:2001:888:2000:d::a4])
    (using TLSv1 with cipher AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mail.python.org (Postfix) with ESMTPS
    for ; Tue, 29 Mar 2011 22:10:55 +0200 (CEST)
Received: from localhost ([127.0.0.1] helo=dinsdale.python.org ident=hg)
    by dinsdale.python.org with esmtp (Exim 4.72) (envelope-from )
    id 1Q4fFf-00023G-4C
    for rep...@bugs.python.org; Tue, 29 Mar 2011 22:10:55 +0200
Date: Tue, 29 Mar 2011 22:10:55 +0200
Message-Id:
Content-Type: text/plain; charset="utf8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
From: python-dev@python.org
To: rep...@bugs.python.org
Subject: [issue22663]

TmV3IGNoYW5nZXNldCBkZDg1MmEwZjkyZDYgYnkgZ3VpZG8gaW4gYnJhbmNoICcyLjUnOgpJc3N1
ZSAyMjY2MzogZml4IHJlZGlyZWN0IHZ1bG5lcmFiaWxpdHkgaW4gdXJsbGliL3VybGxpYjIuCmh0
dHA6Ly9oZy5weXRob24ub3JnL2NweXRob24vcmV2L2RkODUyYTBmOTJkNgo=
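The subject-line rule in the "Actions" section of the gateway help can be sketched roughly as below. This is a hypothetical illustration of the described behaviour, not the tracker's actual parsing code: the regular expression, the `parse_designator` name and the assumption that class names contain only letters and underscores are all mine.

```python
import re

# Hypothetical pattern: optional Re:/Fwd: prefixes, then "[classname]" or
# "[classnameNNN]" as the first thing on the subject line.
_DESIGNATOR = re.compile(
    r"^\s*(?:(?:re|fwd?)\s*:\s*)*"
    r"\[(?P<classname>[A-Za-z_]+)(?P<nodeid>\d+)?\]",
    re.IGNORECASE,
)

def parse_designator(subject):
    """Return (classname, nodeid) from a subject line, or None if absent.

    nodeid is None when only a class name is given, which per the help
    text means "create a new item" instead of "discuss an existing one".
    """
    match = _DESIGNATOR.match(subject)
    if match is None:
        return None
    return match.group("classname"), match.group("nodeid")

existing = parse_designator("[issue22663]")
new_item = parse_designator("Re: Fwd: [issue] Crash on startup")
no_designator = parse_designator("Just a plain subject")
```

In the bounced message above, the designator parses cleanly; the rejection comes from the later lookup, because no node with that id exists.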
Re: [Python-Dev] Security implications of pep 383
On Tuesday 29 March 2011 at 19:23 +0100, Michael Foord wrote:
> Hey all,
>
> Not sure how real the security risk is here:
>
> http://blog.omega-prime.co.uk/?p=107
>
> Basically he is saying that if you store a list of blacklisted files
> with names encoded in big-5 (or some other non-utf8 compatible encoding)
> if those names are passed at the command line, or otherwise read in and
> decoded from an assumed-utf8 source with surrogate escaping, the
> surrogate escape decoded names will not match the properly decoded
> blacklisted names.

Yes, if you decode two byte strings from two different encodings, you get different unicode strings. It's not related to surrogateescape (PEP 383).

Sorry, '\u4f60\u597d'.encode('big5').decode('latin1') doesn't give you '\u4f60\u597d' but '§A¦n', and it doesn't warn you that latin1 is not big5 (there is no UnicodeDecodeError, even if the error handler is strict).

I think that the example has two issues:

- security using blacklists doesn't work (it is better to use a whitelist)
- if filenames are stored as Big5, they must be decoded from Big5, and so the locale encoding must be Big5

I don't understand the last paragraph:

"P.P.S I will further note that you get the same issue even if the blacklist and filename had been in UTF-8, but this time it gets broken from a terminal in the Big5 locale. I didn’t show it this way around because I understand that Python 3 may only have just recently started using the locale to decode argv, rather than being hardcoded to UTF-8."

Python filesystem encoding is only hardcoded to UTF-8 on Mac OS X; on other operating systems, it is the locale encoding.

Victor
Re: [Python-Dev] Security implications of pep 383
On Tue, Mar 29, 2011 at 22:40, Lennart Regebro wrote:
> The lesson here seems to be "if you have to use blacklists, and you
> use unicode strings for those blacklists, also make sure the string
> you compare with doesn't have surrogates".
>

For that matter, what happens with combining characters?

'\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL LETTER O WITH DIAERESIS}'

I guess the filesystem shouldn't treat these as the same (even though they are), but what if some webservice does? I suspect you should normalize both strings before comparing them in any blacklist, and what happens with surrogates when you normalize?

//Lennart
Re: [Python-Dev] Security implications of pep 383
The lesson here seems to be "if you have to use blacklists, and you use unicode strings for those blacklists, also make sure the string you compare with doesn't have surrogates".

//Lennart
Re: [Python-Dev] Security implications of pep 383
On Tuesday 29 March 2011 at 22:40 +0200, Lennart Regebro wrote:
> The lesson here seems to be "if you have to use blacklists, and you
> use unicode strings for those blacklists, also make sure the string
> you compare with doesn't have surrogates".

No. '\u4f60\u597d'.encode('big5').decode('latin1') gives '§A¦n' which doesn't contain any surrogate character.

The lesson is: if you compare Unicode filenames on UNIX, make sure that your system is correctly configured (the locale encoding must be the filesystem encoding).

Victor
Re: [Python-Dev] Security implications of pep 383
On Tue, 29 Mar 2011 22:40:01 +0200 Lennart Regebro wrote:
> The lesson here seems to be "if you have to use blacklists, and you
> use unicode strings for those blacklists, also make sure the string
> you compare with doesn't have surrogates".

Not really. As everyone said, this can happen even without surrogates.

Regards

Antoine.
Re: [Python-Dev] Security implications of pep 383
On Tuesday 29 March 2011 at 22:45 +0200, Lennart Regebro wrote:
> On Tue, Mar 29, 2011 at 22:40, Lennart Regebro wrote:
> > The lesson here seems to be "if you have to use blacklists, and you
> > use unicode strings for those blacklists, also make sure the string
> > you compare with doesn't have surrogates".
> >
>
> For that matter, what happens with combining characters?
>
> '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL
> LETTER O WITH DIAERESIS}'
>
> I guess the filesystem shouldn't treat these as the same (even though
> they are), but what if some webservice does?

Mac OS X does normalize filenames to a variant of the D (decomposed) form.
http://www.haypocalc.com/tmp/unicode-2011-03-25/html/operating_systems.html#mac-os-x

> I suspect you should normalize both strings before comparing them in any
> blacklist,

Yes, but a blacklist is not safe: use a whitelist.

> and what happens with surrogates when you normalize?

Surrogates are left unchanged by the normalization forms NFC, NFD, NFKC and NFKD.

>>> import unicodedata
>>> unicodedata.normalize('NFC', '\uDC80') == unicodedata.normalize('NFD', '\uDC80') == unicodedata.normalize('NFKC', '\uDC80') == unicodedata.normalize('NFKD', '\uDC80') == '\uDC80'
True

Victor
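Both halves of this exchange (normalize before comparing, and lone surrogates surviving normalization) can be checked with the standard `unicodedata` module. The variable names below are illustrative; the sample characters are the ones from the quoted question.

```python
import unicodedata

decomposed = "o\u0308"    # LATIN SMALL LETTER O + COMBINING DIAERESIS
precomposed = "\u00f6"    # LATIN SMALL LETTER O WITH DIAERESIS

# The raw strings differ even though they render identically.
raw_equal = decomposed == precomposed

# After normalizing both sides to the same form, they compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", decomposed)
    == unicodedata.normalize("NFC", precomposed)
)
nfd_equal = (
    unicodedata.normalize("NFD", decomposed)
    == unicodedata.normalize("NFD", precomposed)
)

# A lone surrogate, as produced by surrogateescape, passes through every
# normalization form unchanged.
lone_surrogate = "\udc80"
surrogate_unchanged = all(
    unicodedata.normalize(form, lone_surrogate) == lone_surrogate
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)
```

Any of the four forms works for comparison as long as both strings go through the same one.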
Re: [Python-Dev] Proposed change to logging.basicConfig
Antoine Pitrou writes:
> I'm not a logging expert, but the fact that your description above
> mentions at least two instances of special-casing make it sound like
> the API has an usability (or learnability) problem.

Well, basicConfig() was provided to make it as easy as possible to configure logging for the common use cases - logging to console and logging to file. To do this, keyword parameters are passed: stream= (defaulting to sys.stderr, for logging to console) or filename= and filemode= which specify a file to log to. These sets are not compatible. Of course I could have provided two different functions with different signatures, but it seemed simpler to have a single function, which is typically used in just one place in an application script. This is unlike the cmp/key case Terry mentions, and in any case the effect of passing incompatible parameters is documented in the case of basicConfig().

If by special casing you're referring to "(if it doesn't already have any handlers)", that's existing behaviour, and not a change.

If you're referring to the "the arguments will be ignored" sentence - Terry makes a valid point about raising an exception if incompatible parameters are passed, and I will do this.

If you're referring to the sentence about formatters, I don't see it as a special case, it's about convenience. Each logging handler can have a formatter, and the proposed API allows both bespoke formatters to be set for individual handlers and for a common formatter to be set for multiple handlers, with ISTM minimal effort for the API user.

If you're referring to something else entirely, I'm not sure what that might be.

Regards,

Vinay Sajip
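On interpreters from Python 3.3 onward, `logging.basicConfig` does reject incompatible arguments with a `ValueError`, in line with the plan stated above. A short sketch of observing that, assuming a root logger with no handlers installed (the check only runs in that case); the variable names are illustrative.

```python
import io
import logging

root = logging.getLogger()
# The argument check only happens when basicConfig() is about to configure
# the root logger, so remove any handlers already present.
for existing in list(root.handlers):
    root.removeHandler(existing)

handler = logging.StreamHandler(io.StringIO())

try:
    # "handlers" together with "stream" is one of the incompatible combinations.
    logging.basicConfig(handlers=[handler], stream=io.StringIO())
except ValueError as exc:
    conflict_message = str(exc)
else:
    conflict_message = None

handlers_after_failure = list(root.handlers)
```

Because the exception is raised before anything is installed, the root logger is left unconfigured and a corrected call can follow.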
Re: [Python-Dev] Security implications of pep 383
> '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL > LETTER O WITH DIAERESIS}' > > I guess the filesystem shouldn't treat these as the same (even though > they are), but what if some webservice does? I suspect you should > normalize both strings before comparing them in any blacklist, and > what happens with surrogates when you normalize? I think the whole blacklist example is artificial. The string in the blacklist is actually a Chinese "hello" greeting, so it surely isn't the string being blacklisted. For proper blacklisting, you would likely use substring searches, case-insensitivity, transliterations, and perhaps even regular expressions and word stemming. If you consider all these things, proper or alternative encodings of the same text are just another issue to consider. Regards, Martin
[Python-Dev] Differences among Emacsen (was: utf-8 encoding in checkins?)
s...@pobox.com writes: > My only issues now are: > > * make sure the ediff and vc packages recognize version-controlled files > (It seems they do, but I haven't put them through their paces) The ‘vc’ package (I'm using Debian's GNU Emacs 23.2.1) now recognises DVCS-controlled *files*, and works well with them. It's still unaware that modern VCS deals with project *trees*, so works only at an individual file level. Still quite useful (e.g. ‘vc-diff’ and the like). The ‘dvc’ project <http://download.gna.org/dvc/> shows promise, but isn't yet in Debian so I haven't tried it. I just use ‘ediff’ between different working trees on the filesystem, so I don't know how well it works with files that don't exist. > * replace the GNU python.el with python-mode.el from the python-mode > project (formerly distributed with Python, but now all grown up and moved > away). What's the current thinking on that? The native GNU Emacs Python mode seems fine to me, but I'm not a particularly clever Emacs user so am probably missing a whole lot. -- “Science doesn't work by vote and it doesn't work by authority.” (Richard Dawkins, _Big Mistake_, The Guardian, 2006-12-27) Ben Finney
Re: [Python-Dev] Differences among Emacsen (was: utf-8 encoding in checkins?)
On Mar 30, 2011, at 09:20 AM, Ben Finney wrote: >The ‘vc’ package (I'm using Debian's GNU Emacs 23.2.1) now recognises >DVCS-controlled *files*, and works well with them. It's still unaware >that modern VCS deals with project *trees*, so works only at an >individual file level. Still quite useful (e.g. ‘vc-diff’ and the like). > >The ‘dvc’ project <http://download.gna.org/dvc/> shows promise, but >isn't yet in Debian so I haven't tried it. That'll be interesting to track. I do something probably similar to you: I diff inside Emacs but always $vcs commit from a shell. I vaguely remember some ancient way of doing tree-based commits using svn in Emacs, but those brain cells have long been recycled. >I just use ‘ediff’ between different working trees on the filesystem, so >I don't know how well it works with files that don't exist. One thing I miss so far with Mercurial is 'smerge' mode. When I have a conflict in a Bazaar branch, I can just visit the conflicting file and Emacs dumps me in a minor mode called smerge. This gives me key bindings to hop around between conflict areas, select which part I want (or both) and then automatically calls $vcs resolve when all the conflict areas are taken care of. Makes it *very* nice for dealing with such things, but in the one case I tried it with Mercurial conflicts, it didn't work. I'll have to investigate further, but I'm guessing it's caused by some incompatibility with hg's conflict markers. >> * replace the GNU python.el with python-mode.el from the python-mode >>project (formerly distributed with Python, but now all grown up and moved >>away). > >What's the current thinking on that? The native GNU Emacs Python mode >seems fine to me, but I'm not a particularly clever Emacs user so am >probably missing a whole lot. In case you missed it, there are now *three* Python modes. 
Tim Peters' original and best (in my completely unbiased opinion) python-mode.el, which is still being developed; the older python.el, which has apparently been removed from Emacs; and the 'new' (so I've heard) python.el. Since Skip and I work on python-mode.el, you can tell what our preference is. The fact that it hasn't been pulled into Emacs is a long and dark political tale full of intrigue, subterfuge, fast cars, Matt Damon, sharks with frickin' laser beams attached to their heads, and downright redonkulousness. If you want the full gory details (or just want to help make the most awesome Python editing mode even awesomer), come join us on python-m...@python.org. -Barry P.S. pdbtrack
[Python-Dev] cmp= & key= (Re: Proposed change to logging.basicConfig)
On 3/29/2011 4:02 PM, Matthew Woodcraft wrote: Terry Reedy wrote: # Experiment with 2.7 shows that cmp wins. Though too late to change, I consider this the worst choice of three. I think an exception should be raised. Failing that, I think key should win on the basis that if one adds a 'new-fangled' key func to an existing call with cmp (and forgets to remove cmp), the key func is the one intended. Also, the doc clearly indicates that key is considered superior to cmp. Neither 'wins': cmp is applied to the output of key. Added to http://bugs.python.org/issue11712 (for 2.7 only ;-) I agree that it would have been worth documenting this explicitly. -- Terry Jan Reedy
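The composition described above ("cmp is applied to the output of key") can be emulated in Python 3, where sorted() no longer accepts cmp=. This is only a sketch: sort_like_27 and descending_cmp are names invented here, and the function mimics the 2.7 behaviour rather than reproducing its implementation.

```python
from functools import cmp_to_key


def sort_like_27(items, cmp, key):
    """Emulate 2.7's sorted(items, cmp=cmp, key=key): cmp sees key outputs."""
    wrap = cmp_to_key(cmp)
    return sorted(items, key=lambda item: wrap(key(item)))


def descending_cmp(a, b):
    # Old-style comparison function: negative, zero or positive result.
    return (a < b) - (a > b)


words = ["pear", "Apple", "fig", "Banana"]

# key lower-cases each word; cmp then orders those lower-cased values descending.
result = sort_like_27(words, descending_cmp, str.lower)
print(result)
```

Neither argument overrides the other here: dropping key would compare the mixed-case strings directly, and dropping cmp would sort the lower-cased values in ascending order.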
Re: [Python-Dev] Security implications of pep 383
On 3/29/2011 2:23 PM, Michael Foord wrote: > Not sure how real the security risk is here: > http://blog.omega-prime.co.uk/?p=107 > Basically he is saying that if you store a list of blacklisted files > with names encoded in big-5 (or some other non-utf8 compatible encoding) > if those names are passed at the command line, or otherwise read in and > decoded from an assumed-utf8 source with surrogate escaping, the > surrogate escape decoded names will not match the properly decoded > blacklisted names. I posted link to this as comment, with my summary of thread. -- Terry Jan Reedy
[Python-Dev] .hgignore including site-packages and scripts directories?
I'm wondering if it is a reasonable idea to have .hgignore exclude all files from 'Lib/site-packages' and 'Scripts'? As I install packages into my source builds, a 'hg status' lists *many* files in both those directories forcing me to scroll up a number of pages to see files which have actually changed. IIUC, listing a directory in .hgignore doesn't preclude files from that directory being added to hg, and doesn't prevent files in those directories already under hg from being detected as changed. The only downside I can see is that if new files are added to those directories which should be added to hg, a simple "hg st" will not show them - someone must remember and explicitly add them. However, ISTM those files are already likely to be missed given the large amount of noise 'hg st' shows in those directories - the files are likely to be in the middle of a very long list which my brain will be trained to habitually skip over. The number of new files which legitimately need to be added to those directories seems so small that this risk seems worthwhile. Any thoughts? Mark
Re: [Python-Dev] Information about how cpython in benchmarked
Hi Nick, Jesse, Thanks both for your responses, it's much appreciated! It's very useful to have a clear pointer to the right place to begin looking. Regards, -Tennessee On Tue, Mar 29, 2011 at 10:47 PM, Jesse Noller wrote: > On Tue, Mar 29, 2011 at 7:00 AM, Nick Coghlan wrote: > > On Tue, Mar 29, 2011 at 8:01 PM, Tennessee Leeuwenburg > > wrote: > >> PyPy maintains http://speed.pypy.org/, which provides very clear > information > >> about the relative performance of PyPy trunk against some version of > cpython > >> (presumably 2.6 or 2.7). I'm not aware of a similar site for cpython, > but > >> that could easily just be my ignorance speaking. > >> My interest is that I'm looking at building a benchmarking solution at > work. > >> and I can't think of a better way to build something good and general > than > >> to try and write something that could potentially be released as open > source > >> and be useful to others. As such I thought that benchmarking cpython > would > >> be a great use case, but I want to find out as much as I can about how > >> people currently go about benchmarking Python. Initially I'm just > looking at > >> CPU profiling since it's easiest. > > > > One of the points coming out of the VM summit at Pycon is actually > > that we want to create a shared benchmarking site for CPython, PyPy, > > Jython, IronPython (and possibly Stackless) under the python.org > > banner (either speed.python.org, or possibly performance.python.org, > > since we want to do memory profiling as well). > > > > speed.pypy.org will be the reference site for this, but Maciej > > indicated at the VM summit that the code that runs that site needs > > some improvements before it will really be up to the task of > > effectively benchmarking multiple targets. > > > > So, according to http://speed.pypy.org/about/, the place to start with > > your benchmarking system would probably be > > https://github.com/tobami/codespeed. > > > > Cheers, > > Nick. 
> > Essentially echoing what nick said. I'm currently working on getting > the HW for this together. > -- -- Tennessee Leeuwenburg http://myownhat.blogspot.com/ "Don't believe everything you think"
Re: [Python-Dev] .hgignore including site-packages and scripts directories?
On Wed, 30 Mar 2011 11:11:45 +1100, Mark Hammond wrote: > I'm wondering if it is a reasonable idea to have .hgignore exclude all > files from 'Lib/site-packages' and 'Scripts'? As I install packages > into my source builds, a 'hg status' lists *many* files in both those > directories forcing me to scroll up a number of pages to see files which > have actually changed. I hardly ever install things into my source build. The first time I've done that, in fact, was to run coverage. The solution is to add such directories and/or files to your personal ignore list. See the 'ignore' entry under 'ui' in the hgrc documentation. -- R. David Murray http://www.bitdance.com
Re: [Python-Dev] .hgignore including site-packages and scripts directories?
On 30/03/2011 12:09 PM, R. David Murray wrote: > On Wed, 30 Mar 2011 11:11:45 +1100, Mark Hammond wrote: > > I'm wondering if it is a reasonable idea to have .hgignore exclude all > > files from 'Lib/site-packages' and 'Scripts'? As I install packages > > into my source builds, a 'hg status' lists *many* files in both those > > directories forcing me to scroll up a number of pages to see files which > > have actually changed. > > I hardly ever install things into my source build. The first time I've > done that, in fact, was to run coverage. Windows doesn't really have an install process integrated into the build, so it is probably fairly common there. > The solution is to add such > directories and/or files to your personal ignore list. See the 'ignore' > entry under 'ui' in the hgrc documentation. Yeah - but I was wondering if it could be made more convenient by default given the downside seems quite small... Mark
Re: [Python-Dev] .hgignore including site-packages and scripts directories?
On Wed, 30 Mar 2011 12:17:05 +1100, Mark Hammond wrote: > On 30/03/2011 12:09 PM, R. David Murray wrote: > > The solution is to add such > > directories and/or files to your personal ignore list See the 'ignore' > > entry under 'ui' in the hgrc documentation. > > Yeah - but I was wondering if it could be made more convenient by > default given the downside seems quite small... I suppose I wouldn't care about site-packages. Nothing except the existing README should ever get checked in there, I think. And I don't seem to have a 'Scripts' directory, just Tools/scripts, which shouldn't be ignored. Is Scripts windows specific? (I also have a build/scripts, but build is ignored.) -- R. David Murray http://www.bitdance.com
Re: [Python-Dev] .hgignore including site-packages and scripts directories?
On 30/03/2011 1:37 PM, R. David Murray wrote: > On Wed, 30 Mar 2011 12:17:05 +1100, Mark Hammond wrote: > > On 30/03/2011 12:09 PM, R. David Murray wrote: > > > The solution is to add such > > > directories and/or files to your personal ignore list. See the 'ignore' > > > entry under 'ui' in the hgrc documentation. > > > > Yeah - but I was wondering if it could be made more convenient by > > default given the downside seems quite small... > > I suppose I wouldn't care about site-packages. Nothing except the > existing README should ever get checked in there, I think. > > And I don't seem to have a 'Scripts' directory, just Tools/scripts, > which shouldn't be ignored. Is Scripts windows specific? (I also > have a build/scripts, but build is ignored.) Yeah, "Scripts" is indeed Windows specific - which I admit I had forgotten until a couple of hours ago when debugging why a script using virtualenv failed on Windows due to assuming stuff went into a 'bin' directory and not the 'Scripts' directory. The directory is normally populated by the distutils 'install' command, easy_install, etc. Mark
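As an aside on the 'bin' versus 'Scripts' mix-up mentioned above, a script can ask the running interpreter where installed scripts go instead of hard-coding the directory name. A minimal sketch, assuming the standard-library sysconfig module; the helper script_path is a name made up for this example.

```python
import os
import sysconfig

# Directory where installers place console scripts for this interpreter:
# typically ".../bin" on POSIX and "...\Scripts" on Windows.
scripts_dir = sysconfig.get_path("scripts")


def script_path(name):
    """Hypothetical helper: expected location of an installed script."""
    return os.path.join(scripts_dir, name)


print(scripts_dir)
print(script_path("example-tool"))
```

This only reports where scripts are expected to be installed; it does not check that a given script actually exists there.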
Re: [Python-Dev] Information about how cpython in benchmarked
This is really great to hear and something I would be hugely interested in contributing to. Lurking has paid off :) Nick On Tue, Mar 29, 2011 at 4:00 AM, Nick Coghlan wrote: > On Tue, Mar 29, 2011 at 8:01 PM, Tennessee Leeuwenburg > wrote: > > PyPy maintains http://speed.pypy.org/, which provides very clear > information > > about the relative performance of PyPy trunk against some version of > cpython > > (presumably 2.6 or 2.7). I'm not aware of a similar site for cpython, but > > that could easily just be my ignorance speaking. > > My interest is that I'm looking at building a benchmarking solution at > work. > > and I can't think of a better way to build something good and general > than > > to try and write something that could potentially be released as open > source > > and be useful to others. As such I thought that benchmarking cpython > would > > be a great use case, but I want to find out as much as I can about how > > people currently go about benchmarking Python. Initially I'm just looking > at > > CPU profiling since it's easiest. > > One of the points coming out of the VM summit at Pycon is actually > that we want to create a shared benchmarking site for CPython, PyPy, > Jython, IronPython (and possibly Stackless) under the python.org > banner (either speed.python.org, or possibly performance.python.org, > since we want to do memory profiling as well). > > speed.pypy.org will be the reference site for this, but Maciej > indicated at the VM summit that the code that runs that site needs > some improvements before it will really be up to the task of > effectively benchmarking multiple targets. > > So, according to http://speed.pypy.org/about/, the place to start with > your benchmarking system would probably be > https://github.com/tobami/codespeed. > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Security implications of pep 383
On Tue, Mar 29, 2011 at 10:55:47PM +0200, Victor Stinner wrote: > On Tuesday 29 March 2011 at 22:40 +0200, Lennart Regebro wrote: > > The lesson here seems to be "if you have to use blacklists, and you > > use unicode strings for those blacklists, also make sure the string > > you compare with doesn't have surrogates". > > No. '\u4f60\u597d'.encode('big5').decode('latin1') gives '§A¦n' which > doesn't contain any surrogate character. > > The lesson is: if you compare Unicode filenames on UNIX, make sure that > your system is correctly configured (the locale encoding must be the > filesystem encoding). > You're both wrong :-) Lennart is missing that you just need to use the same encoding + surrogateescape (or stick with bytes) for decoding the byte strings that you are comparing. You're missing that on UNIX there is no filesystem encoding so the idea of locale and filesystem encoding matching is false (and unnecessary -- the encodings that you use within python just need to be the same. They don't even need to match up to the reality of what's used on the filesystem or the user's locale.) -Toshio
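The point about decoding both sides the same way can be illustrated with the Big5 bytes from the example. This is a sketch under assumptions: decode_name is a hypothetical helper, and UTF-8 stands in for whatever encoding the program happens to use internally.

```python
# Bytes of the Big5-encoded name from the thread ('\u4f60\u597d' in Big5).
raw = b"\xa7A\xa6n"


def decode_name(data, encoding="utf-8"):
    """Hypothetical helper: decode a byte filename, escaping undecodable bytes."""
    return data.decode(encoding, "surrogateescape")


properly_decoded = raw.decode("big5")  # what a Big5-aware blacklist would store
escaped = decode_name(raw)             # what an assumed-UTF-8 source produces

# Mixing the two decodings is what makes the blacklist comparison fail.
mismatch = properly_decoded != escaped

# Decoding both sides identically, or comparing bytes, gives a consistent answer.
same_decoding_matches = decode_name(raw) == escaped
round_trip = escaped.encode("utf-8", "surrogateescape")

print(mismatch, same_decoding_matches, round_trip == raw)
```

Because surrogateescape round-trips the original bytes, comparing the re-encoded bytes is equivalent to sticking with bytes throughout.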
Re: [Python-Dev] Security implications of pep 383
On Wed, Mar 30, 2011 at 07:54, Toshio Kuratomi wrote: > Lennart is missing that you just need to use the same encoding > + surrogateescape (or stick with bytes) for decoding the byte strings that > you are comparing. You lost me here. I need to do this for what? //Lennart
Re: [Python-Dev] Security implications of pep 383
On Tue, Mar 29, 2011 at 23:17, "Martin v. Löwis" wrote: > I think the whole blacklist example is artificial. The string in the > blacklist is actually a Chinese "hello" greeting, so it surely isn't > the string being blacklisted. For proper blacklisting, you would likely > use substring searches, case-insensitivity, transliterations, and > perhaps even regular expressions and word stemming. Good point. //Lennart