Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-06 Thread And Clover
On Tue, 2011-01-04 at 03:44 +0100, Victor Stinner wrote:
> What is this horrible encoding "bytes-as-unicode"?

It is a unicode string decoded from bytes using ISO-8859-1. ISO-8859-1
is the encoding specified by the HTTP RFC, as well as having the happy
property of preserving every input byte. PEP  requires it.
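
To make the byte-preserving property concrete, here is a minimal sketch of the round trip. The sample path and the choice of UTF-8 as the application's real encoding are assumptions for illustration only.

```python
# Illustration only: the path below is an assumed example value.
raw = "/caf\u00e9".encode("utf-8")      # the bytes as they arrive on the wire

# Server/gateway side: decode with ISO-8859-1. Each byte 0x00-0xFF maps to
# the code point with the same number, so no information is lost.
as_unicode = raw.decode("iso-8859-1")

# Application side: undo the ISO-8859-1 step to get the bytes back, then
# decode with the encoding the application actually uses (assumed UTF-8).
recovered = as_unicode.encode("iso-8859-1").decode("utf-8")

# The round trip holds for every possible byte value.
all_bytes = bytes(range(256))
round_tripped = all_bytes.decode("iso-8859-1").encode("iso-8859-1")
```

The intermediate string looks wrong if printed, but it is only a carrier for the original bytes.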

> os.environ is supposed to be correctly decoded and contain valid
> unicode characters.

It is not possible to ‘correctly’ decode to unicode for os.environ
because that decoding happens long before the web application (the
only party that knows what encoding should be in use) gets a look in.

Maybe the web application is using UTF-8, maybe it's using cp1252,
but if we let the server/gateway decide and do that decoding
before the application can do anything about it, we will get the wrong
encoding in *many* cases and the result will be permanent, unrecoverable
mangling of non-ASCII characters in submitted headers.
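
A small sketch of that failure mode, assuming an application that sends cp1252 and a gateway that guesses UTF-8 and replaces what it cannot decode (both choices are assumptions picked for the example):

```python
# Assumed scenario: the client submits cp1252, the gateway guesses UTF-8.
sent = "caf\u00e9".encode("cp1252")                # b'caf\xe9'

# Gateway decodes early with the wrong encoding, replacing invalid input.
guessed = sent.decode("utf-8", errors="replace")

# The application cannot get the original byte back from the result:
# the offending byte has been swapped for U+FFFD.
unrecoverable = guessed.encode("utf-8") != sent

# With ISO-8859-1 as the carrier the application can still recover.
carried = sent.decode("iso-8859-1")
recovered = carried.encode("iso-8859-1").decode("cp1252")
```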

> If WSGI uses another encoding than the locale encoding (which is a bad
> idea),

It's an absolutely necessary idea. The locale encoding is nothing to do
with the web application's encoding. Windows applications need to be
able to use UTF-8 (which is never the ANSI code page), and web
applications in general need to be deployable to any server without
having to worry about the server's locale.

The locale-dependent status quo is that non-ASCII characters in URL
paths and other HTTP headers don't work for Python apps.

The recoding dances present in wsgiref's CGIHandler for 3.2 are
distasteful but completely necessary to normalise differences in
encodings used by various servers and platforms to generate their CGI
environment.

>  it should use os.environb and decodes keys and values using its
> own encoding.

Well yes, but:

(a) os.environb doesn't exist in Python 3.1 or earlier, making it
impossible to implement WSGI before 3.2;
(b) a byte environment on Windows would have to be encoded
from the Unicode environment, with a server-specific encoding,
and then what encoding are you going to choose for the variables
that contain non-HTTP-sourced native Unicode strings (such as,
very commonly, Windows pathnames)?

The bytes-or-bytes-in-Unicode argument is something that has been
bounced around Web-SIG for literally *years*; this is what we ended up
with. Although I personally like bytes, frankly, a re-run of this
argument *again* whilst WSGI remains in perpetual stalemate does not
appeal. WSGI and wsgiref in Python 3.0-3.1 simply do not work. This
has long been an embarrassing situation for what is supposed to be a
leading web language. Let us not perpetuate this sorry story to 3.2 as
well.

-- 
And Clover
mailto:a...@doxdesk.com http://www.doxdesk.com
skype:uknrbobince gtalk:chat?jid=bobi...@gmail.com


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread And Clover
On Tue, 2011-01-04 at 03:44 +0100, Victor Stinner wrote:
> What is this horrible encoding "bytes-as-unicode"?

It is a unicode string decoded from bytes using ISO-8859-1. ISO-8859-1
is the encoding specified by the HTTP RFC, as well as having the happy
property of preserving every input byte.

> os.environ is supposed to be correctly decoded and contain valid unicode 
> characters.

Nope. It is not possible to ‘correctly’ decode to unicode for os.environ
because that decoding happens long before the web application gets a
look in. Maybe the web application is using UTF-8, maybe it's using
cp1252, but if we let the server/gateway decide and do that decoding
before the application can do anything about it, we will get the wrong
encoding in *many* cases and the result will be permanent, unrecoverable
mangling of non-ASCII characters in submitted headers.

> If WSGI uses another encoding than the locale encoding (which is a bad idea),

It's an absolutely necessary idea. The locale encoding is nothing to do
with the web application's encoding. Windows applications need to be
able to use UTF-8 (which is never the ANSI code page), and web
applications in general need to be deployable to any server without
having to worry about the server's locale.

The locale-dependent status quo is that non-ASCII characters in URL
paths and other HTTP headers don't work for Python apps.

The recoding dances present in wsgiref's CGIHandler for 3.2 are
distasteful but completely necessary to normalise differences in
encodings used by various servers and platforms to generate their CGI
environment.

>  it should use os.environb and decodes keys and values using its
> own encoding.

Well yes, but:

(a) os.environb doesn't exist in Python 3.1 or earlier, making it
impossible to implement WSGI before 3.2;
(b) there are also non-HTTP-related environment variables, which may
contain native Unicode strings (eg, very commonly, Windows pathnames),
so you have to have both environ *and* environb.
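
As a rough sketch of what that means for code that wants bytes on every platform: os.environb and os.supports_bytes_environ are real (3.2+), but the helper name and the explicit encoding argument below are hypothetical, and the fallback branch shows exactly where an encoding has to be picked when there is no byte environment.

```python
import os


def environ_bytes(name, encoding):
    """Hypothetical helper: return an environment value as bytes, or None.

    Where a byte environment exists (POSIX), read it directly. Otherwise
    (e.g. Windows) the value is native Unicode and the caller has to say
    which encoding to apply, which is the awkward part.
    """
    if os.supports_bytes_environ:
        return os.environb.get(os.fsencode(name))
    value = os.environ.get(name)
    return None if value is None else value.encode(encoding)


# Usage with an assumed variable name set just for the demonstration.
os.environ["WSGI_SKETCH_VAR"] = "abc"
value = environ_bytes("WSGI_SKETCH_VAR", "utf-8")
```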

The bytes-or-bytes-in-Unicode argument is something that has been
bounced around Web-SIG for literally *years*; this is what we ended up
with. Although I personally like bytes, frankly, a re-run of this
argument *again* whilst WSGI remains in perpetual stalemate does not
appeal. WSGI and wsgiref in Python 3.0-3.1 simply do not work at all. This
has been an embarrassing situation for what is supposed to be a leading
web language. Let's not perpetuate this sorry story to 3.2 as well.

-- 
And Clover
mailto:a...@doxdesk.com http://www.doxdesk.com
skype:uknrbobince gtalk:chat?jid=bobi...@gmail.com




Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-13 Thread And Clover

On 2012-01-13 11:20, Lennart Regebro wrote:

> The vulnerability is basically only in the dictionary you keep the
> form data you get from a request.


I'd have to disagree with this statement. The vulnerability is anywhere 
that creates a dictionary (or set) from attacker-provided keys. That 
would include HTTP headers, RFC822-family subheaders and parameters, the 
environ, input taken from JSON or XML, and so on - and indeed hash 
collision attacks are not at all web-specific.
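
To illustrate why colliding keys are expensive wherever the dictionary is built, here is a sketch that uses a hypothetical key class with a constant hash. A real attack would use strings crafted to collide; the class here only stands in for them, and the counts are what CPython's dict happens to do.

```python
class CollidingKey:
    """Hypothetical key type: every instance reports the same hash."""

    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 0  # all keys follow the same probe sequence

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.value == other.value


def comparisons_to_build(n):
    """Count the equality checks made while inserting n colliding keys."""
    CollidingKey.eq_calls = 0
    table = {}
    for i in range(n):
        table[CollidingKey(i)] = None
    return CollidingKey.eq_calls


small = comparisons_to_build(100)
large = comparisons_to_build(200)
print(small, large)
```

Doubling the number of keys should much more than double the comparison count, which is the quadratic behaviour the attack relies on.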


The problem with having two dict implementations is that a caller would 
have to tell libraries that use dictionaries which implementation to 
use. So for example an argument would have to be passed to json.load[s] 
to specify whether the input was known-sane or potentially hostile.


Any library that could ever use dictionaries to process untrusted input *or 
any library that used another library that did* would have to pass such 
a flag through, which would quickly get very unwieldy indeed... or else 
they'd have to just always use safedict, in which case we're in pretty 
much the same position as we are with changing dict anyway.
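
A sketch of what that plumbing might look like for one caller. SafeDict and the hostile flag are hypothetical names; only json.loads and its object_pairs_hook parameter are real. This shows the flag-passing burden and is not a working defence.

```python
import json


class SafeDict(dict):
    """Hypothetical stand-in for a collision-resistant mapping type."""


def parse_payload(text, *, hostile=False):
    """Hypothetical wrapper: every layer above json needs a flag like this."""
    mapping_type = SafeDict if hostile else dict
    # object_pairs_hook is called with a list of (key, value) pairs for each
    # JSON object, including nested ones.
    return json.loads(text, object_pairs_hook=mapping_type)


trusted = parse_payload('{"a": 1}')
untrusted = parse_payload('{"a": {"b": 1}}', hostile=True)
```

Every library that calls parse_payload on someone else's behalf would need to grow the same parameter and pass it down.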


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/
gtalk:chat?jid=bobi...@gmail.com


Re: [Python-Dev] Add a new "locale" codec?

2012-02-08 Thread And Clover

On 2012-02-08 09:28, Simon Cross wrote:

> I think I'm -1 on a "locale" encoding because it refers to different
> actual encodings depending on where and when it's run, which seems
> surprising, and there's already a more explicit way to achieve the
> same effect.


I'd agree that this is undesirable, and I don't really want 
locale-specific behaviour to leak out in other places that accept an 
encoding name (eg ), but we already have this 
behaviour with the "mbcs" encoding on Windows which refers to the 
locale-specific 'ANSI' code page.
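
For reference, a minimal sketch of how a script can see both things on the machine it runs on: the locale's preferred encoding, and whether the "mbcs" codec is available (it normally is only on Windows). The variable names are illustrative.

```python
import codecs
import locale

# The encoding the current locale prefers; this differs between machines.
preferred = locale.getpreferredencoding(False)

# "mbcs" resolves only where the platform provides it.
try:
    codecs.lookup("mbcs")
    has_mbcs = True
except LookupError:
    has_mbcs = False

print(preferred, has_mbcs)
```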


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/
gtalk:chat?jid=bobi...@doxdesk.com


Re: [Python-Dev] c/ElementTree XML serialisation

2012-05-08 Thread And Clover

On 08/05/12 17:21, Alex Leach wrote:
> The w3c SVG specification / recommendation
>   allows for