[issue13703] Hash collision security issue

2012-01-05 Thread Glenn Linderman

Changes by Glenn Linderman :


--
nosy: +v+python




[issue13703] Hash collision security issue

2012-01-07 Thread Glenn Linderman

Glenn Linderman  added the comment:

Given Martin's comment (msg150832) I guess I should add my suggestion to this 
issue, at least for the record.

Rather than change hash functions, randomization could be added to those dicts 
that are subject to attack because they store user-supplied key values.  The 
list so far seems to be urllib.parse, cgi, and json.  Some have claimed there are 
many more, but without enumeration.  These three are clearly related to the 
documented issue.

The technique would be to wrap dict and add a short random prefix to each key 
value, preventing the attacker from supplying keys that are known to collide... 
and even if he successfully stumbles on a set that does collide on one request, 
it is unlikely to collide on a subsequent request with a different prefix 
string.

The technique is fully backward compatible with all applications except those 
that contain potential vulnerabilities as described by the researchers. The 
technique adds no startup or runtime overhead to any application that doesn't 
contain the potential vulnerabilities.  Due to the per-request randomization, 
the complexity of creating a sequence of sets of keys that may collide is 
enormous, and requires that such a set of keys happen to arrive on a request in 
the right sequence where the predicted prefix randomization would be used to 
cause the collisions to occur.  This might be possible on a lightly loaded 
system, but is less likely on a heavily loaded system, which is the more 
interesting one to attack.

Serhiy Storchaka provided a sample implementation on the python-dev list, copied 
below, and attached as a file (but it is not a patch).

# -*- coding: utf-8 -*-
from collections import MutableMapping
import random


class SafeDict(dict, MutableMapping):

    def __init__(self, *args, **kwds):
        dict.__init__(self)
        self._prefix = str(random.getrandbits(64))
        self.update(*args, **kwds)

    def clear(self):
        dict.clear(self)
        self._prefix = str(random.getrandbits(64))

    def _safe_key(self, key):
        return self._prefix + repr(key), key

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, self._safe_key(key))
        except KeyError as e:
            e.args = (key,)
            raise e

    def __setitem__(self, key, value):
        dict.__setitem__(self, self._safe_key(key), value)

    def __delitem__(self, key):
        try:
            dict.__delitem__(self, self._safe_key(key))
        except KeyError as e:
            e.args = (key,)
            raise e

    def __iter__(self):
        for skey, key in dict.__iter__(self):
            yield key

    def __contains__(self, key):
        return dict.__contains__(self, self._safe_key(key))

    setdefault = MutableMapping.setdefault
    update = MutableMapping.update
    pop = MutableMapping.pop
    popitem = MutableMapping.popitem
    keys = MutableMapping.keys
    values = MutableMapping.values
    items = MutableMapping.items

    def __repr__(self):
        return '{%s}' % ', '.join('%s: %s' % (repr(k), repr(v))
                                  for k, v in self.items())

    def copy(self):
        return self.__class__(self)

    @classmethod
    def fromkeys(cls, iterable, value=None):
        d = cls()
        for key in iterable:
            d[key] = value
        return d

    def __eq__(self, other):
        return all(k in other and other[k] == v for k, v in self.items()) and \
               all(k in self and self[k] == v for k, v in other.items())

    def __ne__(self, other):
        return not self == other
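
A brief usage sketch (illustration only, assuming the SafeDict class above): 
parsed form fields go into a SafeDict instead of a plain dict, so colliding 
keys cannot be precomputed across requests.

form = SafeDict()
for name, value in [('a', '1'), ('b', '2')]:   # e.g. pairs from urllib.parse.parse_qsl()
    form[name] = value
print(form['a'], 'b' in form, len(form))       # prints: 1 True 2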

--
Added file: http://bugs.python.org/file24169/SafeDict.py




[issue13703] Hash collision security issue

2012-01-07 Thread Glenn Linderman

Glenn Linderman  added the comment:

Alex, I agree the issue has to do with the origin of the data, but the modules 
listed are the ones that deal with the data supplied by this particular attack.

Note that changing the hash algorithm for a persistent process, even though 
each process may have a different seed or randomized source, allows attacks for 
the life of that process, if an attack vector can be created during its 
lifetime. This is not a problem for systems where each request is handled by a 
different process, but is a problem for systems where processes are 
long-running and handle many requests.

Regarding vulnerable user code, supplying SafeDict (or something similar) in 
the stdlib or as sample code for use in such cases allows user code to be fixed 
also.

You have entered the class of people that claim lots of vulnerabilities, 
without enumeration.

--




[issue13703] Hash collision security issue

2012-01-07 Thread Glenn Linderman

Glenn Linderman  added the comment:

[offlist]
Paul, thanks for the enumeration and response.  Some folks have more 
experience, but the rest of us need to learn.  Having the proposal in 
the ticket, with an explanation of its deficiencies, is not all bad; 
others can learn from it, perhaps.  On the other hand, I'm willing to 
learn more, if you are willing to address my concerns below.

I had read the whole thread and issue, but it still seemed like a leap 
of faith to conclude that the only, or at least best, solution is 
changing the hash.  Yet, changing the hash still doesn't seem like a 
sufficient solution, due to long-lived processes.

On 1/7/2012 6:40 PM, Paul McMillan wrote:
> Paul McMillan  added the comment:
>
>> Alex, I agree the issue has to do with the origin of the data, but the 
>> modules listed are the ones that deal with the data supplied by this 
>> particular attack.
> They deal directly with the data. Do any of them pass the data
> further, or does the data stop with them?

For web forms and requests, which is the claimed vulnerability, I would 
expect that most of them do not pass the data further without 
validation or selection, and it is unlikely that the form is actually 
expecting data with colliding strings, so it seems very unlikely that 
they would be passed on.  At least that is how I code my web apps: just 
select the data I expect from my form.  At present I do not reject data 
I do not expect, but I'll have to consider either using SafeDict (which 
I can start using ASAP, not waiting for a new release of Python to be 
installed on my web server, currently running Python 2.4) or rejecting 
data I do not expect prior to putting it in a dict.  That might require 
tweaking urllib.parse a bit, or cgi, or both.

> A short and very incomplete
> list of vulnerable standard lib modules includes: every single parsing
> library (json, xml, html, plus all the third party libraries that do
> that), all of numpy (because it processes data which probably came
> from a user [yes, integers can trigger the vulnerability]), difflib,
> the math module, most database adaptors, anything that parses metadata
> (including commonly used third party libs like PIL), the tarfile lib
> along with other compressed format handlers, the csv module,
> robotparser, plistlib, argparse, pretty much everything under the
> heading of "18. Internet Data Handling" (email, mailbox, mimetypes,
> etc.), "19. Structured Markup Processing Tools", "20. Internet
> Protocols and Support", "21. Multimedia Services", "22.
> Internationalization", TKinter, and all the os calls that handle
> filenames. The list is impossibly large, even if we completely ignore
> user code. This MUST be fixed at a language level.
>
> I challenge you to find me 15 standard lib components that are certain
> to never handle user-controlled input.

I do appreciate your enumeration, but I'll decline the challenge.  While 
all of them can be interesting exploits of naïve applications (written 
by programmers who may be quite experienced in some things, but can 
naïvely overlook other things), most of them probably do not apply to 
the documented vulnerability. Many I had thought of, but rejected for 
this context; some I had not.  So while there are many possible 
situations where happily stuffing things into a dict may be an easy 
solution, there are many possible cases where it should be prechecked on 
the way in.  And there is another restriction: if the user-controlled 
input enters a user-run program, it is unlikely to be attacked in the 
same manner that web servers are attacked.  A user, for example, is 
unlikely to contrive colliding file names for the purpose of making his 
file listing program run slow.

So it is really system services and web services that need to be 
particularly careful. Randomizing the hash seed might reduce the problem 
from any system/web services to only long-running system/web services, 
but doesn't really solve the complete problem, as far as I can tell... 
only proper care in writing the application (and the stdlib code) will 
solve the complete problem.  Sadly, beefing up the stdlib code will 
probably reduce performance for things that will not be exploited to be 
careful enough in the cases that could be exploited.

>> Note that changing the hash algorithm for a persistent process, even though 
>> each process may have a different seed or randomized source, allows attacks 
>> for the life of that process, if an attack vector can be created during its 
>> lifetime. This is not a problem for systems where each request is handled by 
>> a different process, but is a problem for systems where processes are 
>> long-running and handle many requests.
> This point has been made many times now. I urge you to read the

[issue13703] Hash collision security issue

2012-01-07 Thread Glenn Linderman

Glenn Linderman  added the comment:

I don't find a way to delete my prior comment, so I'll add one more 
(only).  The prior comment was intended to go to one person, but I didn't 
notice that the From line, while showing that person's name, actually went 
back to the ticket (the email address was not that individual's); now I do, 
so I've learned that.

My prior comment was a request for further explanation of things I still 
don't understand, not intended to be an attack.  If someone can delete 
both this and my prior comment from the issue, or tell me how, feel free.

--




[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-24 Thread Glenn Linderman

Glenn Linderman  added the comment:

In msg142098  Ezio said:
> Keep in mind that we should be able to access and use lone surrogates too, 
> therefore:
> s = '\ud800'  # should be valid
> len(s)  # should this raise an error? (or return 0.5 ;)?

I say:
For streams and data types in which lone surrogates are permitted, a lone 
surrogate should be treated as and counted as a character (codepoint).

For streams and data types in which lone surrogates are not permitted, the 
assignment should be invalid and raise an error; len would then never see it, 
and has no quandary.
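
A small Python 3 illustration of the current behaviour for the first case 
(lone surrogates are permitted in str, but rejected by the UTF-8 codec):

s = '\ud800'                 # a lone surrogate is a legal str value
print(len(s))                # 1 -- counted as one code point
try:
    s.encode('utf-8')        # a stream that forbids lone surrogates rejects it
except UnicodeEncodeError as e:
    print('rejected:', e.reason)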

--
nosy: +v+python




[issue11269] cgi.FieldStorage forgets to unquote field names when parsing multipart/form-data

2011-02-25 Thread Glenn Linderman

Glenn Linderman  added the comment:

Just some comments for the historical record:
During the discussion of issue 4953, research and testing revealed that browsers 
send back their CGI data using the same charset as the page that they are 
responding to.  So the only way that quoting would be necessary on field names 
would be if they were quoted funny, as in your example here.  It is somewhat 
unlikely that people would go to the trouble of coding field names that contain 
" and ' and % characters, just to mess themselves up (which ones do that 
depends on which quote character is used for the name in the HTML and whether 
the enctype is "multipart/form-data" or URL encoding).

And Firefox 3.6... provides

name=""%22"

and that presently works with Python 3.2 CGI!  But that might mean that for 
Firefox 4.x, providing the "\"%22", CGI might pass through the "\"?  And 
really, the dequoting must be incorrectly coded for the Firefox 3.6 to "work".

--
nosy: +v+python




[issue11269] cgi.FieldStorage forgets to unquote field names when parsing multipart/form-data

2011-02-25 Thread Glenn Linderman

Glenn Linderman  added the comment:

Sergey says:
I wanted to add that the fact that browsers encode the field names in the page 
encoding does not change that they should escape the header according to RFC 
2047.

I respond:
True, but RFC 2047 is _so_ weird that it seems browsers have a better 
solution.  RFC 2047 is needed for 7-bit mail servers, from which it seems to 
have been inherited by specs for HTTP (but I've never seen it used by a 
browser, have you?).  It would be nicer if HTTP had a header that allowed 
definition of the charset used for subsequent headers.  Right now, the code 
processing form data has to assume a particular encoding for headers & data, 
and somehow make sure that all the forms that use the same code have the same 
encoding.
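
For reference, this is what an RFC 2047 encoded word looks like when produced 
with the stdlib (the utf-8 charset defaults to the base64 form):

from email.header import Header
print(Header('naïve', 'utf-8').encode())   # =?utf-8?b?bmHDr3Zl?=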

Sergey says:
I imagine there could be a non-ASCII field name that, when encoded in some 
encoding, will produce something SQL-injection-like: '"; other="xx"'. That 
string would make the header parse into something completely different. With 
IE8 and FF 3.6 it looks like it would be very simple.  The same applies to 
uploaded file names too, so it's not just a matter of choosing sane field 
names.

That's all a browsers' problem though.

I respond:
Perhaps there is, although it depends on how the parser is written what 
injection techniques would work, and it also depends on having a followon 
parameter with dangerous semantics to incorrectly act on.

It isn't just a problem for the browsers, but for every CGI script that parses 
such parameters.

--




[issue1271] Raw string parsing fails with backslash as last character

2011-03-11 Thread Glenn Linderman

Glenn Linderman  added the comment:

I can certainly agree with the opinion that raw strings are working as 
documented, but I can also agree with the opinion that they contain traps for 
the unwary, and after getting trapped several times, I have chosen to put up 
with the double-backslash requirement of regular strings and avoid the use of 
raw strings in my code.  The double-backslash requirement of regular strings 
gets ugly for Windows pathnames and some regular expressions, but the traps of 
raw strings are more annoying than that.
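
For the record, the trap in question, as a minimal illustration (the 
commented-out line is the failing case):

path = 'C:\\dir\\'     # regular string with doubled backslashes: works
# path = r'C:\dir\'    # SyntaxError: a raw string cannot end in an odd number of backslashes
print(path)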

I'm quite sure it would be impossible to "fix" raw strings without causing 
deprecation churn for people to whom they are useful (if there are any such; 
hard for me to imagine, but I'm sure there are).

I'm quite sure the only reasonable "fix" would be to invent a new type of 
"escape-free" or "exact" string (to not overuse the term raw, and make two 
types of raw string).  With Python 3, and UTF-8 source files, there is little 
need for \-prefixed characters (and there is already a string syntax that 
permits them, when they are needed), so it seems like inventing a new string 
syntax

e'string'
e"""string"""

which would not treat \ in any special manner whatsoever, would be useful for 
all the cases raw strings are presently useful for, and even more useful, 
because it would handle all the cases that are presently traps for the unwary 
that raw-strings have.

The problem mentioned in this thread of escaping the outer quote character is 
much more appropriately handled by the triple-quote form.  I don't know the 
Python history well enough to know if raw strings predated triple-quote; if 
they didn't, there would have been no need for raw strings to attempt to 
support such.

--
nosy: +v+python




[issue1271] Raw string parsing fails with backslash as last character

2011-03-11 Thread Glenn Linderman

Glenn Linderman  added the comment:

@Graham: seems like the two primary gotchas are trailing \ and \" \' not 
removing the \.  The one that tripped me up the most was the trailing \, but I 
did get hit with \" once.  Probably if Python had been my first programming 
language that used \ escapes, it wouldn't be such a problem, but since it never 
will be, and because I'm forced to use the others from time to time, still, 
learning yet a different form of "not-quite raw" string just isn't worth my 
time and debug efforts.  When I first saw it, it sounded more useful than 
doubling the \s like I do elsewhere, but after repeated trip-ups with trailing 
\, I decided it wasn't for me.

@R David: Interesting description of the parsing/escaping.  Sounds like that 
makes for a cheap parser, fewer cases to handle.  But there is little that is 
hard about escape-free or exact string parsing: just look for the trailing " ' 
""" or ''' that matches the one at the beginning.  The only thing difficult is 
if you want to escape the quote, but with the rich set of quotes available, it 
is extremely unlikely that you can't find one that you can use, except perhaps 
if you are writing a parser for parsing Python strings, in which case, the 
regular expression that matches any leading quote could be expressed as:

'("|"""|' "'|''')"

Granted that isn't the clearest syntax in the world, but it is also uncommon, 
and can be assigned to a nicely named variable such as matchLeadingQuotationRE 
in one place, and used wherever needed.

Regarding the use of / rather than \: that is true if you are passing file names 
to Windows APIs, but not if you are passing them to other programs that 
use / as option syntax and \ as path separator (most Windows command line 
utilities).

--




[issue1271] Raw string parsing fails with backslash as last character

2011-03-12 Thread Glenn Linderman

Glenn Linderman  added the comment:

On 3/12/2011 7:11 PM, R. David Murray wrote:
> R. David Murray  added the comment:
>
> I've opened issue 11479 with a proposed patch to the tutorial along the lines 
> suggested by Graham.

Which is good, for people that use the tutorial.  I jump straight to the 
reference guide, usually, because of so many years of experience with 
other languages.  But I was surprised you used .strip() instead of [:-1] 
which is shorter and I would expect it to be more efficient also.

--
Added file: http://bugs.python.org/file21098/unnamed




[issue1602] windows console doesn't print or input Unicode

2011-03-24 Thread Glenn Linderman

Glenn Linderman  added the comment:

Presently, a correct application only needs to flush between a sequence of 
writes and a sequence of buffer.writes.

Don't assume the flush happens after every write, for a correct application.
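
A minimal illustration of that pattern with the standard Python 3 streams 
(not specific to the Windows console changes being discussed):

import sys

sys.stdout.write('text written via the TextIOWrapper layer\n')
sys.stdout.flush()          # one flush before switching layers
sys.stdout.buffer.write(b'bytes written via the underlying buffer\n')
sys.stdout.buffer.flush()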

--




[issue1602] windows console doesn't print or input Unicode

2011-03-24 Thread Glenn Linderman

Glenn Linderman  added the comment:

Would it suffice if the new scheme internally flushed after every buffer.write? 
 It wouldn't be needed after write, because the correct application would 
already do one there?

Am I off-base in supposing that the performance of buffer.write is expected to 
include a flush (because it isn't expected to be buffered)?

--




[issue1602] windows console doesn't print or input Unicode

2011-03-25 Thread Glenn Linderman

Glenn Linderman  added the comment:

David-Sarah said:
In any case, given that the buffer of the initial std{out,err} will always be a 
BufferedWriter object (since .buffer is readonly), it would be possible for the 
TextIOWrapper to test a dirty flag in the BufferedWriter, in order to check 
efficiently whether the buffer needs flushing on each write. I've looked at the 
implementation complexity cost of this, and it doesn't seem too bad.

So if flush checks that bit, maybe the TextIOWrapper could just call buffer.flush, 
and it would be fast if clean and slow if dirty?  Calling it at the beginning 
of a text-level write, that is, which would let the char-at-a-time calls to 
buffer.write be fast.

And I totally agree with msg132191

--




[issue1602] windows console doesn't print or input Unicode

2011-03-26 Thread Glenn Linderman

Glenn Linderman  added the comment:

David-Sarah wrote:
Windows is very slow at scrolling a console, which might make the cost of 
flushing insignificant in comparison.)

Just for the record, I noticed a huge speedup in Windows console scrolling when 
I switched from WinXP to Win7 on a faster computer :)
How much is due to the XP->7 switch and how much to the faster computer, I 
cannot say, but it seemed much more significant than other speedups in other 
software.  The point?  Benchmark it on Win7, not XP.

--




[issue11945] Adopt and document consistent semantics for handling NaN values in containers

2011-04-27 Thread Glenn Linderman

Glenn Linderman  added the comment:

Bertrand Meyer's exposition is flowery, and he is a learned man, but the basic 
argument he makes is:

Reflexivity of equality  is something that we expect for any data type, and it 
seems hard to justify that a value is not equal to itself. As to assignment, 
what good can it be if it does not make the target equal to the source value?  

The argument is flawed: now that NaN exists, and is not equal to itself in 
value, there should be, and need be, no expectation that assignment elsewhere 
should make the target equal to the source in value.  It can, and in Python, 
should, make them match in identity (is) but not in value (==, equality).

I laud the idea of adding a definition of reflexive equality to the glossary.  
However, I think it is presently a bug that a list containing a NaN value 
compares equal to itself.  Yes, such a list is identical to itself (is), 
but it should not compare equal.
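
A short illustration of the status quo being objected to (x is the same NaN 
object on the second line; the third line uses two distinct NaN objects):

x = float('nan')
print(x == x)                              # False: NaN is not equal to itself
print([x] == [x])                          # True: identity short-circuits the element comparison
print([float('nan')] == [float('nan')])    # False: distinct NaN objects compare unequal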

--
nosy: +v+python




[issue11945] Adopt and document consistent semantics for handling NaN values in containers

2011-04-28 Thread Glenn Linderman

Glenn Linderman  added the comment:

Nick says (and later explains better what he meant): 
The status quo works. Proposals to change it on theoretical grounds have a 
significantly higher bar to meet than proposals to simply document it clearly.

I say:
What the status quo doesn't provide is containers that "work".  In this case 
what I mean by "work" is that equality of containers is based on value and 
value comparisons, accepting and embracing non-reflexive equality.  It might be 
possible to implement alternate containers with these characteristics, but that 
requires significantly more effort than simply filtering values.

Nonetheless, I totally agree with msg134654, and agree that properly 
documenting the present implementation would be a great service to users of the 
present implementation.

--




[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2010-11-20 Thread Glenn Linderman

Glenn Linderman  added the comment:

Regarding http://bugs.python.org/issue4953#msg91444: POST with 
multipart/form-data encoding can use UTF-8; other stuff is restricted to ASCII!

From http://www.w3.org/TR/html401/interact/forms.html:
Note. The "get" method restricts form data set values to ASCII characters. Only 
the "post" method (with enctype="multipart/form-data") is specified to cover 
the entire [ISO10646] character set.

Hence cgi formdata can safely decode text fields using UTF-8 decoding 
(experimentally, that is the encoding used by Firefox to support the entire 
ISO10646 character set).

--
nosy: +v+python




[issue10479] cgitb.py should assume a binary stream for output

2010-11-20 Thread Glenn Linderman

New submission from Glenn Linderman :

The CGI interface is a binary stream, because it is pumped directly to/from the 
HTTP protocol, which is a binary stream.

Hence, cgitb.py should produce binary output.  Presently, it produces text 
output.

When one sets stdout to a binary stream, and then cgitb intercepts an error, 
cgitb fails.

Demonstration of problem:

import sys
import traceback
sys.stdout = open("sob", "wb")  # WSGI sez data should be binary, so stdout should be binary???
import cgitb
sys.stdout.write(b"out")
fhb = open("fhb", "wb")
cgitb.enable()
fhb.write("abcdef")  # try writing non-binary to binary file.  Expect an error, of course.

--
components: Unicode
messages: 121865
nosy: v+python
priority: normal
severity: normal
status: open
title: cgitb.py should assume a binary stream for output
versions: Python 3.2




[issue10480] cgi.py should document the need for binary stdin/stdout

2010-11-20 Thread Glenn Linderman

New submission from Glenn Linderman :

CGI is a bytestream protocol.  Python assumes a text mode encoding for stdin 
and stdout; this is inappropriate for the CGI interface.

The cgi module should provide an API to "do the right thing" to make stdin and 
stdout binary mode interfaces (including the msvcrt setting to binary on 
Windows).  Failing that, it should document the need to do so in CGI 
applications.

Failing that, it should be documented somewhere; cgi seems the most appropriate 
place to me.
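
A minimal sketch of what such a helper might do, using only documented stdlib 
calls (the helper name is illustrative, not an existing API):

import sys

def binary_std_streams():
    # Use the underlying byte buffers of the text-mode standard streams.
    stdin_b = getattr(sys.stdin, 'buffer', sys.stdin)
    stdout_b = getattr(sys.stdout, 'buffer', sys.stdout)
    if sys.platform == 'win32':
        # Also switch the OS-level handles to binary so newlines are not translated.
        import msvcrt, os
        msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    return stdin_b, stdout_b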

--
components: Library (Lib)
messages: 121868
nosy: v+python
priority: normal
severity: normal
status: open
title: cgi.py should document the need for binary stdin/stdout
versions: Python 3.2




[issue10481] subprocess PIPEs are byte streams

2010-11-20 Thread Glenn Linderman

New submission from Glenn Linderman :

While http://bugs.python.org/issue2683 did clarify the fact that the 
.communicate API takes a byte stream as input, it is easy to miss the 
implication.  Because Python programs start up with stdin as a text stream, it 
might be good to point out that some action may need to be taken to be sure 
that the receiving program expects a byte stream, or that the byte stream 
supplied should be in an encoding that the receiving program is expecting and 
can decode appropriately.

No mention is presently made in the documentation for .communicate that its 
output is also a byte stream, and that if it is text, it will correspond to 
whatever encoding is used by the sending program.
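
An illustration of the point (an example, not from the documentation): both 
ends of .communicate() are byte streams, and any text handling is up to the 
caller.

import subprocess, sys

# The child simply echoes its byte stdin to its byte stdout.
child = [sys.executable, '-c',
         'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']
p = subprocess.Popen(child, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = p.communicate('héllo\n'.encode('utf-8'))   # input must be bytes
print(out.decode('utf-8'), end='')                  # output arrives as bytes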

--
assignee: d...@python
components: Documentation
messages: 121869
nosy: d...@python, v+python
priority: normal
severity: normal
status: open
title: subprocess PIPEs are byte streams
versions: Python 3.2




[issue10481] subprocess PIPEs are byte streams

2010-11-20 Thread Glenn Linderman

Glenn Linderman  added the comment:

Maybe it should also be mentioned that p.stdout and p.stderr and p.stdin, when 
set to be PIPEs, are also byte streams.  Of course that is the reason that 
communicate accepts and produces byte streams.

--




[issue10482] subprocess and deadlock avoidance

2010-11-20 Thread Glenn Linderman

New submission from Glenn Linderman :

.communicate is a nice API for programs that produce little output, and can be 
buffered.  While that may cover a wide range of uses, it doesn't cover 
launching CGI programs, such as is done in http.server.  Now there are nice 
warnings about that issue in the http.server documentation.

However, while .communicate has the building blocks to provide more general 
solutions, it doesn't expose them to the user, nor does it separate them into 
building blocks; rather, it is a monolith inside ._communicate.

For example, it would be nice to have an API that would "capture a stream using 
a thread", which could be used for either stdout or stderr, and which is what 
._communicate does under the covers for both of them.
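
A minimal sketch of that building block (the helper name capture_stream is 
invented for illustration; subprocess exposes no such public API):

import subprocess, sys, threading

def capture_stream(pipe, sink):
    # Read a pipe to EOF in a background thread and hand the bytes back.
    sink.append(pipe.read())
    pipe.close()

code = 'import sys; sys.stdout.write("body"); sys.stderr.write("log line")'
p = subprocess.Popen([sys.executable, '-c', code],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
errbuf = []
t = threading.Thread(target=capture_stream, args=(p.stderr, errbuf))
t.daemon = True
t.start()
body = p.stdout.read()     # the foreground thread stays free to handle stdout
t.join()
p.wait()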

It would also be nice to have an API that would "pump a bytestream to .stdin" 
as a background thread.  ._communicate doesn't provide that one, but uses the 
foreground thread for it, and it requires that the input be fully buffered.  It 
would be most useful for http.server if this API could connect a file handle 
and an optional maximum read count to .stdin, yet do it in a background thread.

That would leave the foreground thread able to process stdout.  It is correct 
(but not what http.server presently does; I'll be entering that enhancement 
request soon) for http.server to read the first line from the CGI program, 
transform it, add a few more headers, send that to the browser, and then 
hook up .stdout to the browser (shutil.copyfileobj can be used for the rest of 
the stream).  However, there is no deadlock-free way of achieving this sort of 
solution (capturing the stderr to be logged, not needing to buffer a 
potentially large file upload, and transforming the stdout) with the facilities 
currently provided by subprocess.  Breaking apart some of the existing building 
blocks, and adding an additional one for .stdin processing, would allow a real 
http.server implementation, as well as being more general for other complex 
uses.

You see, for http.server, the stdin

--
components: Library (Lib)
messages: 121871
nosy: v+python
priority: normal
severity: normal
status: open
title: subprocess and deadlock avoidance
type: feature request
versions: Python 3.2




[issue10483] http.server - what is executable on Windows

2010-11-20 Thread Glenn Linderman

New submission from Glenn Linderman :

The def executable for CGIHTTPRequestHandler is simply wrong on Windows.  The 
Unix executable bits do not apply.

Yet it is not clear what to use instead.  One could check the extension against 
PATHEXT, perhaps, but Windows doesn't limit itself to that except when not 
finding the exactly specified executable name.  Or one could require and borrow 
the Unix #! convention.  As an experiment, since I'm mostly dealing with script 
files, I tried out a hack that implements two #! lines, the first for Unix and 
the second for Windows, and only considers something executable if the second 
line exists.  This fails miserably for .exe files, of course.

Another possibility would be to see if there is an association for the 
extension, but that rule would permit a Word document to be "executable" 
because there is a way to open it using MS Word.

Another possibility would be to declare a list of extensions in the server 
source, like the list of directories from which CGIs are found.

Another possibility would be to simply assume that anything found in the CGI 
directory is executable.

Another possibility is to require the .cgi extension only to be executable, but 
then it is hard to know how to run it.

Another possibility is to require two "extensions"... the "real" one for 
Windows, and then .cgi just before it.  So to make a program executable, it 
would be renamed from file.ext to file.cgi.ext

But the current technique is clearly insufficient.
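
For the record, a sketch of the PATHEXT possibility mentioned above 
(illustrative only; not what http.server currently does):

import os

def is_executable_by_pathext(path):
    # Treat a file as executable if its extension is listed in PATHEXT.
    pathext = os.environ.get('PATHEXT', '.COM;.EXE;.BAT;.CMD')
    extensions = [ext.lower() for ext in pathext.split(';') if ext]
    return os.path.splitext(path)[1].lower() in extensions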

--
components: Library (Lib)
messages: 121875
nosy: v+python
priority: normal
severity: normal
status: open
title: http.server - what is executable on Windows
type: behavior
versions: Python 3.2




[issue10484] http.server.is_cgi fails to handle CGI URLs containing PATH_INFO

2010-11-20 Thread Glenn Linderman

New submission from Glenn Linderman :

is_cgi doesn't properly handle PATH_INFO parts of the path.  The Python2.x 
CGIHTTPServer.py had this right, but the introduction and use of 
_url_collapse_path_split broke it.

_url_collapse_path_split splits the URL into a two parts, the second part is 
guaranteed to be a single path component, and the first part is the rest.  
However, URLs such as

/cgi-bin/foo.exe/this/is/PATH_INFO/parameters

can and do want to exist, but the code in is_cgi will never properly detect 
that /cgi-bin/foo.exe is the appropriate executable, and the rest should be 
PATH_INFO.

This used to work correctly in the predecessor CGIHTTPServer.py code in 
Python 2.6, so this is a regression.

--
components: Library (Lib)
messages: 121876
nosy: v+python
priority: normal
severity: normal
status: open
title: http.server.is_cgi fails to handle CGI URLs containing PATH_INFO
type: behavior
versions: Python 3.2




[issue10485] http.server fails when query string contains additional '?' characters

2010-11-20 Thread Glenn Linderman

New submission from Glenn Linderman :

http.server on Python 3 and CGIHTTPServer on Python 2 both contain the same 
code with the same bug.  In run_cgi, rest.rfind('?') is used to separate the 
path from the query string.  However, it should be rest.find('?'), as the query 
string starts at the first '?' but may itself contain '?' characters.  '?' is 
required not to be used in the URL path part without escaping.

Apache, for example, separates the following URL:

/testing?foo=bar?&baz=3

into path part /testing  and query string part  foo=bar?&baz=3 but http.server 
does not.
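
A small illustration of the difference, using the example URL above:

rest = '/testing?foo=bar?&baz=3'
i = rest.rfind('?')                  # what run_cgi does: split at the last '?'
print(rest[:i], '|', rest[i + 1:])   # /testing?foo=bar | &baz=3   (wrong)
i = rest.find('?')                   # split at the first '?'
print(rest[:i], '|', rest[i + 1:])   # /testing | foo=bar?&baz=3   (matches Apache)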

--
components: Library (Lib)
messages: 121877
nosy: v+python
priority: normal
severity: normal
status: open
title: http.server fails when query string contains additional '?' characters
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2




[issue10485] http.server fails when query string contains additional '?' characters

2010-11-20 Thread Glenn Linderman

Changes by Glenn Linderman :


--
type:  -> behavior




[issue10486] http.server doesn't set all CGI environment variables

2010-11-20 Thread Glenn Linderman

New submission from Glenn Linderman :

HTTP_HOST, HTTP_PORT, and REQUEST_URI are variables that my CGI scripts use, but 
which are not available from http.server or CGIHTTPServer (until I added them).

There may be more standard variables that are not set; I didn't attempt to 
enumerate the whole list.
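
A hypothetical sketch of where such values could come from in a 
CGIHTTPRequestHandler (the helper itself is invented for illustration; a real 
fix belongs where run_cgi builds its env dict):

def extra_cgi_environ(handler):
    # handler is a CGIHTTPRequestHandler instance processing a request.
    return {
        'HTTP_HOST': handler.headers.get('Host', ''),
        'REQUEST_URI': handler.path,
        'SERVER_ADDR': handler.server.server_address[0],
    }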

--
components: Library (Lib)
messages: 121878
nosy: v+python
priority: normal
severity: normal
status: open
title: http.server doesn't set all CGI environment variables
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2




[issue10487] http.server - doesn't process Status: header from CGI scripts

2010-11-20 Thread Glenn Linderman

New submission from Glenn Linderman :

While it is documented that http.server (and Python 2's CGIHTTPServer) do not 
process the status header, and limit the usefulness of CGI scripts as a result, 
that doesn't make it less of a bug, just a documented bug.  But I guess that it 
might have to be called a feature request; I'll not argue if someone switches 
this to feature request, but I consider it a bug.

See related issue 10482 for subprocess to provide better features for avoiding 
deadlock situations.  There seems to be no general way using subprocess to 
avoid possible deadlock situations.  However, since CGI doesn't really use 
stderr much, and only for logging, which the scripts can do themselves (the 
cgi.py module even provides for such), and because CGIs generally slurp stdin 
before creating stdout, it is possible to sidestep the use of 
subprocess.communicate, drop the stdout PIPE, and sequence the code to process 
stdin and then stdout, and not generally deadlock (some CGI scripts that don't 
follow the stdin-before-stdout rule might deadlock if called with POST and 
large inputs, but those are few).

By doing this, one can then add code to handle Status: headers, and avoid 
buffering large files on output (and on input).  The tradeoff is losing the 
stderr log; when that is hooked up, some error cases can trigger deadlocks by 
writing to stderr -- hence the subprocess issue mentioned above.
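
A minimal sketch of the Status: handling described above (assuming header_text 
already holds the decoded header block read from the CGI script's stdout):

def status_from_cgi_headers(header_text):
    # Return (code, reason line) from a CGI 'Status:' header, defaulting to 200 OK.
    for line in header_text.splitlines():
        if line.lower().startswith('status:'):
            value = line.split(':', 1)[1].strip()     # e.g. '404 Not Found'
            code = int(value.split(None, 1)[0])
            return code, value
    return 200, '200 OK'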

--
components: Library (Lib)
messages: 121881
nosy: v+python
priority: normal
severity: normal
status: open
title: http.server - doesn't process Status: header from CGI scripts
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2




[issue10479] cgitb.py should assume a binary stream for output

2010-11-20 Thread Glenn Linderman

Changes by Glenn Linderman :


--
type:  -> behavior




[issue10480] cgi.py should document the need for binary stdin/stdout

2010-11-20 Thread Glenn Linderman

Changes by Glenn Linderman :


--
assignee:  -> d...@python
components: +Documentation
nosy: +d...@python
type:  -> behavior




[issue10483] http.server - what is executable on Windows

2010-11-21 Thread Glenn Linderman

Glenn Linderman  added the comment:

Martin, that is an interesting viewpoint, and one I considered, but didn't 
state, because it seems much too restrictive.  Most CGI programs are written in 
scripting languages, not compiled to .exe.  So it seems the solution should 
allow for launching at least Perl and Python scripts, as well as .exe.  Whether 
subprocess.Popen can directly execute it, or whether it needs help from the 
registry or a #! line to get the execution going is just a matter of tweaking 
the coding for what gets passed to subprocess.Popen.  Declaring the definition 
based on what the existing code can already do is self-limiting.

Another possible interpretation of executable might be PATHEXT environment 
variable, but that is similar to declaring a list in the server source, which I 
did mention.  One might augment the other.

--




[issue10483] http.server - what is executable on Windows

2010-11-22 Thread Glenn Linderman

Glenn Linderman  added the comment:

The rest of the code has clearly never had its deficiencies exposed on Windows, 
simply because executable() has prevented that.  So what the rest of the code 
"already supports" is basically nothing.  Reasonable Windows support is 
appropriate to implement as part of the bugfix.

You state that it isn't the function of http.server to extend Windows, however, 
even MS IIS has extended Windows to provide reasonable web scripting 
functionality, albeit it its own way, thus convicting the Windows facilities of 
being insufficient.  Attempting to use http.server to get a web testing 
environment running so that Python scripts can be tested locally requires some 
way of using an existing environment (except, of course, for "all new" web 
sites).  I suppose you would claim that using http.server for a web testing 
environment is an inappropriate use of http.server, also.  

Yet http.server on Unix appears to provide an adequate web testing environment: 
yes, some of that is because of Unix's #! feature.  This would certainly not be 
the first case where more code is required on Windows than Unix to implement 
reasonable functionality.

My desire for support for Perl is not an attempt to convince Python developers 
to use Perl instead of Python, but simply a reflection of the practicality of 
life: There are a lot of Perl CGI scripts used for pieces of Web servers.  
Reinventing them in Python may be fun, but can be more time consuming than 
projects may have the luxury to do.

Your claim that it already supports Python CGI scripts must be tempered by the 
documentation claim that it provides "altered semantics".  "altered semantics", 
as best as I can read in the code, is that the query string is passed to the 
Python script as a command line if it doesn't happen to contain an "=" sign.  
This is weird, unlikely to be found in a real web server, and hence somewhat 
useless for use as a test server also.

http.server has chosen to use subprocess which has chosen to use CreateProcess 
as its way of executing CGI.  There are other Windows facilities for executing 
programs, such as ShellExecute, but of course it takes the opposite tack: it 
can "execute" nearly any file, via registry-based associations.  Neither of 
these seem to be directly appropriate for use by http.server, the former being 
too restrictive without enhancements, the latter being too liberal in executing 
too many file types, although the requirement that CGI scripts live in specific 
directories may sufficiently rein in that liberality.

However, you have made me think through the process: it seems that an 
appropriate technique for Windows is to allow for a specific set of file 
extensions, and permit them to be executed using the registry-based association 
to do so.  However, for .cgi files, which depend heavily on the Unix #!, 
emulation of #! seems appropriate (and Windows doesn't seem to have an 
association for .cgi files either).

Your suggestion of making CGIHTTPRequestHandler easier to subclass is certainly 
a good one, and is almost imperative to implement to fix this bug in a useful 
manner without implementing an insufficient set of Windows extensions (for 
someone's definition of wrong).  There should be a way to sidestep the "altered 
semantics" for Python scripts (and Python scripts shouldn't have to be a 
special case, they should work with the general case), without replacing the 
whole run_cgi() function.  There should be a hook to define the list of 
executable extensions, and how to run them, and/or a hook to alter the command 
line passed to subprocess.Popen to achieve same.

So is_executable and is_python both currently seem to be replaceable.  What is 
missing is a hook to implement cmdline creation before calling 
subprocess.Popen() (besides the other reported bugs, of course).

--




[issue10483] http.server - what is executable on Windows

2010-11-23 Thread Glenn Linderman

Glenn Linderman  added the comment:

Martin, you are splitting hairs about the "reported problem".  The original 
message does have a paragraph about the executable bits being wrong.  But the 
bulk of the message is commenting about the difficulty of figuring out what to 
replace it with.

So it looks like in spite of the hair splitting, we have iterated to a design 
of making run_cgi a bit friendlier in this regard.

I find it sufficient to define a method fully extracted from run_cgi as follows:

def make_cmdline( self, scriptfile, query ):
    cmdline = [scriptfile]
    if self.is_python(scriptfile):
        interp = sys.executable
        if interp.lower().endswith("w.exe"):
            # On Windows, use python.exe, not pythonw.exe
            interp = interp[:-5] + interp[-4:]
        cmdline = [interp, '-u'] + cmdline
    if '=' not in query:
        cmdline.append(query)
    return cmdline    # so run_cgi can use the result

This leaves run_cgi with:

import subprocess
cmdline = self.make_cmdline( scriptfile, query )
self.log_message("command: %s", subprocess.list2cmdline(cmdline))


Apologies: I don't know what format of patch is acceptable, but this is a 
simple cut-n-paste change.  I was sort of holding off until the hg conversion 
to figure out how to make code submissions, since otherwise I'd have to learn 
it twice in short order.

I have reimplemented my work-arounds in terms of the above fix, and they 
function correctly, so this fix would suffice for me, for this issue.  (N.B. 
I'm sure you've noticed that I have entered a number of issues for http.server; 
I hope that was the right way to do it, to attempt to separate the issues.)

--




[issue10486] http.server doesn't set all CGI environment variables

2010-11-23 Thread Glenn Linderman

Glenn Linderman  added the comment:

Took a little more time to do a little more analysis on this one.  Compared a 
sample query via Apache on Linux vs http.server, then looked up the CGI RFC for 
more info:

DOCUMENT_ROOT: ...
GATEWAY_INTERFACE: CGI/1.1
HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.7
HTTP_ACCEPT_ENCODING: gzip,deflate
HTTP_ACCEPT_LANGUAGE: en-us,en;q=0.5
HTTP_CONNECTION: keep-alive
HTTP_COOKIE: ...
HTTP_HOST: ...
HTTP_KEEP_ALIVE: 115
HTTP_USER_AGENT: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.10) 
Gecko/20100914 Firefox/3.6.10
PATH: /usr/local/bin:/usr/bin:/bin
PATH_INFO: ...
PATH_TRANSLATED: ...
QUERY_STRING: 
REMOTE_ADDR: 173.75.100.22
REMOTE_PORT: 50478
REQUEST_METHOD: GET
REQUEST_URI: ...
SCRIPT_FILENAME: ...
SCRIPT_NAME: ...
SERVER_ADDR: ...
SERVER_ADMIN: ...
SERVER_NAME: ...
SERVER_PORT: ...
SERVER_PROTOCOL: HTTP/1.1
SERVER_SIGNATURE: Apache Server at rkivs.com Port 80
SERVER_SOFTWARE: Apache
UNIQUE_ID: TLEs8krc24oAABQ1TIUAAAPN

Above from Apache, below from http.server

GATEWAY_INTERFACE: CGI/1.1
HTTP_USER_AGENT: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.12) 
Gecko/20101026 Firefox/3.6.12
PATH_INFO: ...
PATH_TRANSLATED: ...
QUERY_STRING: ...
REMOTE_ADDR: 127.0.0.1
REQUEST_METHOD: GET
SCRIPT_NAME: ...
SERVER_NAME: ...
SERVER_PORT: ...
SERVER_PROTOCOL: HTTP/1.0
SERVER_SOFTWARE: SimpleHTTP/0.6 Python/3.2a4

Analysis of missing variables between Apache and http.server:

DOCUMENT_ROOT
HTTP_ACCEPT
HTTP_ACCEPT_CHARSET
HTTP_ACCEPT_ENCODING
HTTP_ACCEPT_LANGUAGE
HTTP_CONNECTION
HTTP_COOKIE
HTTP_HOST
HTTP_KEEP_ALIVE
HTTP_PORT
PATH
REQUEST_URI
SCRIPT_FILENAME
SERVER_ADDR
SERVER_ADMIN


Additional variables mentioned in RFC 3875, not used for my test requests:

AUTH_TYPE
CONTENT_LENGTH
CONTENT_TYPE
REMOTE_IDENT
REMOTE_USER

--




[issue10484] http.server.is_cgi fails to handle CGI URLs containing PATH_INFO

2010-11-23 Thread Glenn Linderman

Glenn Linderman  added the comment:

Here is a replacement for the body of is_cgi that will work with the current 
_url_collapse_path_split function, but it seems to me that it is inefficient to 
do multiple splits and joins of the path between the two functions.

splitpath = server._url_collapse_path_split(self.path)
# more processing required due to possible PATHINFO parts
# not clear above function really does what is needed here,
# nor just how general it is!
splitpath = '/'.join( splitpath ).split('/', 2 )
head = '/' + splitpath[ 1 ]
tail = splitpath[ 2 ]
if head in self.cgi_directories:
    self.cgi_info = head, tail
    return True
return False

--




[issue10482] subprocess and deadlock avoidance

2010-11-23 Thread Glenn Linderman

Glenn Linderman  added the comment:

So I've experimented a bit, and it looks like simply exposing ._readerthread as 
an external API would handle the buffered case for stdout or stderr.  For 
http.server CGI scripts, I think it is fine to buffer stderr, as it should not 
be a high-volume channel... but not both stderr and stdout, as stdout can be 
huge.  And not stdin, because it can be huge also.

For stdin, something like the following might work nicely for some cases, 
including http.server (with revisions):

def _writerthread(self, fhr, fhw, length):
    while length > 0:
        buf = fhr.read( min( 8196, length ))
        fhw.write( buf )
        length -= len( buf )
    fhw.close()

When the stdin data is buffered, but the application wishes to be stdout 
centric instead of stdin centric (like the current ._communicate code), a 
variation could be made replacing fhr by a data buffer, and writing it 
gradually (or fully) to the pipe, but from a secondary thread.

Happily, this sort of code (the above is extracted from a test version of 
http.server) can be implemented in the server, but would be more usefully 
provided by subprocess, in my opinion.

To include the above code inside subprocess would just be a matter of tweaking 
references to class members instead of parameters.

--




[issue10482] subprocess and deadlock avoidance

2010-12-01 Thread Glenn Linderman

Glenn Linderman  added the comment:

Here's an updated _writerthread idea that handles more cases:

def _writerthread(self, fhr, fhw, length=None):
    if length is None:
        flag = True
        while flag:
            buf = fhr.read( 512 )
            fhw.write( buf )
            if len( buf ) == 0:
                flag = False
    else:
        while length > 0:
            buf = fhr.read( min( 512, length ))
            fhw.write( buf )
            length -= len( buf )
        # throw away additional data [see bug #427345]
        while select.select([fhr._sock], [], [], 0)[0]:
            if not fhr._sock.recv(1):
                break
    fhw.close()

--




[issue10482] subprocess and deadlock avoidance

2010-12-01 Thread Glenn Linderman

Glenn Linderman  added the comment:

Sorry, left some extraneous code in the last message, here is the right code:

def _writerthread(self, fhr, fhw, length=None):
    if length is None:
        flag = True
        while flag:
            buf = fhr.read( 512 )
            fhw.write( buf )
            if len( buf ) == 0:
                flag = False
    else:
        while length > 0:
            buf = fhr.read( min( 512, length ))
            fhw.write( buf )
            length -= len( buf )
    fhw.close()

--




[issue10487] http.server - doesn't process Status: header from CGI scripts

2010-12-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

Just to mention, with the added code from issue 10482, I was able to get a 
3-stream functionality working great in http.server and also backported it to 
2.6 CGIHTTPServer... and to properly process the Status: header on stdout.

Works very well in 2.6; Issue 8077 prevents form processing from working in 
3.2a4, but otherwise it is working there also, and the experience in 2.6 
indicates that once issue 8077 is resolved, it should work in 3.2 also.

--




[issue10482] subprocess and deadlock avoidance

2010-12-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

Looking at the code the way I've used it in my modified server.py:

stderr = []
stderr_thread = threading.Thread(target=self._readerthread,
                                 args=(p.stderr, stderr))
stderr_thread.daemon = True
stderr_thread.start()

self.log_message("writer: %s" % str( nbytes ))
stdin_thread = threading.Thread(target=self._writerthread,
                                args=(self.rfile, p.stdin, nbytes))
stdin_thread.daemon = True
stdin_thread.start()

and later

stderr_thread.join()
stdin_thread.join()

p.stderr.close()
p.stdout.close()

if stderr:
    stderr = stderr[ 0 ].decode("UTF-8")

It seems like this sort of code (possibly passing in the encoding) could be 
bundled back inside subprocess (I borrowed it from there).

It also seems from recent discussion on npopdev that the cheat-sheet "how to 
replace" other sys and os popen functions would be better done as wrapper 
functions for the various cases.  Someone pointed out that the hard cases 
probably aren't cross-platform, but that currently the easy cases all get 
harder when using subprocess than when using the deprecated facilities.  They 
shouldn't.  The names may need to be a bit more verbose to separate the various 
use cases, but each use case should remain at least as simple as the prior 
function.

So perhaps, instead of just subprocess.PIPE to select particular handling for 
stdin, stdout, and stderr, subprocess should implement some variations that attach 
different types of reader and writer threads to the handles... of course, 
parameters need to come along for the ride too: maybe the additional variations 
would be object references with parameters supplied, instead of just a manifest 
constant like .PIPE.

--

___
Python tracker 
<http://bugs.python.org/issue10482>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-02 Thread Glenn Linderman

Glenn Linderman  added the comment:

Pierre, thanks for your work on this.  I hope a fix can make it in to 3.2.

However, while starting Python with -u can help a bit, that should not, in my 
opinion, be a requirement for using CGI.  Rather, stdin should be set into 
binary mode by the CGI processing... it would be helpful if the cgi module 
either did it automatically, verified that it has been done, or at least provided a 
helper function that could do it, with appropriate documentation if it is not 
automatic.  I've seen code like:

try: # Windows needs stdio set for binary mode.
    import os
    import msvcrt
    msvcrt.setmode (0, os.O_BINARY) # stdin  = 0
    msvcrt.setmode (1, os.O_BINARY) # stdout = 1
    msvcrt.setmode (2, os.O_BINARY) # stderr = 2
except ImportError:
    pass

and

if hasattr( sys.stdin, 'buffer'):
    sys.stdin = sys.stdin.buffer

which together, seem to do the job.  For output, I use a little class that 
accepts either binary or text, encoding the latter:

class IOMix():
    def __init__( self, fh, encoding="UTF-8"):
        if hasattr( fh, 'buffer'):
            self._bio = fh.buffer
            fh.flush()
            self._last = 'b'
            import io
            self._txt = io.TextIOWrapper( self._bio, encoding, None, '\r\n')
            self._encoding = encoding
        else:
            raise ValueError("not a buffered stream")
    def write( self, param ):
        # str goes through the text layer; bytes go straight to the buffer,
        # flushing the text layer first so ordering is preserved.
        if isinstance( param, str ):
            self._last = 't'
            self._txt.write( param )
        else:
            if self._last == 't':
                self._txt.flush()
            self._last = 'b'
            self._bio.write( param )
    def flush( self ):
        self._txt.flush()
    def close( self ):
        self.flush()
        self._txt.close()
        self._bio.close()


sys.stdout = IOMix( sys.stdout, encoding )
sys.stderr = IOMix( sys.stderr, encoding )


IOMix may need a few more methods for general use, "print" comes to mind, for 
example.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-02 Thread Glenn Linderman

Glenn Linderman  added the comment:

Regarding the use of detach(), I don't know if it works.  Maybe it would.  I 
know my code works, because I have it working.  But if there are simpler 
solutions that are shown to work, that would be great.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-02 Thread Glenn Linderman

Glenn Linderman  added the comment:

Peter, it seems that detach() is relatively new (3.1); likely the code samples and 
suggestions that I had found to cure the problem predate it.  While I haven't 
yet tried detach(), your code doesn't seem to modify stdin, so are you really 
suggesting...

    sys.stdin = sys.stdin.detach()

or maybe

    if hasattr( sys.stdin, 'detach'):
        sys.stdin = sys.stdin.detach()

On the other hand, if detach(), coded as above, is equivalent to

    if hasattr( sys.stdin, 'buffer'):
        sys.stdin = sys.stdin.buffer

then I wonder why it was added.  So maybe I'm missing something in the 
documentation you pointed at, and also in the documentation at 
http://docs.python.org/py3k/library/io.html#io.TextIOBase.detach; both seem 
well-documented if you already have a clear understanding of the layers in the 
IO subsystem, but perhaps not so well-documented if you don't yet (and I don't).
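
For what it's worth, here is the difference as I understand it, reduced to plain 
io objects (no CGI, no platform-specific behavior assumed):

import io

raw = io.BytesIO(b"data")
text = io.TextIOWrapper(raw, encoding="utf-8")

# .buffer just exposes the underlying binary layer; the text wrapper stays usable.
assert text.buffer is raw

# .detach() returns that same binary layer but disconnects it from the wrapper,
# so the TextIOWrapper must not be used afterwards.
binary = text.detach()
assert binary is raw
try:
    text.read()
except ValueError:
    print("the text wrapper is unusable after detach()")

Neither spelling, as far as I can tell, says anything about the Windows C-runtime 
mode of the underlying handle.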

But then you referred to the platform-dependent stuff... I don't see anything 
in the documentation for detach() that implies that it also makes the 
adjustments needed on Windows to the C-runtime, which is what the 
platform-dependent stuff I suggested does... if it does, great, but a bit more 
documentation would help in understanding that.  And if it does, maybe that is 
the difference between the two code fragments in this comment?  I would have to 
experiment to find out, and am not in a position to do that at the moment.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-02 Thread Glenn Linderman

Glenn Linderman  added the comment:

Rereading the doc link I pointed at, I guess detach() is part of the new API 
since 3.1, so doesn't need to be checked for in 3.1+ code... but instead, may 
need to be coded as:

from io import UnsupportedOperation

try:
    sys.stdin = sys.stdin.detach()
except UnsupportedOperation:
    pass

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-03 Thread Glenn Linderman

Glenn Linderman  added the comment:

So then David, is your suggestion to use

sys.stdin = sys.stdin.detach()

and you claim that the Windows-specific hacks are not needed in 3.x land?  They 
are needed in 2.x land, as I have proven empirically, but I haven't been able to 
test CGI forms very well in 3.x because of this bug.  I will test 3.x downloads 
without the Windows-specific hack, and report how it goes.  My testing started 
with 2.x and has proceeded to 3.x, and it is not always obvious which hacks are 
no longer needed in 3.x.  Thanks for the info.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-03 Thread Glenn Linderman

Glenn Linderman  added the comment:

David, starting from a working (but hacked-to-work) version of http.server and 
using 3.2a1 (I should upgrade to the beta, but I doubt it makes a difference at 
the moment), I modified

# if hasattr( sys.stdin, 'buffer'):
#     sys.stdin = sys.stdin.buffer
sys.stdin = sys.stdin.detach()

and it all kept working.

Then I took out the

try: # Windows needs stdio set for binary mode.
    import os
    import msvcrt
    msvcrt.setmode (0, os.O_BINARY) # stdin  = 0
    msvcrt.setmode (1, os.O_BINARY) # stdout = 1
    msvcrt.setmode (2, os.O_BINARY) # stderr = 2
except ImportError:
    pass

and it quit working.  Seems that \r\r\n\r\r\n is not recognized by Firefox as 
the "end of the headers" delimiter.

Whether this is a bug in IO or not, I can't say for sure.  It does seem, 
though, that

1) If Python is fully replacing the IO layers, which in 3.x it seems to claim 
to, then it should fully replace them, building on a binary byte stream, not a 
"binary byte stream with replacement of \n by \r\n".  The Windows hack above 
replaces, for stdin, stdout, and stderr, a "binary byte stream with replacement 
of \n by \r\n" with a binary byte stream.  Seems like Python should do that, on 
Windows, so that it has a chance of actually knowing/controlling what gets 
generated.  Perhaps it does, if started with "-u", but starting with "-u" 
should not be a requirement for a properly functioning program. Alternately, 
the IO streams could understand, and toggle the os.O_BINARY flag, but that 
seems like it would require more platform-specific code than simply opening all 
Windows files (and adjusting preopened Windows files) during initialization.

2) The weird CGI processing that exists in the released version of http.server 
seems to cover up this problem, partly because it isn't very functional, claims 
"alternate semantics" (read: non-standard semantics), and invokes Python with 
-u when it does do so.  It is so non-standard that it isn't clear what should 
or should not be happening.  But the CGI scripts I am running, that pass or 
fail as above, also run on Windows 2.6, and particularly, Unix 2.6, in an 
Apache environment.  So I have been trying to minimize the differences to 
startup code, rather than add platform-specific tweaks throughout the CGI 
scripts.

That said, it clearly could be my environment, but I've debugged enough 
different versions of things to think that the Windows hack above is required 
on both 2.x and 3.x to ensure proper bytestreams, and others must think so 
too, because I found the code by searching on Google, not because I learned 
enough Python internals to figure it out on my own.  The question I'm 
attempting to address here is only that 3.x still needs the same hack that 2.x 
needs, on Windows, to create bytestreams.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-03 Thread Glenn Linderman

Glenn Linderman  added the comment:

(and I should mention that all the "hacked to work" issues in my copy of 
http.server have been reported as bugs, on 2010-11-21. The ones of most 
interest related to this binary bytestream stuff are issue 10479 and issue 
10480)

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-04 Thread Glenn Linderman

Glenn Linderman  added the comment:

R. David said:
>From looking over the cgi code it is not clear to me whether Pierre's approach 
>is simpler or more complex than the alternative approach of starting with 
>binary input and decoding as appropriate.  From a consistency perspective I 
>would prefer the latter, but I don't know if I'll have time to try it out 
>before rc1.

I say:
I agree with R. David that an approach using the binary input seems more 
appropriate, as the HTTP byte stream is defined as binary.  Do the 3.2 beta 
email docs now include documentation for the binary input interfaces required 
to code that solution?  Or could you provide appropriate guidance and review, 
should someone endeavor to attempt such a solution?

The remaining concerns below are only concerns; they may be totally irrelevant, 
and I'm too ignorant of how the code works to realize their irrelevance.  
Hopefully someone that understands the code can comment and explain.

I believe that the proper solution is to make cgi work if sys.stdin has already 
been converted to be a binary stream, or if it hasn't, to dive down to the 
underlying binary stream, using detach().  Then the data should be processed as 
binary, and decoded once, when the proper decoding parameters are known.  The 
default encoding seems to be different on different platforms, but the binary 
stream is standardized.  It looks like new code was added to attempt to 
preprocess the MIME data into chunks to be fed to the email parser, and while I 
can believe code could be written to do such correctly (but I can't speak for 
whether this patch code is correct or not), it seems redundant/inefficient and 
error-prone to do it once outside the email parser, and again inside it.

I also doubt that self.fp.encoding is consistent from platform to platform.  
But the HTTP bytestream is binary, and self-describing or declared by HTTP or 
HTML standards for the parts that are not self-describing.  The default 
platform encoding used for the preopened sys.stdin is not particularly relevant 
and may introduce mojibake type bugs, decoding errors in the presence of some 
inputs, and/or platform inconsistencies, and it seems that that is generally 
where self.fp.encoding, used in various places in this patch, comes from.

Regarding the binary vs. text issue; when using both binary and text interfaces 
on output streams, there is the need to do flushing between text and binary 
writes to preserve the proper sequencing of data in the output.  For input, is 
it possible that mixing text and binary input could result in the binary input 
missing data that has already been preloaded into the text buffer?  Although, 
for CGI programs, no one should have done any text inputs before calling the 
CGI functions, so perhaps this is also not a concern... and there probably 
isn't any buffering on socket streams (the usual CGI use case) but I see the 
use of both binary and text input functions in this patch, so this may be 
another issue that someone could explain why such a mix is or isn't a problem.
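
For what it's worth, a tiny illustration of that worry with plain io objects (no 
CGI involved); the text layer is allowed to buffer ahead, so this shows typical 
behavior rather than a guarantee:

import io

raw = io.BytesIO(b"header line\r\nbinary payload follows")
text = io.TextIOWrapper(raw, encoding="ascii")

first = text.read(3)   # the text layer may read ahead a whole chunk...
rest = raw.read()      # ...so the binary layer sees little or nothing left
print(repr(first), repr(rest))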

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-05 Thread Glenn Linderman

Glenn Linderman  added the comment:

R. David said:
(I believe http uses latin-1 when no charset is specified, but I need to double 
check that)

See http://bugs.python.org/issue4953#msg121864; ASCII and UTF-8 are what HTTP 
defines.  Some implementations may, in fact, use latin-1 instead of ASCII in 
some places.  Not sure if we want Python CGI to do that or not.

Thanks for getting the email APIs in the docs... shouldn't have to bug you as 
much that way :)

Antoine said:
(this is all funny in the light of the web-sig discussion where people explain 
that CGI is such a natural model)

Thanks for clarifying the stdin buffering vs. binary issue... it is as I 
suspected.  Maybe you can also explain the circumstances in which "my" Windows 
code is needed, and whether Python's "-u" does it automatically, but I still 
believe that "-u" shouldn't be necessary for a properly functioning program, 
not even a CGI program... it seems like a hack to allow some programs to work 
without other changes, so might be a useful feature, but hopefully not a 
required part of invoking a CGI program.

The CGI interface is "self describing", when you follow the standards, and use 
the proper decoding for the proper pieces.  In that way, it is similar to 
email.  It is certainly not as simple as using UTF-8 everywhere, but 
compatibility with things invented before UTF-8 even existed somewhat prevents 
the simplest solution, and then not everything is text, either.  At least it is 
documented, and permits full UNICODE data to be passed around where needed, and 
permits binary to be passed around where that is needed, when the specs are 
adhered to.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-05 Thread Glenn Linderman

New submission from Glenn Linderman :

Per Antoine's request, I wrote this test code; it isn't elegant, I whipped it 
together quickly, but it shows the issue.  The issue may be one of my 
ignorance, but it does show the behavior I described in issue 4953.  Here's the 
output from the various test parameters that might be useful in running the 
test.

>c:\python32\python.exe test.py test 1
['c:\\python32\\python.exe', 'test.py', '1']
All OK

>c:\python32\python.exe test.py test 2
['c:\\python32\\python.exe', 'test.py', '2']
Not OK: b'abc\r\r\ndef\r\r\n'

>c:\python32\python.exe test.py test 3
['c:\\python32\\python.exe', 'test.py', '3']
All OK

>c:\python32\python.exe test.py test 4
['c:\\python32\\python.exe', 'test.py', '4']
Not OK: b'abc\r\r\ndef\r\r\n'

>c:\python32\python.exe test.py test 1-u
['c:\\python32\\python.exe', '-u', 'test.py', '1-u']
All OK

>c:\python32\python.exe test.py test 2-u
['c:\\python32\\python.exe', '-u', 'test.py', '2-u']
All OK

>c:\python32\python.exe test.py test 3-u
['c:\\python32\\python.exe', '-u', 'test.py', '3-u']
All OK

>c:\python32\python.exe test.py test 4-u
['c:\\python32\\python.exe', '-u', 'test.py', '4-u']
All OK

>

Note that tests 2 and 4, which do not use the msvcrt stuff, have double \r: one 
sent by the code, and another added, apparently by MSC newline processing.  
Tests 2-u and 4-u, which invoke the subprocess with Python's -u parameter, 
also do not exhibit the problem, even though the msvcrt stuff is not used.  
This seems to indicate that Python's -u parameter does approximately the same 
thing as my windows_binary function.

It seems that if Python already has code for this, it would be nice to make it 
more easily available to the user as an API (like my windows_binary function, 
invoked with a single line) in the io or sys modules (since it is used to 
affect the sys.std* files).

And it would be nice if the function "worked cross-platform", even if it is a 
noop on most platforms.
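
For reference, something like the following is what I mean by windows_binary; this 
is a reconstruction for illustration (the actual code is in the attached test.py), 
but the msvcrt/os calls themselves are real:

import sys

def windows_binary():
    # Put the C-level std handles into binary mode on Windows;
    # a harmless no-op everywhere else.
    try:
        import msvcrt, os
    except ImportError:
        return
    for fd in (0, 1, 2):        # stdin, stdout, stderr
        msvcrt.setmode(fd, os.O_BINARY)

windows_binary()
sys.stdout.buffer.write(b"abc\ndef\n")   # emerges as \n, not \r\r\n, on Windows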

--
files: test.py
messages: 125500
nosy: v+python
priority: normal
severity: normal
status: open
title: binary stdio
Added file: http://bugs.python.org/file20285/test.py

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-05 Thread Glenn Linderman

Glenn Linderman  added the comment:

Pierre said:
In all cases the interpreter must be launched with the -u option. As stated in 
the documentation, the effect of this option is to "force the binary layer of 
the stdin, stdout and stderr streams (which is available as their buffer 
attribute) to be unbuffered. The text I/O layer will still be line-buffered.". 
On my PC (Windows XP) this is required to be able to read all the data stream; 
otherwise, only the beginning is read. I tried Glenn's suggestion with msvcrt, 
with no effect

I say:
If you start the interpreter with -u, then my msvcrt code has no effect.  Without 
-u, it does have an effect.  Read on...

Antoine said:
Could you open a separate bug with a simple piece of code to reproduce
the issue (preferably without launching an HTTP server :))?

I say:
issue 10841

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-05 Thread Glenn Linderman

Glenn Linderman  added the comment:

tested on Windows, for those that aren't following issue 4953

--
components: +IO
type:  -> behavior

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-05 Thread Glenn Linderman

Glenn Linderman  added the comment:

The same.  This can be tested with the same test program,

c:\python32\python.exe test.py 1 > test1.txt

similar for 2, 3, 4.  Then add -u and repeat.  All 8 cases produce the same 
results, either via a pipe, or with a redirected stdout.

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-05 Thread Glenn Linderman

Glenn Linderman  added the comment:

Actually, it seems like this "-u" behaviour should simply be the default for 
Python 3.x on Windows.  The new IO subsystem seems to be able to add \r when 
desired anyway.  And except for Notepad, most programs on Windows can deal with 
either \r\n or a solo \n.  \r\r\n doesn't cause too many problems for very many 
programs, but it is (1) non-standard, (2) wasteful of bytes, and (3) a cause of 
problems for CGI programs, and likely some others... I haven't done a lot of 
testing with that case, but I tried a few programs, and they dealt with it 
gracefully.

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-05 Thread Glenn Linderman

Glenn Linderman  added the comment:

Is there an easy way for me to find the code for -u?  I haven't learned my way 
around the Python sources much; I've just peeked a little into modules that I've 
needed to fix or learn something from.  I'm just surprised you think it is 
orthogonal, but I'm glad you agree it is a bug.

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

I can read and understand C well enough, having coded in it for about 40 years 
now... but I left C for Perl and Perl for Python; I try not to code in C when I 
don't have to, these days, as the P languages are more productive, overall.

But there has to be special handling somewhere for opening std*, because they 
are already open, unlike other files.  That is no doubt where the bug is.  Can 
you point me at that code?

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

I suppose the FileIO in _io is next to look at, wherever it can be found.

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

Found it.

The file browser doesn't tell what line number it is, but in _io/fileio.c, in the 
fileio_init function, there is code like

#ifdef O_BINARY
    flags |= O_BINARY;
#endif

#ifdef O_APPEND
    if (append)
        flags |= O_APPEND;
#endif

    if (fd >= 0) {
        if (check_fd(fd))
            goto error;
        self->fd = fd;
        self->closefd = closefd;
    }


Note that if O_BINARY is defined, it is set into the default flags for opening 
files by name.  But if "opening" a file by fd, the fd is copied, regardless of 
whether it has O_BINARY set or not.  The rest of the IO code no doubt assumes 
the file was opened in O_BINARY mode.  But that isn't true of MSC std* handles 
by default.

How -u masks or overcomes this problem is not obvious, as yet, but the root bug 
seems to be the assumption in the above code.  A setmode of O_BINARY should be 
done, probably #ifdef O_BINARY, when attaching a MS C fd to a Python IO stack.  
Otherwise it is going to have \r\r\n problems, it would seem.

Alternately, in the location where the Python IO stacks are attached to std* 
handles, those specific std* handles should have the setmode done there... 
other handles, if opened by Python, likely already have it done.

Documentation for open should mention, in the description of the file 
parameter, that on Windows it is important to attach a Python IO stack only to 
O_BINARY files, or beware the consequences of two independent newline-handling 
algorithms being applied to the data stream... or it should document that setmode 
O_BINARY will be performed on the handles passed to open.
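
A Python-level sketch of the workaround I have in mind (an illustration only, not 
the proposed C fix): switch an inherited fd to binary before handing it to the io 
stack, so that only one layer does newline handling.

import io
import os
import sys

def open_fd_binary(fd):
    # On Windows an inherited fd (such as 0, 1 or 2) may still be in text
    # mode at the C level; force O_BINARY before wrapping it.
    if sys.platform == "win32":
        import msvcrt
        msvcrt.setmode(fd, os.O_BINARY)
    return io.open(fd, "rb", buffering=0, closefd=False)

stdin_bytes = open_fd_binary(0)   # a raw binary view of standard input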

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

Etienne, I'm not sure what you are _really_ referring to by 
HTTP_TRANSFER_ENCODING.  There is a TRANSFER_ENCODING defined by HTTP but it is 
completely orthogonal to character encoding issues.  There is a 
CONTENT_ENCODING defined which is a character encoding, but that is either 
explicit in the MIME data, or assumed to be either ASCII or UTF-8, in certain 
form data contexts.

Because the HTTP protocol is binary, only selected data, either explicitly or 
implicitly (by standard definition) should be decoded, using the appropriate 
encoding.  FieldStorage should be able to (1) read a binary stream (2) do the 
appropriate decoding operations (3) return the data as bytes or str as 
appropriate.

Right now, I'm mostly interested in the fact that it doesn't do (1), so it is 
hard to know what it does for (2) or (3) because it gets an error first.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

Don't find "initstdio" "stdio" in pythonrun.c.  Has it moved?  There are 
precious few references to stdin, stdout, stderr in that module, mostly for 
attaching the default encoding.

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

stderr is notable by its absence in the list of O_BINARY adjustments.

So -u does do 2/3 of what my windows_binary() does :)  Should I switch my test 
case to use stderr to demonstrate that it doesn't help with that?  I vaguely 
remember that early versions of DOS didn't do stderr, but I thought by the time 
Windows came along, let's see, was that about 1983?, that stderr was codified 
for DOS/Windows.  For sure it has never been missing in WinNT 4.0 +.

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

Etienne said:
yes, lets not complexify anymore please...

Albert Einstein said:
Things should be as simple as possible, but no simpler.

I say:
My "learning" of HTTP predates "chunked".  I've mostly heard of it being used 
in downloads rather than uploads, but I'm not sure if it pertains to uploads or 
not.  Since all the data transfer is effectively chunked by TCP/IP into 
packets, I'm not clear on what the benefit is, but I am pretty sure it is 
off-topic for this bug, at least until FieldStorage works at all on 3.x, like 
for small pieces of data.

I meant to say in my preceding response that the multiple encodings that may 
be found in an HTTP stream make it inappropriate to assign an encoding to the 
file through which the HTTP data streams... and that explicit decode calls by 
FieldStorage should take place on appropriate chunks only.  I almost got there, 
so maybe you picked it up.  But I didn't quite say it.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

Makes sense to me.  Still should document the open file parameter when passed 
an fd, and either tell the user that it should be O_BINARY, or that it will be 
O_BINARYd for them, whichever technique is chosen.  But having two newline 
techniques is bad, and if Python thinks it is tracking the file pointer, but 
Windows is doing newline translation for it, then it isn't likely tracking it 
correctly for random access IO.  So I think the choice should be that any fd 
passed in to open on Windows should get O_BINARYd immediately.

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

Victor,  Thanks for your interest and patches.

msg125530 points out the location of the code where _all_ fds could be 
O_BINARYed, when passed in to open.  I think this would make all fds open in 
binary mode, per Guido's comment... he made exactly the comment I was hoping 
for, even though I didn't +nosy him... I believe this would catch std* along 
the way, and render your first patch unnecessary, but your second one would 
likely still be needed.

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-06 Thread Glenn Linderman

Glenn Linderman  added the comment:

We have several users, myself included, who can't use CGI under 3.x because it 
doesn't take a binary stream.

I believe there are several alternatives:
1) Document that CGI needs a binary stream, and expect the user to provide it, 
either as an explicit handle, or by tweaking sys.stdin before calling with the 
default file stream.
2) Provide a CGI function for tweaking sys.stdin (along with #1)
3) Document that CGI will attempt to convert passed in streams, default or 
explicit, to binary, if they aren't already, and implement the code to do so.

My choice is #3.  I see CGI as being used only in HTTP environments, where the 
data stream should be binary anyway.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-07 Thread Glenn Linderman

Glenn Linderman  added the comment:

Pierre said:
Option 1 is impossible, because the CGI script sometimes has no control on the 
stream : for instance on a shared web host, it will receive sys.stdin as a text 
stream

I say:
It is the user code of the CGI script that calls CGI.FieldStorage.  So the user 
could be required (option 1) to first tweak the stdin to be bytes, one way or 
another.  I don't understand any circumstance where a Python CGI script doesn't 
have control over the settings of the Python IO Stack that it is using to 
obtain the data... and the CGI spec is defined as a bytestream, so it must be 
able to read the bytes.

Victor said:
It is possible to test the type of the stream.

I say:
Yes, why just assume (as I have been) that the initial precondition is the 
defaults that Python imposes.  Other code could have interposed something else. 
 The user should be allowed to pass in anything that is a TextIOWrapper, or a 
BytesIO, and CGI should be able to deal with it.  If the user passes some other 
type, it should be assumed to produce bytes from its read() API, and if it 
doesn't the user gets what he deserves (an error).  Since the default Python 
sys.stdin is a TextIOWrapper, having CGI detect that, and extract its .buffer 
to use for obtaining bytes, should work fine.  If the user already tweaked 
sys.stdin to be a BytesIO (.buffer or detach()), CGI should detect and use 
that.  If the user substitutes a different class, it should produce bytes from 
its read(), and those three cases that can work should be documented.
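
A small sketch of that three-case dispatch (my own illustration, not the patch):

import sys
from io import TextIOBase

def as_byte_stream(fp=None):
    # 1) default: use sys.stdin; 2) any text layer: use its .buffer;
    # 3) anything else: assume its read() already returns bytes.
    if fp is None:
        fp = sys.stdin
    if isinstance(fp, TextIOBase):
        return fp.buffer
    return fp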

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10841] binary stdio

2011-01-07 Thread Glenn Linderman

Glenn Linderman  added the comment:

Thanks for your work on this Victor, and other commenters also.

--

___
Python tracker 
<http://bugs.python.org/issue10841>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1602] windows console doesn't print utf8 (Py30a2)

2011-01-08 Thread Glenn Linderman

Glenn Linderman  added the comment:

Interesting!

I was able to tweak David-Sarah's code to work with Python 3.x, mostly doing 
things that 2to3 would probably do: changing  unicode() to str(), dropping u 
from u'...', etc.

I skipped the unmangling of command-line arguments, because it produced an 
error I didn't understand, about needing a buffer protocol.  But I'll attach 
David-Sarah's code + tweaks + a test case showing output of the Cyrillic 
alphabet to a console with code page 437 (at least, on my Win7-64 box, that is 
what it is).

Nice work, David-Sarah.  I'm quite sure this is not in a form usable inside 
Python 3, but it shows exactly what could be done inside Python 3 to make 
things work... and gives us a workaround if Python 3 is not fixed.

--
Added file: http://bugs.python.org/file20320/unicode2.py

___
Python tracker 
<http://bugs.python.org/issue1602>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1602] windows console doesn't print utf8 (Py30a2)

2011-01-09 Thread Glenn Linderman

Glenn Linderman  added the comment:

I would certainly be delighted if someone would reopen this issue, and figure 
out how to translate unicode2.py to Python internals so that Python's console 
I/O on Windows would support Unicode "out of the box".

Otherwise, I'll have to include the equivalent of unicode2.py in all my Python 
programs, because right now I'm including instructions for the user to (1) 
choose the Lucida or Consolas font if they can't figure out any other font that 
gets rid of the square boxes, (2) chcp 65001, and (3) set PYTHONIOENCODING=UTF-8.

Having this capability inside Python (or my programs) will enable me to 
eliminate two-thirds of the geeky instructions for my users.  But it seems like 
a very appropriate capability to have within Python, especially Python 3.x, with 
its preference for and support of Unicode in so many other ways.

--

___
Python tracker 
<http://bugs.python.org/issue1602>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10879] cgi memory usage

2011-01-10 Thread Glenn Linderman

New submission from Glenn Linderman :

In attempting to review issue 4953, I discovered a conundrum in handling of 
multipart/formdata.

cgi.py has claimed for some time (at least since 2.4) that it "handles" file 
storage for uploading large files.  I looked at the code in 2.6 that handles 
such, and it uses the rfc822.Message method, which parses headers from any 
object supporting readline().  In particular, it doesn't attempt to read 
message bodies, and there is code in cgi.py to perform that.

There is still code in 3.2 cgi.py to read message bodies, but... rfc822 has 
gone away, and been replaced with the email package.  Theoretically this is 
good, but the cgi FieldStorage read_multi method now parses the whole CGI input 
and then iterates to parcel out the items to FieldStorage instances.  There is a 
significant difference here: email reads everything into memory (if I 
understand it correctly).  That will never work to upload large or many files 
when combined with a Web server that launches CGI programs with memory limits.

I see several possible actions that could be taken:
1) Documentation.  While it is doubtful that anyone is using 3.x CGI, and this 
makes it more doubtful, the present code does not match the documentation: the 
documentation claims to handle file uploads as files, rather than as in-memory 
blobs, but the current code does not do that.

2) If there is a method in the email package that corresponds to 
rfc822.Message, parsing only headers, I couldn't find it.  Perhaps it is 
possible to feed just the headers to BytesFeedParser, stop there, and get the 
same sort of effect (a rough sketch of that idea follows this list).  However, 
this is not the way cgi.py is presently coded.  And if there is a better API 
for parsing only headers, one that is or could be exposed by email, that might 
be handy.

3) The 2.6 cgi.py does not claim to support nested multipart/ stuff, only one 
level.  I'm not sure if any present or planned web browsers use nested 
multipart/ stuff... I guess it would require a nested form tag, which is 
illegal HTML last I checked.  So perhaps the general logic flow of 2.6 cgi.py 
could be reinstated, with a technique to feed only the headers to BytesFeedParser, 
together with reinstating the MIME body parsing in cgi.py, and this could make 
a solution that works.
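
Here is a rough sketch of the header-only idea from point 2 above.  It assumes 
that feeding BytesFeedParser only up to the blank line that ends the headers, and 
then closing it, is acceptable; the body would remain unread on the input stream 
for cgi.py to handle itself:

from email.parser import BytesFeedParser

def parse_headers_only(fp):
    # fp is a binary stream positioned at the start of a MIME part's headers.
    parser = BytesFeedParser()
    while True:
        line = fp.readline()
        parser.feed(line)
        if line in (b"\r\n", b"\n", b""):   # blank line or EOF ends the headers
            break
    return parser.close()   # a Message object carrying only the headers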

I discovered this because I couldn't figure out where a bunch of the methods 
in cgi.py were called from, particularly read_lines_to_outerboundary and 
make_file.  They seemed to be called much too late in the process.  It wasn't 
until I looked back at 2.6 code that I could see that there was a transition 
from using rfc822 only for headers to using email for parsing the whole data 
stream, and that that was the cause of the documentation not seeming to match 
the code logic.  I have no idea if this problem is in 2.7, as I don't have it 
installed here for easy reference, and I'm personally much more interested in 
3.2.

--
components: Library (Lib)
messages: 125884
nosy: r.david.murray, v+python
priority: normal
severity: normal
status: open
title: cgi memory usage
versions: Python 3.1, Python 3.2, Python 3.3

___
Python tracker 
<http://bugs.python.org/issue10879>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-10 Thread Glenn Linderman

Glenn Linderman  added the comment:

This looks much simpler than the previous patch, and I think it can be 
further simplified.  This is my first reading of this code, however, so I might 
be totally missing something(s).

Pierre said:
Besides FieldStorage, I modified the  parse() function at module level, but not 
parse_multipart (should it be kept at all ?)

I say:
Since none of this stuff works correctly in 3.x, and since there are comments 
in the code about "folding" the parse* functions into FieldStorage, I think 
they could be deprecated rather than fixed.  If people are still using them, 
by writing code to work around their deficiencies, that code would continue to 
work for 3.2, but then not in 3.3 when that code is removed.  That seems 
reasonable to me.  In this scenario, the few parse* functions that are used by 
FieldStorage should be copied into FieldStorage as methods (possibly private 
methods), and fixed there, instead of being fixed in place.  That way all the 
parse* functions could be deprecated, and the use of them would be unchanged 
for 3.2.

Since RFC 2616 says that the HTTP protocol uses ISO-8859-1 (latin-1), I think 
that should be required here, instead of deferring to fp.encoding, which would 
eliminate 3 lines.

Also, the use of FeedParser could be replaced by BytesFeedParser, thus 
eliminating the need to decode header lines in that loop.

And, since this patch will be applied only to Python 3.2+, the msvcrt code can 
be removed (you might want a personal copy with it for earlier versions of 
Python 3.x, of course).

I wonder if the 'ascii' reference should also be 'latin-1'?

In truly reading and trying to understand this code to do a review, I noticed a 
deficiency in _parseparam and parse_header: should I file new issues for them?  
(Perhaps these are unimportant in practice; I haven't seen \ escapes used in 
HTTP headers.)  RFC 2616 allows for quoted strings ("..."), which are handled 
in _parseparam, and for quoted pairs (\c) inside quoted strings, which are 
handled in parse_header.  But _parseparam counts " without concern for \", and 
parse_header allows for \\ and \" but not \f or \j or \ followed by other 
characters, even though these are permitted (but probably not needed for much).

In make_file, shouldn't the encoding and newline parameters be preserved when 
opening text files?  On the other hand, it seems like perhaps we should 
leverage the power of IO to do our encoding/decoding... open the file with the 
TextIOBase layer set to the encoding for the MIME part, but then just read 
binary without decoding it, write it to the .buffer of the TextIOBase, and when 
the end is reached, flush it, and seek(0).  Then the data can be read back from 
the TextIOBase layer, and it will be appropriate decoded.  Decoding errors 
might be deferred, but will still occur.  This technique would save two data 
operations: the explicit decode in the cgi code, and the implicit encode in the 
IO layers, so resources would be saved.  Additionally, if there is a 
CONTENT-LENGTH specified for non-binary data, the read_binary method should be 
used for it also, because it is much more efficient than readlines... less 
scanning of the data, and fewer outer iterations.  This goes well with the 
technique of leaving that data in binary until read from the file.
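
A sketch of that temporary-file technique, assuming the part's encoding is known 
by the time the file is created (tempfile here stands in for whatever make_file 
actually provides):

import io
import tempfile

def spool_part(binary_chunks, encoding):
    # Write the raw MIME part bytes through the .buffer of a text-mode
    # temporary file, then rewind; decoding happens only when the caller
    # reads the text layer back.
    tmp = tempfile.TemporaryFile("w+b")
    text = io.TextIOWrapper(tmp, encoding=encoding, newline="")
    for chunk in binary_chunks:
        text.buffer.write(chunk)
    text.flush()
    text.seek(0)
    return text

part = spool_part([b"caf\xc3\xa9\n"], "utf-8")
print(part.read())   # decoded only at this point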

It seems that in addition to fixing this bug, you are also trying to limit the 
bytes read by FieldStorage to some maximum (CONTENT_LENGTH).  This is good, I 
guess.  But skip_lines() has a readline that can read up to 32KB and isn't 
limited by the maximum.  The same applies in read_lines_to_outer_boundary and 
read_lines_to_eof (although the latter may not get called in the cases that 
need to be limited).  If a limit is to be checked for, I think it should be a 
true, exact limit, not an approximate one.
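
What I mean by an exact limit is nothing cleverer than never asking the stream for 
more than what is left; a sketch (the names here are mine):

def limited_readline(fp, remaining, chunk=1 << 15):
    # Read one line, but never request more bytes than the remaining budget.
    line = fp.readline(min(chunk, remaining))
    return line, remaining - len(line)

# usage sketch: stop exactly at CONTENT_LENGTH
#   remaining = content_length
#   while remaining > 0:
#       line, remaining = limited_readline(self.fp, remaining)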

See also issue 10879.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-10 Thread Glenn Linderman

Glenn Linderman  added the comment:

Also, the required behavior of make_file changes, to need the right encoding, 
or binary, so that needs to be documented as a change for people porting from 
2.x. It would be possible, even for files, which will be uploaded as binary, 
for a user to know the appropriate encoding and, if the file is to be processed 
rather than saved, supply that encoding for the temporary file.  So the 
temporary file may not want to be assumed to be binary, even though we want to 
write binary to it.  So similarly to the input stream, if it is TextIOBase, we 
want to write to the .buffer.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10879] cgi memory usage

2011-01-10 Thread Glenn Linderman

Glenn Linderman  added the comment:

Trying to code some of this, it would be handy if BytesFeedParser.feed would 
return a status, indicating if it has seen the end of the headers yet. But that 
would only work if it is parsing as it goes, rather than just buffering, with 
all the real parsing work being done at .close time.

--

___
Python tracker 
<http://bugs.python.org/issue10879>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-10 Thread Glenn Linderman

Glenn Linderman  added the comment:

I wrote:
Additionally, if there is a CONTENT-LENGTH specified for non-binary data, the 
read_binary method should be used for it also, because it is much more 
efficient than readlines... less scanning of the data, and fewer outer 
iterations.  This goes well with the technique of leaving that data in binary 
until read from the file.

I further elucidate:
Sadly, while the browser (Firefox) seems to calculate an overall CONTENT-LENGTH 
for the HTTP headers, it does not seem to calculate CONTENT-LENGTH for 
individual parts, not even file parts where it would be extremely helpful.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-10 Thread Glenn Linderman

Glenn Linderman  added the comment:

It seems the choice between make_file and StringIO is based on the existence of 
self.length... per my previous comment, content-length doesn't seem to appear 
in any of the multipart/ item headers, so it is unlikely that real files will 
be created by this code.

Sadly, that seems to be the case for 2.x also, so I wonder now whether CGI has 
ever properly saved files, instead of buffering them in memory...

I'm basing this on the output of the Firefox Live HTTP Headers tool.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-10 Thread Glenn Linderman

Glenn Linderman  added the comment:

Victor said:
Don't you think that a warning would be appropriate if sys.stdin is passed 
here?
---
# self.fp.read() must return bytes
if isinstance(fp, TextIOBase):
    self.fp = fp.buffer
else:
    self.fp = fp
---
Maybe a DeprecationWarning if we would like to drop support of TextIOWrapper 
later :-)

I say:
I doubt we ever want to Deprecate the use of "plain stdin" as the default (or 
as an explicit) parameter for FieldStorage's fp parameter.  Most usage of 
FieldStorage will want to use stdin; if FieldStorage detects that stdin is 
TextIOBase (generally it is) and uses its buffer to get binary data, that is 
very convenient for the typical CGI application.  I think I agree with the rest 
of your comments.

Etienne said:
is sendfile() available on Windows ? i thought the Apache server could
use that to upload files without having to buffer files in memory..

I say:
I don't think it is called that, but similar functionality may be available on 
Windows under another name.  I don't know if Apache uses it or not.  But I have 
no idea how FieldStorage could interact with Apache via the CGI interface, to 
access such features.  I'm unaware of any APIs Apache provides for that 
purpose, but if there are some, let me know.  On the other hand, there are 
other HTTP servers besides Apache to think about. 

I'm also not sure whether sendfile() or an equivalent is possible to use from 
within FieldStorage, because it seems that in practice we don't know the size 
of the uploaded file without parsing it (which requires buffering it in memory 
to look at it).

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10879] cgi memory usage

2011-01-10 Thread Glenn Linderman

Glenn Linderman  added the comment:

R. David said:
However, I'm not clear on how that helps.  Doesn't FieldStorage also load 
everything into memory?

I say:
FieldStorage in 2.x (for x <= 6, at least) copies incoming file data to a file, 
using limited size read/write operations.  Non-file data is buffered in memory.

In 3.x, FieldStorage doesn't work.  The code that is there, though, for 
multipart/ data, would call email to do all the parsing, which would happen to 
include file data, which always comes in as part of a multipart/ data stream.  
This would prevent cgi from being used to accept large files in a limited 
environment.  Sadly, there is code in place that would then copy the memory 
buffers to files, and act as though they had been file-buffered... but process 
limits do not care that the memory usage is only temporary...

--

___
Python tracker 
<http://bugs.python.org/issue10879>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-10 Thread Glenn Linderman

Glenn Linderman  added the comment:

Victor said:
"Set sys.stdin.buffer.encoding attribute is not a good idea. Why do you modify 
fp, instead of using a separated attribute on FieldStorage (eg. 
self.fp_encoding)?"

Pierre said:
I set an attribute encoding to self.fp because, for each part of a 
multipart/form-data, a new instance of FieldStorage is created, and this 
instance needs to know how to decode bytes. So, either an attribute must be set 
to one of the arguments of the FieldStorage constructor, and fp comes to mind, 
or an extra argument has to be passed to this constructor, i.e. the encoding of 
the original stream

I say:
Ah, now I understand why you did it that way, but:

The RFC 2616 says the CGI stream is ISO-8859-1 (or latin-1).  The _defined_ 
encoding of the original stream is irrelevant, in the same manner that if it is 
a text stream, that is irrelevant.  The stream is binary, and latin-1, or it is 
non-standard.  Hence, there is not any reason to need a parameter, just use 
latin-1. If non-standard streams are to be supported, I suppose that would 
require a parameter, but I see no need to support non-standard streams: it is 
hard enough to support standard streams without complicating things.  The 
encoding provided with stdin is reasonably unlikely to be latin-1: Linux 
defaults to UTF-8 (at least on many distributions), and Windows to CP437, and 
in either case is configurable by the sysadmin.  But even the sysadmin should 
not be expected to configure the system locale to have latin-1 as the default 
encoding for the system, just because one of the applications that might run is 
a CGI program.  So I posit that the encoding on fp is irrelevant and should be 
ignored, and using it as a parameter between FieldStorage instances is neither 
appropriate nor necessary, as the standard defines latin-1 as the encoding for 
the stream.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-10 Thread Glenn Linderman

Glenn Linderman  added the comment:

Victor said:
I mean: you should pass sys.stdin.buffer instead of sys.stdin.

I say:
That would be possible, but it is hard to leave it at default, in that case, 
because sys.stdin will, by default, not be a binary stream.  It is a 
convenience for FieldStorage to have a useful default for its input, since RFC 
3875 declares that the message body is obtained from "standard input".

Pierre said:
I wish it could be as simple, but I'm afraid it's not. On my PC, 
sys.stdin.encoding is cp-1252. I tested a multipart/form-data with an INPUT 
field, and I entered the euro character, which is encoded  \x80 in cp-1252

If I use the encoding defined for sys.stdin (cp-1252) to decode the bytes 
received on sys.stdin.buffer, I get the correct value in the cgi script ; if I 
set the encoding to latin-1 in FieldStorage, since \x80 maps to undefined in 
latin-1, I get a UnicodeEncodeError if I try to print the value ("character 
maps to ")

I say:
Interesting. I'm curious what your system (probably Windows since you mention 
cp-) and browser, and HTTP server is, that you used for that test.  Is it 
possible to capture the data stream for that test?  Describe how, and at what 
stage the data stream was captured, if you can capture it.  Most interesting 
would be on the interface between browser and HTTP server.

RFC 3875 states (section 4.1.3) what the default encodings should be, but I see 
that the first possibility is "system defined".  On the other hand, it seems to 
imply that it should be a system definition specifically defined for particular 
media types, not just a general system definition such as might be used as a 
default encoding for file handles... after all, most Web communication crosses 
system boundaries.  So lacking a system defined definition for text/ types, it 
then indicates that the default for text/ types is Latin-1.

I wonder what result you get with the same browser, at the web page 
http://rishida.net/tools/conversion/ by entering the euro symbol into the 
Characters entry field, and choosing convert.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-11 Thread Glenn Linderman

Glenn Linderman  added the comment:

I said:
I wonder what result you get with the same browser, at the web page 
http://rishida.net/tools/conversion/ by entering the euro symbol into the 
Characters entry field, and choosing convert.

But I couldn't wait, so I ran a test with € in one of my input boxes, using 
Firefox, a FORM as:

<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-11 Thread Glenn Linderman

Glenn Linderman  added the comment:

R. David:

Pierre said:
BytesFeedParser only uses the ascii codec ; if the header has non ASCII 
characters (filename in a multipart/form-data), they are replaced by ? : the 
original file name is lost. So for the moment I leave the text version of 
FeedParser

I say:
Does this mean that BytesFeedParser, to be useful for cgi.py, needs to accept 
an encoding parameter, defaulting to ASCII for the email case?  Should that 
be a new issue?  Or should cgi.py, since it can't use email to do all its work 
(no support for file storage, no support for encoding), simply not try, and use 
its own code for header decoding also?  The only cost would be support for 
Encoded-Word -- but it is not clear that HTTP uses them.  Can anyone give an 
example of such?  Read the next message here for an example of a filename 
containing non-ASCII.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-11 Thread Glenn Linderman

Glenn Linderman  added the comment:

In my previous message I quoted Pierre rightly cautioning that headers may 
contain non-ASCII... and that BytesFeedParser doesn't handle that, so using it 
to parse headers may be questionable.

So I decided to try one... I show the Live HTTP headers below, from a simple 
upload form.  What is not so simple is the filename of the file to be 
uploaded... it contains a couple non-ASCII characters... in fact, one of them 
is non-latin-1 also: "foöţ.html".  It rather seems that Firefox provides the 
filename in UTF-8, although Live HTTP headers seems to have displayed it using 
Latin-1 on the screen!  But in saving it to a file, it didn't write a BOM, and 
the byte sequence for the filename is definitely UTF-8, and pasted here to be 
viewed correctly.

So my question: where does Firefox get its authority to encode the filename 
using UTF-8 ???

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) 
Gecko/20101203 Firefox/3.6.13
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://rkivs.com.gl:8032/row/test.html
Content-Type: multipart/form-data; 
boundary=---207991835220448
Content-Length: 304
-207991835220448
Content-Disposition: form-data; name="submit"

upload
-207991835220448
Content-Disposition: form-data; name="pre"; filename="foöţ.html"
Content-Type: text/html

aoheutns

-207991835220448--

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-11 Thread Glenn Linderman

Glenn Linderman  added the comment:

Pierre said:
Since it works the same with 2 browsers and 2 web servers, I'm almost sure it's 
not dependent on the configuration -- but if others can test on different 
configurations I'd like to know the result.

So I showed in my previous messages (after the one you are responding to) my 
output from Live HTTP Headers, where it seems that Firefox is using UTF-8 for 
transmission, both for header values (the filename) and data values (the euro 
character), without specifying a Content-Type for the data or doing the RFC 
2047 encoding that would be expected from reading the various standards 
documents (RFC 2045, W3C HTML 4.01, RFC 2388).  I wondered whether Live HTTP 
Headers was reporting the logical data, prior to encoding for transmission, but 
I was getting UTF-8 data inside my CGI script... 

So now I have tweaked the server to save the bytes it transfers from its rfile 
to the CGI process (having already tweaked that path to be binary instead of 
applying encodings), and it is clearly UTF-8 at that point also; it looks just 
like the Live HTTP Headers capture.  Now that I have data capture on the server 
side, I can run the same tests with other browsers... so I ran it with Opera 
11, IE 8, and Chrome 8, and the only differences were the specific boundary 
values; all the data was in UTF-8, both the filename and the form data value.

I can't now find a setting in Firefox that lets the user control the encoding 
it sends to the server, but I can't rule out that I once found one and set it 
to UTF-8.  I'm quite certain I don't know enough about the other browsers to 
adjust their settings, and I don't have Apache installed on this box, so I 
cannot test whether it changes anything.

Is there a newer standard these browsers are following, that permits UTF-8?  Or 
even requires it?

Why is Pierre seeing cp-1252, and I'm seeing UTF-8?  I'm running Windows 6.1 
(Build 7600), 64-bit, the so-called Windows 7 Professional edition.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-11 Thread Glenn Linderman

Glenn Linderman  added the comment:

Aha!

Found a page <http://htmlpurifier.org/docs/enduser-utf8.html#whyutf8-support> 
which links to another page 
<http://web.archive.org/web/20060427015200/ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html>
 that explains the behavior.

The synopsis is that all modern browsers generally return form data in the same 
character encoding in which the form page itself was sent to the client.

I suspect this explains the differences between what Pierre and I are 
reporting.  I suspect (but would appreciate confirmation from Pierre) that his 
web pages either declare charset=CP-1252 in a META tag, or else use no such 
meta tag and his server is configured (or defaults) to send the HTTP header:
Content-Type: text/html; charset=CP-1252

Whereas I do know that all my web pages are coded in UTF-8, have no META tags, 
and my CGI scripts send
Content-Type: text/html; charset=UTF-8
for all served form pages... and thus I get UTF-8 back as well, per the above 
explanation.

What does this mean for Python's http.server and cgi support?
Well, http.server, by default, sends Content-Type without a charset, except for 
directory listings, where it supplies charset= the result of 
sys.getfilesystemencoding().  So it is up to META tags to define the encoding, 
or up to the browser to guess.  That is probably OK in a single-machine 
environment: the data files are likely coded in the default file system 
encoding, and the browser is likely to guess it.  But it quickly breaks when 
going to a multiple-machine or internet environment with different default 
encodings on different machines.  So when using http.server in such an 
environment, it is necessary to inform the client of the page encoding with 
META tags, or by generating the Content-Type: HTTP header in the CGI script 
(the latter is what I'm doing for the forms and data of interest).
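
For example (a minimal sketch, not from the stdlib docs; the form fields and 
script path are made up), a CGI script can pin the page encoding itself:

#!/usr/bin/env python3
# Serve the form page with an explicit charset so the browser sends the form
# data back in the same (UTF-8) encoding; the <meta> tag is the alternative
# when the HTTP header cannot be controlled.
import sys

page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=UTF-8"></head><body>'
    '<form method="post" enctype="multipart/form-data" action="/cgi-bin/upload.py">'
    '<input type="file" name="pre">'
    '<input type="submit" name="submit" value="upload">'
    '</form></body></html>'
)
out = sys.stdout.buffer
out.write(b"Content-Type: text/html; charset=UTF-8\r\n\r\n")
out.write(page.encode("utf-8"))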

What does it mean for cgi.py's FieldStorage?

Well, use of the default encoding can work in the single-machine environment... 
so I guess there would be worse things than doing so, as Pierre has been 
doing.  But clearly, that isn't the complete solution.  The new parameter he 
proposes for FieldStorage can be used if the application can correctly 
determine the likeliest encoding for the form data before calling it.

On a single-machine system, that could be the default encoding, as mentioned 
above.  On a single-application web server, it could be some constant encoding 
used for all pages (as I use UTF-8 for all my pages).  On a 
multiple-application web server, as long as each application uses a consistent 
encoding, that application can correctly guess the encoding to pass to 
FieldStorage.  Or, if an application wishes to allow multiple encodings, as 
long as it can keep track of them and use the right one at the right time, it 
is welcome to; a sketch of the calling pattern follows.
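
A sketch of that calling pattern (in this thread the parameter is called 
stream_encoding; the parameter that cgi.FieldStorage eventually grew is named 
encoding, which is what the sketch uses, and the "pre" field matches the test 
form above):

import cgi

# The charset this application served the form page in (UTF-8 for my pages).
FORM_CHARSET = "utf-8"

# Parse the POSTed form data, decoding headers and text fields with the
# charset the form page was served in.
form = cgi.FieldStorage(encoding=FORM_CHARSET)
if "pre" in form:
    print("uploaded filename:", form["pre"].filename)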

How does this affect email?  Not at all, directly.

How does this affect cgi.py's use of email?
It means that cgi.py cannot use BytesFeedParser, in spite of what the standards 
say, so Pierre's approach of pre-decoding the headers is the correct one, since 
email doesn't offer an encoding parameter.  And since email doesn't support 
disk storage for file uploads, but buffers everything in memory, cgi.py can 
only pass the headers to FeedParser: it has to detect end-of-headers itself, 
because email provides no feedback to indicate that end-of-headers was reached, 
and it must parse the MIME parts itself so that it can put the large parts on 
disk.  In short, the email package provides extremely little value to cgi.py, 
and since web browsers and multipart/form-data use only a simple subset of the 
full power of RFC 822 headers, email could be replaced with cgi.py's existing 
parse_header function -- though that should be deprecated; a copy could be 
moved inside the FieldStorage class and fixed up a bit.
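
A sketch (not Pierre's actual patch) of what "pass only the headers to 
FeedParser" can look like, with the caller supplying the form-data encoding:

from email.parser import FeedParser

def parse_part_headers(fp, encoding="utf-8"):
    # Read the raw header lines of one MIME part ourselves, decode them with
    # the form-data encoding, and let FeedParser build a Message; the
    # (possibly large) body that follows is left on the stream so the caller
    # can spool it to disk.
    parser = FeedParser()
    while True:
        line = fp.readline()              # bytes, including the line ending
        if not line:
            break                         # premature end of input
        parser.feed(line.decode(encoding, "replace"))
        if line in (b"\r\n", b"\n"):      # blank line ends the header block
            break
    return parser.close()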

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-11 Thread Glenn Linderman

Glenn Linderman  added the comment:

I notice the version on this issue is Python 3.3, but it affects 3.2 and 3.1 as 
well.  While I would like to see it fixed for 3.2, perhaps it is too late for 
that, with rc1 coming up this weekend?

Could at least the non-deprecated parse functions be deprecated in 3.2, so that 
they could be removed in 3.3?  Or should we continue to support them?

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-11 Thread Glenn Linderman

Glenn Linderman  added the comment:

Pierre,
I applied your patch to my local copy of cgi.py for my installation of 3.2, and 
have been testing.  Lots of things work great!

My earlier comment regarding make_file seems to be relevant.  Files that are 
not binary should have an encoding.  Likely you removed the encoding because it 
was hard-coded to UTF-8 and that didn't work for you, with your default 
encoding of cp1252.  However, now that I am passing UTF-8 via the 
stream-encoding parameter, because that is what matches my form data, I get an 
error that cp1252 (apparently also my default encoding, except for console 
stuff, which is 437) cannot encode \u0163.  So I think the encoding parameter 
should be added back in, but the value used should be the stream_encoding 
parameter.  You might also turn around the test on self.filename:

import tempfile
if self.filename:
    # uploaded file content stays binary
    return tempfile.TemporaryFile("wb+")
else:
    # text fields get a temp file encoded to match the incoming form data
    return tempfile.TemporaryFile("w+",
                                  encoding=self.stream_encoding,
                                  newline="\n")

One of my tests used a large textarea and a short file.  I was surprised to see 
that the file was not stored as a file, but the textarea was.  I guess that is 
due to the code in read_single that checks the length, rather than the 
filename, to decide whether the data should be stored in a file from the 
start.  While probably more efficient than actually creating a file, this 
behaviour might surprise people who override make_file so that they can store 
the data directly in its final destination file instead of copying it later.  
The documented semantics for make_file do not state that it is only called when 
there is more than 1000 bytes of data, or when the form-data item's headers 
contain a CONTENT-LENGTH header (which never seems to happen).  Indeed, I found 
a comment on StackOverflow where someone was surprised that small files did not 
have make_file called on them.
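
For illustration, a hedged sketch of the kind of override involved (the 
DiskFieldStorage class and its upload directory are made up, and, per the 
above, make_file is only reached for parts the parser decides not to keep in 
memory):

import cgi, os, tempfile

class DiskFieldStorage(cgi.FieldStorage):
    UPLOAD_DIR = "/tmp/uploads"   # hypothetical destination directory

    def make_file(self):
        if self.filename:
            os.makedirs(self.UPLOAD_DIR, exist_ok=True)
            # delete=False so the upload can be kept or renamed afterwards
            return tempfile.NamedTemporaryFile("wb+", dir=self.UPLOAD_DIR,
                                               delete=False)
        return super().make_file()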

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-12 Thread Glenn Linderman

Glenn Linderman  added the comment:

I'd be willing to propose such a patch and tests, but I haven't a clue how, 
other than starting by reading the contributor document... I was putting off 
learning the process until the hg conversion, not wanting to learn an old 
process for just a few months :(  And I've never written an official Python 
test, or learned how to use the test modules, etc.  So that's a pretty steep 
curve for the 2 days remaining.

Due to the way browsers actually work, vs. how the standards are written, it 
seems necessary to add the optional stream_encoding parameter.  The limit 
parameter Pierre is proposing is also a good check against improperly formed 
input.  So there are new, optional parameters to the FieldStorage constructor.

Without these fixes, though, cgi.py continues to be totally useless for file 
uploads, so not releasing this in 3.2 makes 3.2 continue to be useless as a 
basis for web applications.  I have no idea whether there is a timeframe for 
3.3, nor what it is.  I'm not sure how many web frameworks use cgi.py vs. 
replacing its functionality.  It seems at least some replace it, so they may 
not suffer in porting to 3.x (except internally, grappling with the same 
issues).

Happily, Pierre's latest patch needs only one more fix, per my 
(non-Python-standard) testing.  Between his testing in one environment using 
default code pages, and mine using UTF-8, the bases seem to be pretty well 
covered... certainly better than by the previous default tests.  I think you 
contributed some tests; I haven't tried them, but it seems Pierre has, as he 
has a patch for those as well (which I haven't tried).

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-12 Thread Glenn Linderman

Glenn Linderman  added the comment:

Pierre said:
The encoding used by the browser is defined in the Content-Type meta tag, or 
the content-type header; if not, the default seems to vary between browsers.  
So it's definitely better to define it.

The argument stream_encoding used in FieldStorage *must* be this encoding.

I say:
I agree it is better to define it.  I think you just said the same thing the 
page I linked to said; I might not have conveyed that correctly in my 
paraphrasing.  I assume you are talking about the charset of the Content-Type 
of the form page itself, as served to the browser, since the browser, sadly, 
doesn't send that charset back with the form data.

Pierre says:
But this raises another problem, when the CGI script has to print the data 
received.  The built-in print() function encodes the string with 
sys.stdout.encoding, and this will fail if the string can't be encoded with 
it.  That is the case on my PC, where sys.stdout.encoding is cp1252: it can't 
handle Arabic or Chinese characters.

I say:
I don't think there is any need to override print, especially not 
builtins.print.  It is still true that the HTTP data stream is, and should be 
treated as, a binary stream, so the script author is responsible for creating 
such a binary stream.

The FieldStorage class does not use print, so it seems inappropriate to add a 
parameter to its constructor to create a print replacement that it doesn't 
use.

For the convenience of CGI script authors, it would be nice if cgi provided 
access to the output stream in a useful way... and I agree that, because the 
generation of an output page comes complete with its own encoding, the output 
stream encoding parameter should be separate from the stream_encoding parameter 
required by FieldStorage.

A separate, new function or class for doing that seems appropriate, possibly 
included in cgi.py, but not in FieldStorage.  Message 125100 in this issue 
describes a class IOMix that I wrote and use for this; codifying it by 
including it in cgi.py would be fine by me... I've been using it quite 
successfully for some months now.

The last line of message 125100 may be true; perhaps a few more methods should 
be added.  However, print is not one of them.  I think you'll be pleasantly 
surprised to discover (as I was, after writing that line) that builtins.print 
converts its parameters to str and writes to stdout, assuming that stdout will 
do the appropriate encoding.  The class IOMix will, in fact, do that 
appropriate encoding (given an appropriate parameter to its initialization).  
Perhaps for CGI, a convenience function could be added to IOMix to cover the 
last two code lines after IOMix in the prior message:

    @staticmethod
    def setup(encoding="UTF-8"):
        # Wrap both standard streams so str and bytes output are encoded
        # with the chosen page encoding.
        sys.stdout = IOMix(sys.stdout, encoding)
        sys.stderr = IOMix(sys.stderr, encoding)

Note that IOMix allows the user's choice of output stream encoding, applies it 
to both stdout and stderr, which both need it, and also allows the user to 
generate binary output directly (if sending back a file, for example), since 
both bytes and str are accepted.
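
IOMix itself lives in msg125100 and is not reproduced here; purely as a sketch 
of the behavior described above (the MixedWriter name is made up), a wrapper 
along these lines accepts both str and bytes and funnels everything through the 
binary stream:

import sys

class MixedWriter:
    # Accepts str (encoded with the chosen charset) or bytes, and writes both
    # through the underlying binary buffer so the two kinds of output cannot
    # interleave badly.
    def __init__(self, stream, encoding="UTF-8"):
        self.buffer = getattr(stream, "buffer", stream)
        self.encoding = encoding

    def write(self, data):
        if isinstance(data, str):
            data = data.encode(self.encoding)
        self.buffer.write(data)

    def flush(self):
        self.buffer.flush()

# e.g. sys.stdout = MixedWriter(sys.stdout, "UTF-8") before any printing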

In 3.x, print can be used with a file= parameter, which your implementation 
doesn't permit and which a CGI script could use to write to other files, so I 
really, really don't think we want to override builtins.print without the file= 
parameter and tie it specifically to stdout.

My message 126075 still needs to be included in your next patch.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0

2011-01-13 Thread Glenn Linderman

Glenn Linderman  added the comment:

Pierre,
Looking better.
I see you've retained the charset parameter, but do not pass it through to 
nested calls of FieldStorage.  This is good, because it wouldn't work if you 
did.  Purists might still complain that FieldStorage should only ever use and 
affect stdin... but since I'm a pragmatist, I'll note that the default charset 
value is None, which means it does nothing to stdout or stderr by default, and 
be content with that.

I've run a couple of basic tests and it works; for the other things, the code 
hasn't changed since your last iteration, but I'll test them again after I get 
some sleep.

I'll try setting the Version here back to 3.2 -- it is a bug in 3.2 -- and see 
if some committer will take pity on web developers that use CGI, and are hoping 
to be able to use Python 3.2 someday.

--
versions: +Python 3.2

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.x

2011-01-13 Thread Glenn Linderman

Glenn Linderman  added the comment:

The O_BINARY stuff was probably necessary because issue 10841 was not yet in 
the build Pierre was using?  I agree it is not necessary with the fix for that 
issue, but neither does it hurt.

It could be stripped out, if you think that is best, Antoine.

But there is a working patch.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.x

2011-01-13 Thread Glenn Linderman

Glenn Linderman  added the comment:

Victor, thanks for your comments, and interest in this bug.  Other than the 
existence of the charset parameter, and whether or not to include IOMix, I 
think all of the other points could be fixed later, and do not hurt at 
present.  So I will just comment on those two.

I would prefer that FieldStorage not have the charset attribute either, but I 
don't have the practice to produce an alternate patch, and I can see that it 
would be a convenience for some CGI scripts to specify that parameter and have 
one API call do all the work necessary to adjust the IO streams and read all 
the parameters, after which the rest of the logic of the web app can follow.  
Personally, I adjust the stdout/stderr streams earlier in my scripts, and only 
optionally call FieldStorage, if I determine the request needs it.

I've been using IOMix for some months (I have a version for both Python 2 and 
3), and it solves a real problem in generating web page data streams: the data 
stream should be bytes, but a lot of the data is manipulated as str, which then 
needs to be encoded.  The default encoding of stdout is usually wrong, so it 
must somehow be changed.  And when you have chunks of bytes (in my experience 
usually from a database or a file) to copy to the output stream, if your prior 
write was str and you then write bytes to sys.stdout.buffer, you also have to 
remember to flush the text layer first.  IOMix provides a convenient solution 
to all these problems, doing the flushing for you automatically, and just 
taking what comes and doing the right thing.  If I hadn't already invented 
IOMix to help write web pages, I would want to :)

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.x

2011-01-13 Thread Glenn Linderman

Glenn Linderman  added the comment:

Graham, thanks for your comments.  Fortunately, if the new charset parameter is 
not supplied, no mucking with stdout or stderr is done, which is the only 
reason I cannot argue strongly against the feature (I would have implemented it 
as a separate API)... it doesn't get in the way if you don't use it.

I would be happy to see the argv code removed, but it has been there longer 
than I have been a Python user, so I just live with it... and don't pass 
arguments to my CGI scripts anyway.  I've assumed it is some sort of debug 
feature, but I also saw some code in the CGIHTTPServer and http.server modules 
that apparently, on some platforms, actually does pass parameters to the CGI 
script on the command line.  I would be happy to see that code removed too, but 
it also predates my Python experience.  And no signs of "if debug:" in either 
of them!

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.x

2011-01-14 Thread Glenn Linderman

Glenn Linderman  added the comment:

Pierre, Thank you for the new patch, with the philosophy of "it's broke, so 
let's produce something the committers like to get it fixed".

I see you overlooked removing the second use of O_BINARY.  Locally, I removed 
that also, and tested your newest patch, and it still functions great for me.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.x

2011-01-14 Thread Glenn Linderman

Changes by Glenn Linderman :


--
versions: +Python 3.2 -Python 3.3

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4953] cgi module cannot handle POST with multipart/form-data in 3.x

2011-01-14 Thread Glenn Linderman

Glenn Linderman  added the comment:

Thanks to Pierre for producing patch after patch and testing, testing, 
testing; to Victor for committing it; and to the others who contributed in 
smaller ways, as I tried to.  I look forward to 3.2 rc1 so I can discard all my 
temporary patched copies of cgi.py.

--

___
Python tracker 
<http://bugs.python.org/issue4953>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1602] windows console doesn't print or input Unicode

2011-01-14 Thread Glenn Linderman

Glenn Linderman  added the comment:

Victor said:
Why do you set the code page to 65001?  In all my tests (on Windows XP), it 
always breaks the standard input.

My response:
Because when I searched Windows for Unicode and/or UTF-8 information, I found 
65001, and it seemed like it might help, and it did a bit.  Then I found 
PYTHONIOENCODING, and that helped some.  That got me something that works 
enough better than what I had before, so I quit searching.
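
For what it's worth, the PYTHONIOENCODING half of that workaround can be 
exercised without touching the code page at all; a sketch (myscript.py is 
hypothetical, and the variable has to be set before the interpreter that runs 
it starts):

import os, subprocess, sys

# Launch a child interpreter whose standard streams use UTF-8, regardless of
# the console code page.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
subprocess.call([sys.executable, "myscript.py"], env=env)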

You did a better job of analyzing and testing all the cases.  I will have to 
go remove the 65001 part and confirm your results; maybe it is useless now that 
the other pieces of the puzzle are in place.  Certainly with David-Sarah's code 
it seems not to be needed.  Whether it was a necessary part of the previous 
workaround I am not sure, because of the limited number of cases I tried (I was 
trying to find something that worked well enough, but didn't have enough 
knowledge to find David-Sarah's solution, nor a good enough testing methodology 
to try the pieces independently).

Thank you for your interest in this issue.

--

___
Python tracker 
<http://bugs.python.org/issue1602>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10879] cgi memory usage

2011-01-24 Thread Glenn Linderman

Glenn Linderman  added the comment:

Issue 4953 has somewhat resolved this issue by using email only for parsing 
headers (more like 2.x did).  So this issue could be closed, or it could be 
left open to point out the additional features needed from email before cgi.py 
can use it for handling body parts as well as headers.

--

___
Python tracker 
<http://bugs.python.org/issue10879>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10479] cgitb.py should assume a binary stream for output

2011-01-30 Thread Glenn Linderman

Glenn Linderman  added the comment:

So since cgi.py was fixed to use the .buffer attribute of sys.stdout, that 
leaves sys.stdout itself as a character stream, and cgitb.py can successfully 
write to it.

If cgitb.py never writes anything but ASCII, then maybe that should be 
documented, and this issue closed.

If cgitb.py writes non-ASCII, then it should use an encoding appropriate for 
the web application, which isn't necessarily the default encoding on the 
system.  Either some user control over the encoding should be provided, or it 
should be documented that the encoding of sys.stdout should be changed to an 
appropriate one, because that is where cgitb.py writes its character stream.  
Guidance on how to do that would also be appropriate for the documentation, as 
a CGI application may be the first one a programmer writes that can't just use 
the default encoding configured for the system.
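
For example (a sketch, not something the cgitb documentation prescribes), a CGI 
script could rewrap sys.stdout before enabling cgitb so the traceback page 
matches the encoding it serves:

import io, sys, cgitb

# Re-wrap stdout around its binary buffer with an explicit encoding; cgitb
# writes its HTML traceback to sys.stdout by default, so do this first.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8",
                              errors="replace", newline="\n")
cgitb.enable()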

--

___
Python tracker 
<http://bugs.python.org/issue10479>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10480] cgi.py should document the need for binary stdin/stdout

2011-01-30 Thread Glenn Linderman

Glenn Linderman  added the comment:

Fixed by issue 10841 and issue 4953.

--
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue10480>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


