Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-13 Thread Stephen J. Turnbull
Steven D'Aprano writes: > I don't think anyone has ever suggested change for change's sake. If > they have, I'd love to read the PEP for it. Not to mention the BDFL's pronouncement message!

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-12 Thread Steven D'Aprano
On Wed, 13 Oct 2010 03:01:57 am l...@rmi.net wrote: > So my point is just this: Change for change's sake is truly not > what most Python users want.  If Python core developers want 3.X > to become as popular as 2.X, they should be less concerned with > posts on this list or hands at a conference, t

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-12 Thread lutz
rmi.net/~lutz) > Date: Fri, 8 Oct 2010 14:20:32 -0400 > From: Barry Warsaw > To: python-dev@python.org > Subject: Re: [Python-Dev] Patch making the current email package (mostly) > support bytes > > On Oct 08, 2010, at 03:44 PM, l...@rmi.net wrote: > > >Ultimately,

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Stephen J. Turnbull
Barry Warsaw writes: > On Oct 09, 2010, at 02:48 AM, Stephen J. Turnbull wrote: > > >Right. That's where I was going with my comment to Barry about the > >Received headers. Even if email isn't going to serve clients working > >with wire format, it needs to deal with those headers. But wher

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Sat, 09 Oct 2010 02:48:23 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > On Sat, 09 Oct 2010 01:06:29 +0900, "Stephen J. Turnbull" > wrote: > > > That mess is entirely unnecessary in Python 3. Text and wire format > > > can be easily distinguished with three different

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Barry Warsaw
On Oct 09, 2010, at 02:48 AM, Stephen J. Turnbull wrote: >Right. That's where I was going with my comment to Barry about the >Received headers. Even if email isn't going to serve clients working >with wire format, it needs to deal with those headers. But where I >think the headers defined by RF

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Barry Warsaw
On Oct 08, 2010, at 03:44 PM, l...@rmi.net wrote: >Ultimately, development in the open source world is driven by the >very few with time to show up, rather than by the very many who >depend on it. This can unfortunately lead to the perception >of thrashing by end users. Some even come to see t

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Stephen J. Turnbull
R. David Murray writes: > On Sat, 09 Oct 2010 01:06:29 +0900, "Stephen J. Turnbull" > wrote: > > That mess is entirely unnecessary in Python 3. Text and wire format > > can be easily distinguished with three different representations of > > email: Unicode for the conceptual RFC 822 layer (o

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Fri, 08 Oct 2010 12:37:38 +0900, "Stephen J. Turnbull" wrote: > *If* you have an 8-bit value of unknown encoding on input, this will > appear in the Header's value as a surrogate. Hm, OK, I see the > problem ... as usual, it's that the only efficient thing to do is > encode using surrogate-es

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Fri, 08 Oct 2010 23:55:37 +0900, "Stephen J. Turnbull" wrote: > I should think you *want* addresses and suchlike structured headers > (Content-Type with several RFC 2231 parameters, anyone?) to line up > nicely, too. So generic folding algorithms are really only applicable > to unstructured t

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Fri, 08 Oct 2010 15:44:45 -, l...@rmi.net wrote: > Thanks for both your reply and work, David. I'm going to have > to test my email clients under the 3.2 patch when it gels. It's > good to hear that email5 API support remains a goal. I just landed the patch (though without the MIME encodi
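For readers who want to try the bytes support discussed in this thread: in released Python 3 versions the standard library exposes it as `email.message_from_bytes` and `email.generator.BytesGenerator`. The following is a minimal sketch, not a test of the patch itself; the sample message and addresses are made up for illustration.

```python
import email
from email.generator import BytesGenerator
from io import BytesIO

# Hypothetical wire-format message with an 8-bit UTF-8 body.
raw = (
    b"From: sender@example.com\n"
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xc3\xa9\n"
)

# Parse directly from bytes instead of decoding to str first.
msg = email.message_from_bytes(raw)

# decode=True returns the body as bytes; decode it with the declared charset.
body_bytes = msg.get_payload(decode=True)
body_text = body_bytes.decode(msg.get_content_charset())

# Serialize the message back to bytes.
out = BytesIO()
BytesGenerator(out).flatten(msg)
wire = out.getvalue()
```

The point of the sketch is that the 8-bit body survives a parse and regenerate cycle without the caller handling any encoding until the text is actually needed.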

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Sat, 09 Oct 2010 01:06:29 +0900, "Stephen J. Turnbull" wrote: > That mess is entirely unnecessary in Python 3. Text and wire format > can be easily distinguished with three different representations of > email: Unicode for the conceptual RFC 822 layer (of course this is an > extension, becaus

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Fri, 08 Oct 2010 15:51:45 -, l...@rmi.net wrote: > For my part, one week from now I'll be standing up again in front > of a group of 20 Python beginners, and basically apologizing for > both the present and ongoing 3.X changes they must conform to in > the near future. Python may not be

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Stephen J. Turnbull
Barry Warsaw writes: > On Oct 07, 2010, at 04:40 AM, Stephen J. Turnbull wrote: > I'm fairly certain that most of the modern causes of [Unicode > errors in Mailman] are post-parse modifications of the message. > IOW, in Mailman's architecture, we try to parse the raw data into a > Message obj

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread lutz
t will break their code. (Yes, sarcasm intended.) --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) > -Original Message- > From: "Stephen J. Turnbull" > To: l...@rmi.net > Subject: Re: [Python-Dev] Patch making the current email package > (most

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread lutz
ally given the still tentative state of 3.X, stability matters. --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) > -Original Message- > From: "R. David Murray" > To: l...@rmi.net > Subject: Re: [Python-Dev] Patch making the current email package (mo

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Stephen J. Turnbull
Barry Warsaw writes: > Header wrapping sucks even more because it's supposed to take the > semantic context into account, which means that a generic Header > wrapping algorithm cannot work for everything. E.g. Received: > headers are supposed to wrap after the semicolon. Received headers are

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Barry Warsaw
On Oct 08, 2010, at 12:37 PM, Stephen J. Turnbull wrote: >Ouch. RFC 822 line wrapping is a bytes->bytes transformation, and the >client shouldn't see it at all unless it inspects the wire format. Header wrapping sucks even more because it's supposed to take the semantic context into account, whi
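To illustrate why a generic wrapping algorithm is not enough, here is a rough sketch of a header-specific fold for Received: that breaks after the semicolon, as described above. `fold_received` is a hypothetical helper written for this note, not part of the email package, and it ignores line-length limits entirely.

```python
def fold_received(value, indent="\t"):
    """Fold a Received header value after its last semicolon.

    Hypothetical helper: assumes the date-time follows the final ';'
    and does no further length-based wrapping.
    """
    head, sep, date = value.rpartition(";")
    if not sep:
        # No semicolon found; leave the value untouched.
        return value
    return head.rstrip() + ";\r\n" + indent + date.strip()


folded = fold_received(
    "from a.example by b.example; Fri, 8 Oct 2010 12:00:00 +0000"
)
```

A folder for address headers or Content-Type parameters would need different break points, which is the semantic-context problem being raised.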

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread Stephen J. Turnbull
l...@rmi.net writes: > To put that more strongly, the Python user base is much larger than > this list's readership. Agreed. Nevertheless, this is the channel (not "channel") that the developers listen on, and substantial effort is made to let Python users know that. I think they do know it,

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread Stephen J. Turnbull
R. David Murray writes: > > The MIME-charset = UNKNOWN dodge might be a better way of handling > > this. > > That is a very interesting idea. It is the *right* thing to do, since it > would mean that a message parsed as bytes could be generated via Generator > and passed to, say, smtplib w

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread Barry Warsaw
On Oct 07, 2010, at 04:40 AM, Stephen J. Turnbull wrote: > > And the email API currently promises not to raise during parsing, > > which is a contract my patch does not change. > >Which is a contract that has historically been broken frequently. >Unhandled UnicodeErrors have been one of the most c
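As a small illustration of the "do not raise during parsing" contract: the parser is meant to record problems on the message as defects rather than throw. The sketch below uses a deliberately malformed, made-up message; it shows the intended behaviour in current Python 3 and says nothing about the historical UnicodeError cases mentioned above.

```python
import email

# Hypothetical malformed input: claims to be multipart but declares no boundary.
raw = (
    b"Content-Type: multipart/mixed\n"
    b"\n"
    b"This claims to be multipart but has no boundary.\n"
)

# Parsing succeeds; the problem is recorded on the message object.
msg = email.message_from_bytes(raw)
defect_names = [type(d).__name__ for d in msg.defects]
```

Callers that care about strictness can inspect `msg.defects` after parsing instead of wrapping the parse call in a try block.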

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
On Thu, 07 Oct 2010 16:03:18 -, l...@rmi.net wrote: > I'm forwarding a link to the code of these clients to David by > private email in case they might be useful as a test case (O'Reilly > has already posted them ahead of the book, but they may be a bit too > heavy for use in formal testing).

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread lutz
Stephen J. Turnbull wrote (giving me an opening to jump in here): > R. David Murray writes: > > In other words, my proposed patch only makes email5 1/8 to 1/4 > > broken, instead of half broken as it is now. But not un-broken > > enough for Mailman, it sounds like. > > IMO, not in the long run. B

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
On Thu, 07 Oct 2010 15:00:04 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > > But that's not interesting; you did that with Python 3. We want to > > Of course I did it with Python3. It's the Python3 email codebase > > I'm working with (and have to work *around*). > > S

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
Stephen J. Turnbull writes: > R. David Murray writes: > > We're (in the current patch) not punting on handling non-conforming > > email, we're punting on handling non-conforming bytes *if the headers > > that contain them need to be modified*. The headers can still be > > modified

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
On Thu, 07 Oct 2010 03:31:34 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > 5. Return the content, with non-ASCII bytes replaced with ? > > characters. > > That hadn't occurred to me (and it makes me sick to contemplate it). > > That said, this is probably good enou

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread Stephen J. Turnbull
R. David Murray writes: > So the only parsing issue is if Mailman cares about *the non-ASCII > bytes* in the headers it cares about. If it has to modify headers that > contain non-ASCII bytes (for example, addresses and Subject) and cares > about preserving the non-ASCII bytes, then there is

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread Stephen J. Turnbull
R. David Murray writes: > 5. Return the content, with non-ASCII bytes replaced with ? > characters. That hadn't occurred to me (and it makes me sick to contemplate it). That said, this is probably good enough for Mailman-like apps to limp along for "most" users. It's certainly good enoug
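Option 5 can be sketched with plain codecs, outside the email package. This assumes the undecodable bytes were carried as surrogate escapes; the sample header is made up, and how the email package itself implements the replacement is not shown here.

```python
# Hypothetical header with one Latin-1 byte and no declared encoding.
raw = b"Subject: caf\xe9 menu"

# Carry the unknown byte through str as a lone surrogate.
text = raw.decode("ascii", "surrogateescape")

# Encoding with errors="replace" turns every non-ASCII character,
# including the escaped byte, into "?".
sanitized = text.encode("ascii", "replace").decode("ascii")
```

The result is lossy by design: the original byte cannot be recovered from `sanitized`, which is why this is only "good enough to limp along".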

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread R. David Murray
On Wed, 06 Oct 2010 22:55:00 +0900, "Stephen J. Turnbull" wrote: > R. David Murray writes: > > > version of headers to the email5 API, but since any such data would > > be non-RFC compliant anyway, [access to non-conforming headers by > > reparsing the bytes] will just have to be good enough

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread R. David Murray
On Wed, 06 Oct 2010 12:22:18 +0900, "Stephen J. Turnbull" wrote: > Nick Coghlan writes: > > > - if you pass in bytes data and know what you are doing, then you can > > access that raw bytes data and do your own decoding > > At what level, though? > > To take an interesting example I used to

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread Stephen J. Turnbull
R. David Murray writes: > version of headers to the email5 API, but since any such data would > be non-RFC compliant anyway, [access to non-conforming headers by > reparsing the bytes] will just have to be good enough for now. But that's potentially unpleasant for, say, Mailman. AFAICS, what

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-05 Thread Stephen J. Turnbull
Nick Coghlan writes: > - if you pass in bytes data and know what you are doing, then you can > access that raw bytes data and do your own decoding At what level, though? To take an interesting example I used to see frequently: From: t...@tokyo.jp (Taro Yamada in 8-bit Shift JIS) So I g
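The Shift JIS case can be reproduced with plain codecs. This is only a sketch of "access the raw bytes and do your own decoding"; the address and name are invented, and the application is assumed to know out of band that the comment is Shift JIS.

```python
# Hypothetical From: header with an undeclared 8-bit Shift JIS comment.
name = "山田太郎"
raw = b"From: taro@example.jp (" + name.encode("shift_jis") + b")"

# What a surrogateescape-based parser would hold internally.
text = raw.decode("ascii", "surrogateescape")

# An application that knows the real encoding recovers the bytes and
# decodes them itself.
recovered = text.encode("ascii", "surrogateescape")
decoded = recovered.decode("shift_jis")
```

Note that some Shift JIS trail bytes fall in the ASCII range, so `text` contains ordinary letters mixed in with the surrogates. The string view is misleading until the bytes are recovered and decoded as a whole.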

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-05 Thread R. David Murray
On Tue, 05 Oct 2010 22:05:33 +1000, Nick Coghlan wrote: > On Tue, Oct 5, 2010 at 3:41 PM, Stephen J. Turnbull > wrote: > > R. David Murray writes: > > > Only if the email package contains a coding error would the > > > surrogates escape and cause problems for user code. > > > > I don't think it i

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-05 Thread Nick Coghlan
On Tue, Oct 5, 2010 at 3:41 PM, Stephen J. Turnbull wrote: > R. David Murray writes: >  > Only if the email package contains a coding error would the >  > surrogates escape and cause problems for user code. > > I don't think it is reasonable to internalize surrogates that way; > some applications

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread Stephen J. Turnbull
R. David Murray writes: > On Mon, 04 Oct 2010 12:32:26 -0400, Scott Dial > wrote: > > On 10/2/2010 7:00 PM, R. David Murray wrote: > > > The clever hack (thanks ultimately to Martin) is to accept 8bit data > > > by encoding it using the ASCII codec and the surrogateescape error > > > handle

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread Barry Warsaw
On Oct 02, 2010, at 07:00 PM, R. David Murray wrote: >The advantage of this patch is that it means Python3.2 can have an >email module that is capable of handling a significant proportion of >the applications where the ability to process binary email data is >required. Like others, I'm concerned

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread R. David Murray
On Mon, 04 Oct 2010 12:32:26 -0400, Scott Dial wrote: > On 10/2/2010 7:00 PM, R. David Murray wrote: > > The clever hack (thanks ultimately to Martin) is to accept 8bit data > > by encoding it using the ASCII codec and the surrogateescape error > > handler. > > I've seen this idea pop up in a nu

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread Scott Dial
On 10/2/2010 7:00 PM, R. David Murray wrote: > The clever hack (thanks ultimately to Martin) is to accept 8bit data > by encoding it using the ASCII codec and the surrogateescape error > handler. I've seen this idea pop up in a number of threads. I worry that you are all inventing a new kind of du
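For anyone unfamiliar with the hack being quoted, this is roughly what it looks like at the codec level (in Python terms the input step is a decode from bytes to str). The sample header is made up.

```python
# Hypothetical header containing one byte that is not valid ASCII.
raw = b"Subject: caf\xe9"

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("ascii", errors="surrogateescape")
```

The round trip is lossless, which is the attraction; the cost is that `text` is a str that is not valid Unicode text.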

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-02 Thread Nick Coghlan
On Sun, Oct 3, 2010 at 9:00 AM, R. David Murray wrote: > I do not propose that this is a *good* API, since it has the classic > problem that if there are coding bugs in the email module strings may > "escape" that have surrogates in them and we end up with programs that > work most of the time
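The failure mode described here is easy to demonstrate: a string with escaped bytes behaves normally until something tries to encode it strictly. `has_escaped_bytes` below is a hypothetical helper for spotting such strings, not an email package function.

```python
def has_escaped_bytes(s):
    """Return True if s contains lone surrogates in the surrogateescape range.

    Hypothetical helper for illustration only.
    """
    return any("\udc80" <= ch <= "\udcff" for ch in s)


# A string that "escaped" with a surrogate still inside it.
leaked = b"caf\xe9".decode("ascii", "surrogateescape")

# It works for slicing, comparison and so on, but a strict encode fails.
try:
    leaked.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

This is the "works most of the time" problem: pure-ASCII input never triggers the error, so the bug only shows up on the unusual message.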

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-02 Thread R. David Murray
On Sat, 02 Oct 2010 19:15:57 -0500, Benjamin Peterson wrote: > 2010/10/2 R. David Murray : > > Regardless of whether or not this patch or a descendant thereof is > > accepted I still intend to continue working on email6. There are many > > other bugs in the current email package that re

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-02 Thread Benjamin Peterson
2010/10/2 R. David Murray : > Regardless of whether or not this patch or a descendant thereof is > accepted I still intend to continue working on email6.  There are many > other bugs in the current email package that require a rewrite of parts > of its infrastructure, and the email-sig is agreed th