Re: [Python-Dev] TRUNK FREEZE for 2.4 final from 2100 UTC, 29-11-2004

2004-11-29 Thread Anthony Baxter
On Monday 29 November 2004 12:50, Anthony Baxter wrote:
> Ok, we're about ready for the 2.4 final release. Please hold off any
> checkins post 21:00 UTC (so in about 19-20 hours from now).

I should also note that shortly after the release is done and we've
confirmed that there are no brown-paper-bag bugs, I'll be creating
the release24-maint branch in CVS.

Anthony
___
Python-Dev mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] File encodings

2004-11-29 Thread Gustavo Niemeyer
Greetings,

Today, while trying to internationalize a program I'm working on,
I found an interesting side-effect of how we're dealing with
encoding of unicode strings while being written to files.

Suppose the following example:

  # -*- encoding: iso-8859-1 -*-
  print u"á"

This will correctly print the string 'á', as expected. Now, what
surprises me, is that the following code won't work in an equivalent
way (unless using sys.setdefaultencoding()):

  # -*- encoding: iso-8859-1 -*-
  import sys
  sys.stdout.write(u"á\n")

This will raise the following error:

  Traceback (most recent call last):
    File "asd.py", line 3, in ?
      sys.stdout.write(u"á")
  UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1'
  in position 0: ordinal not in range(128)
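
The failure above can be reproduced without a terminal by encoding the
same character explicitly. This is only a sketch: the helper name
try_encode is hypothetical, and the code is written so that it also runs
on current Python versions rather than only on 2.4.

```python
def try_encode(text, encoding):
    """Hypothetical helper: return (bytes, None) or (None, error)."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, exc


text = u"\xe1"  # LATIN SMALL LETTER A WITH ACUTE, written as an escape

# The ascii codec only covers code points 0-127, so this attempt fails
# with the same exception type as the traceback above.
ascii_bytes, ascii_error = try_encode(text, "ascii")

# A codec that covers the character succeeds.
latin1_bytes, latin1_error = try_encode(text, "iso-8859-1")
```

The exception object carries the codec name and the offending position
(ascii_error.encoding and ascii_error.start), which matches the
"'ascii' codec ... in position 0" part of the message.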

This difference may become a really annoying problem when trying to
internationalize programs, since it's usual to see third-party code
dealing with sys.stdout, instead of using 'print'. The standard
optparse module, for instance, has a reference to sys.stdout which
is used in the default --help handling mechanism.

Given the fact that files have an 'encoding' parameter, and that
any unicode strings with characters not in the 0-127 range will
raise an exception if being written to files, isn't it reasonable
to respect the 'encoding' attribute whenever writing data to a
file?

The workaround for that problem is to either use the evil-considered
sys.setdefaultencoding(), or to wrap sys.stdout. IMO, both options
seem unreasonable for such a common idiom.

-- 
Gustavo Niemeyer
http://niemeyer.net


Re: [Python-Dev] state of 2.4 final release

2004-11-29 Thread Tim Peters
[Anthony Baxter]
> I didn't see any replies to the last post, so I'll ask again with a
> better subject line - as I said last time, as far as I'm aware, I'm
> not aware of anyone having done a fix for the issue Tim identified
> ( http://www.python.org/sf/1069160 )
> 
> So, my question is: Is this important enough to delay a 2.4 final
> for?

Not according to me; said before I'd be happy if everyone pretended I
hadn't filed that report until a month after 2.4 final was released.


Re: [Python-Dev] File encodings

2004-11-29 Thread Bob Ippolito
On Nov 29, 2004, at 2:04 PM, Gustavo Niemeyer wrote:

> Today, while trying to internationalize a program I'm working on,
> I found an interesting side-effect of how we're dealing with
> encoding of unicode strings while being written to files.
>
> Suppose the following example:
>
>   # -*- encoding: iso-8859-1 -*-
>   print u"á"
>
> This will correctly print the string 'á', as expected. Now, what
> surprises me, is that the following code won't work in an equivalent
> way (unless using sys.setdefaultencoding()):

That doesn't work here, where sys.getdefaultencoding() is 'ascii', as
expected.

>   # -*- encoding: iso-8859-1 -*-
>   import sys
>   sys.stdout.write(u"á\n")
>
> This will raise the following error:
>
>   Traceback (most recent call last):
>     File "asd.py", line 3, in ?
>       sys.stdout.write(u"á")
>   UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1'
>   in position 0: ordinal not in range(128)

That's expected.

> This difference may become a really annoying problem when trying to
> internationalize programs, since it's usual to see third-party code
> dealing with sys.stdout, instead of using 'print'. The standard
> optparse module, for instance, has a reference to sys.stdout which
> is used in the default --help handling mechanism.
>
> Given the fact that files have an 'encoding' parameter, and that
> any unicode strings with characters not in the 0-127 range will
> raise an exception if being written to files, isn't it reasonable
> to respect the 'encoding' attribute whenever writing data to a
> file?

No, because you don't know it's a file.  You're calling a function with
a unicode object.  The function doesn't know that the object was some
unicode object that came from a source file of some particular
encoding.

> The workaround for that problem is to either use the evil-considered
> sys.setdefaultencoding(), or to wrap sys.stdout. IMO, both options
> seem unreasonable for such a common idiom.

There's no guaranteed correlation whatsoever between the claimed
encoding of your source document and the encoding of the user's
terminal, why do you want there to be?  What if you have some source
files with 'foo' encoding and others with 'bar' encoding?  What about
ascii encoded source documents that use escape sequences to represent
non-ascii characters?  What you want doesn't make any sense so long as
python strings and file objects deal in bytes not characters :)
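
To illustrate the point about source encoding versus terminal encoding,
here is a small sketch (variable names are illustrative, and it is
written to run on current Python versions as well). The same unicode
character becomes different bytes depending on the codec chosen at
output time, regardless of how the source file was encoded:

```python
# Written with an escape sequence, so the source file is pure ASCII.
text = u"\xe1"

# One character, two different byte sequences.
as_latin1 = text.encode("iso-8859-1")   # a single byte
as_utf8 = text.encode("utf-8")          # two bytes

# What an iso-8859-1 terminal would display if handed the UTF-8 bytes:
# two unrelated characters instead of the intended one.
misread = as_utf8.decode("iso-8859-1")
```

Only the code that writes to the terminal can know which of these byte
sequences the terminal expects.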

Wrapping sys.stdout is the ONLY reasonable solution.

This is the idiom that I use.  It's painless and works quite well:

  import sys
  import codecs
  sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

-bob
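
A minimal sketch of the wrapping idiom above, exercised against an
in-memory byte stream instead of the real sys.stdout so that the encoded
bytes can be inspected. The helper wrap_stream and the FakeTerminal class
are hypothetical names, and consulting the stream's 'encoding' attribute
before falling back to UTF-8 is an assumption layered on top of the
idiom, not part of it. The code is written to run on current Python
versions as well.

```python
import codecs
import io


def wrap_stream(stream, default="utf-8"):
    """Hypothetical helper: wrap a byte stream in an encoding writer.

    Uses the stream's own 'encoding' attribute when it has one,
    otherwise falls back to `default`.
    """
    encoding = getattr(stream, "encoding", None) or default
    return codecs.getwriter(encoding)(stream)


class FakeTerminal(io.BytesIO):
    """Hypothetical stand-in for a byte stream that declares an encoding."""
    encoding = "iso-8859-1"


plain = io.BytesIO()           # no 'encoding' attribute: UTF-8 fallback
wrap_stream(plain).write(u"\xe1\n")

terminal = FakeTerminal()      # declares iso-8859-1
wrap_stream(terminal).write(u"\xe1\n")

plain_bytes = plain.getvalue()
terminal_bytes = terminal.getvalue()
```

The writer returned by codecs.getwriter() encodes each unicode string
before passing it to the underlying stream, so callers that write
unicode objects no longer depend on the default encoding.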