As for other resources, I recently came across this:
http://farmdev.com/talks/unicode/
This was the first explanation that really made me understand the
difference between Unicode and utf-8 (and realize that I'd been using
the terms 'encode' and 'decode' backwards!). Anyway, just one more
resource.
E
On Jul 9, 2008, at 9:32 AM, Kent Johnson wrote:
On Tue, Jul 8, 2008 at 5:19 PM, Robert Johansson
<[EMAIL PROTECTED]> wrote:
Hi, I'm puzzled by the character encodings which I get when I use Python
with IDLE. The string '\xf6' represents a letter in the Swedish alphabet
when coded with utf8. On our computer with MacOSX this gets coded as
'\xc3\xb6' which is a string of length 2. I have configured IDLE to
encode utf8 but it doesn't make any difference.
I think you may be a bit confused about utf-8. '\xf6' is not a utf-8
character. U+00F6 is the Unicode (not utf-8) code point for LATIN SMALL
LETTER O WITH DIAERESIS. '\xf6' is also the Latin-1 encoding of this
character. The utf-8 encoding of this character is the two-byte
sequence '\xc3\xb6'.
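A small sketch of the distinction above, in case it helps. It assumes
Python 3 syntax (where text and bytes are separate types), which is not
what this thread was written against, and the variable names are only
illustrative:

```python
# Python 3 sketch: one character, one code point, two different encodings.
ch = "\u00f6"  # LATIN SMALL LETTER O WITH DIAERESIS

# Encoding turns text (code points) into bytes.
latin1_bytes = ch.encode("latin-1")
utf8_bytes = ch.encode("utf-8")

print(hex(ord(ch)))               # 0xf6 (the code point, not an encoding)
print(latin1_bytes)               # b'\xf6'
print(utf8_bytes)                 # b'\xc3\xb6'
print(len(ch), len(utf8_bytes))   # 1 2

# Decoding turns bytes back into text, given the right encoding.
assert utf8_bytes.decode("utf-8") == ch
assert latin1_bytes.decode("latin-1") == ch
```

So a length of 2 is what you would expect whenever you are looking at
the utf-8 bytes rather than at the character itself.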
Can you give some more specific details about what you do and what you
see? Also you might want to do some background reading on Unicode;
this is a good place to start:
http://www.joelonsoftware.com/articles/Unicode.html
Kent
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor