> > Hi, I'm puzzled by the character encodings which I get when I use Python
> > with IDLE. The string '\xf6' represents a letter in the Swedish alphabet
> > when coded with utf8. On our computer with MacOSX this gets coded as
> > '\xc3\xb6' which is a string of length 2. I have configured IDLE to encode
> > utf8 but it doesn't make any difference.
>
> I think you may be a bit confused about utf-8. '\xf6' is not a utf-8
> character. U00F6 is the Unicode (not utf-8) codepoint for LATIN SMALL
> LETTER O WITH DIAERESIS. '\xf6' is also the Latin-1 encoding of this
> character. The utf-8 encoding of this character is the two-byte
> sequence '\xc3\xb6'.
>
> Also you might want to do some background reading on Unicode;
> this is a good place to start:
> http://www.joelonsoftware.com/articles/Unicode.html

Kent is quite correct, and here is some Python code to demo it:

>>> x = u'\xf6'
>>> x
u'\xf6'
>>> print x
ö
>>> y = x.encode('utf-8')
>>> y
'\xc3\xb6'
>>> print y
ö

in the code above, our source string 'x' is a Unicode string, which is
"pure," meaning that it has not been encoded by any codec. we encode
this Unicode string into a UTF-8 byte string 'y', which takes up
2 bytes as Kent has mentioned already. we are able to dump the variables
as well as print them fine to the screen because our terminal was set
to UTF-8.

if we switch our terminal output to Latin-1, then we can view it that
way -- notice that the Latin-1 encoding only takes 1 byte instead of 2
for UTF-8:

>>> z = x.encode('latin-1')
>>> z
'\xf6'
>>> print z
ö

here's another recommended Unicode document that is slightly more
Python-oriented:
http://wiki.pylonshq.com/display/pylonsdocs/Unicode

cheers,
-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
    http://corepython.com

wesley.j.chun :: wescpy-at-gmail.com
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com
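
p.s. the length-2 result in the original question is the byte count of the UTF-8 encoding, not the character count. here is a minimal sketch of that distinction as a script rather than an interactive session. the variable names (utf8_bytes, latin1_bytes, mojibake) are only illustrative, and it is written on the assumption that it runs on both Python 2.6+ and Python 3.3+, so it compares values instead of relying on how each version prints them:

```python
# One Unicode character, two different byte encodings.
x = u'\xf6'                          # U+00F6, LATIN SMALL LETTER O WITH DIAERESIS

utf8_bytes = x.encode('utf-8')       # two bytes: 0xC3 0xB6
latin1_bytes = x.encode('latin-1')   # one byte: 0xF6

# The character count is 1; only the encoded byte counts differ.
lengths = (len(x), len(utf8_bytes), len(latin1_bytes))

# Decoding with the matching codec recovers the original character.
round_trip_ok = (utf8_bytes.decode('utf-8') == x and
                 latin1_bytes.decode('latin-1') == x)

# Decoding the UTF-8 bytes as Latin-1 does not raise an error; it quietly
# yields two characters (U+00C3 and U+00B6) instead of one.
mojibake = utf8_bytes.decode('latin-1')

print(lengths)
print(round_trip_ok)
print(len(mojibake))
```

the last step is the one worth remembering: bytes carry no label saying which codec produced them, so a mismatch between the encoding used to write and the one used to read usually shows up as wrong characters on screen rather than as an exception.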