Alex Hall wrote:

> Hello again:
> I have never seen this message before. I am pulling xml from a site's
> api and printing it, testing the wrapper I am writing for the api. I
> have never seen this error until just now, in the twelfth result of my
> search:
> UnicodeEncodeError: 'ASCII' codec can't encode character u'\u2019' in
> position 42: ordinal not in range(128)
>
> I tried making the strings Unicode by saying something like
> self.title=unicode(data.find("title").text)
> but the same error appeared. I found the manual chapter on this, but I
> am not sure I want to ignore since I do not know what this character
> (or others) might mean in the string. I am not clear on what 'replace'
> will do. Any suggestions?

You get a UnicodeEncodeError if you print a unicode string containing
non-ASCII characters and Python cannot determine the target's encoding:

    $ cat tmp.py
    # -*- coding: utf-8 -*-
    print u'äöü'
    $ python tmp.py
    äöü
    $ python tmp.py > tmp.txt
    Traceback (most recent call last):
      File "tmp.py", line 2, in <module>
        print u'äöü'
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

The error occurs because, by default, Python 2 tries to convert unicode
into bytes using the ascii codec. One way to tackle this is to check
sys.stdout's encoding and, if it is unknown (None), wrap sys.stdout in a
codecs.StreamWriter (obtained via codecs.getwriter) that can handle all
characters that may occur. UTF-8 is usually a good choice, but other
codecs are possible.

    $ cat tmp2.py
    # -*- coding: utf-8 -*-
    import sys

    # The encoding is None when stdout is not a terminal, e.g. when redirected.
    if sys.stdout.encoding is None:
        import codecs
        Writer = codecs.getwriter("utf-8")
        sys.stdout = Writer(sys.stdout)

    print u'äöü'
    $ python tmp2.py
    äöü
    $ python tmp2.py > tmp.txt
    $ cat tmp.txt
    äöü
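
As for what 'replace' will do: here is a minimal sketch, not taken from your wrapper, of how the standard error handlers of encode() treat the u'\u2019' from your traceback. The sample string `title` and the helper `encode_with` are made up for illustration. U+2019 is the RIGHT SINGLE QUOTATION MARK, i.e. a typographic apostrophe.

```python
# -*- coding: utf-8 -*-
# Illustrative only: `title` is a hypothetical stand-in for text from the API.
title = u'It\u2019s here'  # U+2019 is RIGHT SINGLE QUOTATION MARK


def encode_with(handler):
    """Encode the sample title to ASCII bytes using the given error handler."""
    return title.encode('ascii', handler)


replaced = encode_with('replace')           # unencodable character becomes '?'
ignored = encode_with('ignore')             # unencodable character is dropped
escaped = encode_with('backslashreplace')   # becomes the escape sequence \u2019
xml_ref = encode_with('xmlcharrefreplace')  # becomes the reference &#8217;

# Encoding to UTF-8 instead of ASCII is lossless; U+2019 takes three bytes.
as_utf8 = title.encode('utf-8')
```

With 'replace' you get "It?s here", with 'ignore' you get "Its here". Both lose information, so if the output target can take UTF-8, encoding to UTF-8 is probably the safer choice.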