>This is the first question in the BeautifulSoup FAQ at
>http://www.crummy.com/software/BeautifulSoup/FAQ.html

>Unfortunately the author of BS considers this a problem with your Python installation! So it
>seems he doesn't have a good understanding of Python and Unicode. (OK, I can forgive him
>that, I think there are only a handful of people who really do understand it completely.)

>The first fix given doesn't work. The second fix works but it is not a good idea to change the
>default encoding for your Python install. There is a hack you can use to change the default
>encoding just for one program; in your program put
>  reload(sys); sys.setdefaultencoding('utf-8')

>This seems to fix the problem you are having.

>Kent

Hi Kent,

I did read the FAQ before posting, honest :) But it does seem to be addressing a different issue.

He says to try:

    >>> latin1word = 'Sacr\xe9 bleu!'
    >>> unicodeword = unicode(latin1word, 'latin-1')
    >>> print unicodeword
    Sacré bleu!

Which worked fine for me. And then he gives a solution for fixing -display- problems on the terminal. For instance, his first solution was:

"The easy way is to remap standard output to a converter that's not afraid to send ISO-Latin-1 or UTF-8 characters to the terminal."

But I avoided displaying anything in my original example, because I didn't want to confuse the issue. It's also why I didn't mention the damning FAQ entry:

    >>> y = results[1].a.fetchText(re.compile('.+'))

Is all I am trying to do.

I don't expect non-ASCII characters to display correctly; however, I was surprised when I tried "print x" in my original example, and it printed. I would have expected to have to do something like:

    >>> print x.encode("utf8")
    Matt Croydon::Postneo 2.0 » Blog Archive » Mobile Screen Scraping <b>...</b>

I've just looked, and I have to do this explicit encoding under Python 2.3.4, but not under 2.4.1. So perhaps 2.4 is less afraid of (or smarter about) converting non-ASCII characters and displaying them on the terminal. Either way, I don't -think- that's my problem with Beautiful Soup.
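
To be concrete about what I mean by explicit decoding and encoding, here is a rough sketch. It is not taken from the FAQ or from Beautiful Soup; the variable names are my own, and it uses b'' literals, so it assumes a newer Python than the 2.3.4 and 2.4.1 installs I mentioned. As far as I understand it, the failing 'ascii' decode is the same conversion Python 2 attempts implicitly when the default encoding is left alone.

```python
# Raw bytes in ISO-Latin-1; 0xe9 is e-acute in that encoding.
latin1word = b'Sacr\xe9 bleu!'

# Explicit decode with the correct codec gives a Unicode string.
unicodeword = latin1word.decode('latin-1')

# Decoding the same bytes as ASCII fails, because 0xe9 is outside 0-127.
try:
    latin1word.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Explicit encode back to bytes, this time as UTF-8 (e-acute becomes 2 bytes).
utf8bytes = unicodeword.encode('utf-8')
```

Doing the decode and encode by hand like this means nothing depends on what the default encoding happens to be on a given install.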
Changing my default encoding does indeed fix it, but that may just reflect the author making bad assumptions because his own default was set to utf-8. I'm not really experienced enough to tell what is going on in his code, but I've been trying. It does seem to defeat the point of Unicode, however.

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor