>This is the first question in the BeautifulSoup FAQ at
>http://www.crummy.com/software/BeautifulSoup/FAQ.html

>Unfortunately the author of BS considers this a problem with your
>Python installation! So it seems he doesn't have a good understanding
>of Python and Unicode. (OK, I can forgive him that; I think there are
>only a handful of people who really do understand it completely.)
>The first fix given doesn't work. The second fix works, but it is not
>a good idea to change the default encoding for your Python install.
>There is a hack you can use to change the default encoding just for
>one program; in your program put
> import sys; reload(sys); sys.setdefaultencoding('utf-8')

>This seems to fix the problem you are having.

>Kent

Hi Kent, 

I did read the FAQ before posting, honest :)  But it does seem to be
addressing a different issue.

He says to try:

>>> latin1word = 'Sacr\xe9 bleu!'
>>> unicodeword = unicode(latin1word, 'latin-1')
>>> print unicodeword
Sacré bleu!
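For readers on Python 3, where `unicode()` no longer exists, the same decode step is a method on `bytes`. This is a minimal sketch assuming Python 3. It avoids printing the accented text directly so it runs even on an ASCII-only console:

```python
# Python 3 sketch: raw bytes in a known legacy encoding -> text (str).
latin1word = b'Sacr\xe9 bleu!'             # Latin-1 encoded bytes
unicodeword = latin1word.decode('latin-1')  # now a str: 'Sacré bleu!'

# Going back out to bytes means choosing an encoding explicitly.
utf8_bytes = unicodeword.encode('utf-8')
print(utf8_bytes)  # b'Sacr\xc3\xa9 bleu!'
```

The decoded string has 11 characters. The UTF-8 bytes number 12, because `é` takes two bytes in UTF-8.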

That worked fine for me. He then gives solutions for fixing -display-
problems on the terminal. For instance, his first solution was:

"The easy way is to remap standard output to a converter that's not
afraid to send ISO-Latin-1 or UTF-8 characters to the terminal."
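One way to "remap standard output to a converter" is to wrap a byte stream in an encoding writer from the standard `codecs` module. The sketch below assumes Python 3. It uses an in-memory `io.BytesIO` as a stand-in for the terminal so the bytes can be inspected. In Python 2 the usual idiom was `sys.stdout = codecs.getwriter('utf-8')(sys.stdout)`. On Python 3 you would wrap `sys.stdout.buffer` instead, or call `sys.stdout.reconfigure(encoding='utf-8')` on 3.7+.

```python
import codecs
import io

# Stand-in for the terminal's underlying byte stream.
terminal_bytes = io.BytesIO()

# codecs.getwriter returns a StreamWriter class. Instantiating it around a
# byte stream gives an object whose write() accepts text and encodes it.
utf8_out = codecs.getwriter('utf-8')(terminal_bytes)

utf8_out.write('Sacr\u00e9 bleu!\n')
utf8_out.flush()

print(terminal_bytes.getvalue())  # b'Sacr\xc3\xa9 bleu!\n'
```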

But I avoided displaying anything in my original example, because I
didn't want to confuse the issue.  It's also why I didn't mention the
damning FAQ entry:

>>> y = results[1].a.fetchText(re.compile('.+'))

That is all I am trying to do.

I don't expect non-ASCII characters to display correctly; however, I
was surprised when I tried "print x" in my original example and it
printed. I would have expected to have to do something like:

>>> print x.encode("utf8")
Matt Croydon::Postneo 2.0 » Blog Archive » Mobile Screen Scraping <b>...</b>
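An explicit `.encode("utf8")` can be needed because an ASCII-only output path cannot represent a character such as `»` (U+00BB) until the text has been turned into bytes with an encoding that covers it. This is a Python 3 sketch. The `title` string is an illustrative stand-in for `x`:

```python
title = 'Matt Croydon::Postneo 2.0 \u00bb Blog Archive'

# Encoding ordinary text to UTF-8 succeeds; '»' becomes b'\xc2\xbb'.
utf8_bytes = title.encode('utf-8')

# Encoding to ASCII fails on the first non-ASCII character.
try:
    title.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print('ASCII cannot represent', ascii(exc.object[exc.start]),
          'at index', exc.start)

# If lossy output is acceptable, an error handler avoids the exception.
lossy = title.encode('ascii', errors='replace')
print(lossy)  # b'Matt Croydon::Postneo 2.0 ? Blog Archive'
```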

I've just checked, and I have to do this explicit encoding under Python
2.3.4, but not under 2.4.1. So perhaps 2.4 is less afraid of (or smarter
about) converting non-ASCII characters and sending them to the terminal.
Either way, I don't -think- that's my problem with Beautiful Soup.
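Whether `print` works without an explicit encode depends on the encoding of the stream it writes to. A difference in what each interpreter detected for `sys.stdout` is one plausible, unverified explanation for the 2.3/2.4 difference. The sketch below assumes Python 3 and makes the dependency visible by printing into in-memory text streams with different encodings. `try_print` is a hypothetical helper written for illustration:

```python
import io

def try_print(text, encoding):
    """Print text into an in-memory stream that uses the given encoding.

    Returns the bytes written, or None if the encoding cannot represent text.
    """
    raw = io.BytesIO()
    # newline='\n' disables platform newline translation for a stable result.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline='\n')
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()

title = 'Blog Archive \u00bb Mobile Screen Scraping'
print(try_print(title, 'utf-8'))  # b'Blog Archive \xc2\xbb Mobile Screen Scraping\n'
print(try_print(title, 'ascii'))  # None
```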

Changing my default encoding does indeed fix it, but that may just
reflect a bad assumption on the author's part, perhaps because his own
default encoding was set to UTF-8. I'm not experienced enough to tell
what is going on in his code, but I've been trying. It does seem to
defeat the point of Unicode, however.
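Python 3 fixes the process-wide default encoding at UTF-8 and provides no `setdefaultencoding`. An alternative to changing the default is to decode bytes explicitly where they enter the program and keep text as Unicode internally. This is a minimal Python 3 sketch. `fetch_page_bytes` is a hypothetical stand-in for whatever downloads the HTML, and the page's charset is assumed to be known:

```python
import sys

def fetch_page_bytes():
    # Hypothetical stand-in for a network fetch: returns raw UTF-8 bytes.
    return 'Postneo 2.0 \u00bb Blog Archive'.encode('utf-8')

def to_text(raw, charset='utf-8'):
    # Decode once, at the boundary, with an explicitly chosen charset.
    # errors='replace' substitutes U+FFFD for malformed input instead of raising.
    return raw.decode(charset, errors='replace')

page = to_text(fetch_page_bytes())
print(sys.getdefaultencoding())  # utf-8 (always, on Python 3)
print(ascii(page))               # 'Postneo 2.0 \xbb Blog Archive'
```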
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
