grouchy wrote:
> Hi,
>
> I'm having bang-my-head-against-a-wall moments trying to figure all of this
> out.
>
> >>> from BeautifulSoup import BeautifulSoup
> >>>
> >>> file = urllib.urlopen("http://www.google.com/search?q=beautifulsoup")
> >>> file = file.read().decode("utf-8")
> >>> soup = BeautifulSoup(file)
> >>> results = soup('p','g')
> >>> x = results[1].a.renderContents()
> >>> type(x)
> <type 'unicode'>
> >>> print x
> Matt Croydon::Postneo 2.0 » Blog Archive » Mobile Screen Scraping <b>...</b>
>
> So far so good. But what I really want is just the text, so I try
> something like:
>
> >>> y = results[1].a.fetchText(re.compile('.+'))
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
>   File "BeautifulSoup.py", line 466, in fetchText
>     return self.fetch(recursive=recursive, text=text, limit=limit)
>   File "BeautifulSoup.py", line 492, in fetch
>     return self._fetch(name, attrs, text, limit, generator)
>   File "BeautifulSoup.py", line 194, in _fetch
>     if self._matches(i, text):
>   File "BeautifulSoup.py", line 252, in _matches
>     chunk = str(chunk)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in
> position 26: ordinal not in range(128)
>
> Is this a bug? Come to think of it, I'm not even sure how printing x
> worked, since it printed non-ascii characters.

This is the first question in the BeautifulSoup FAQ at
http://www.crummy.com/software/BeautifulSoup/FAQ.html

Unfortunately the author of BeautifulSoup considers this a problem with your
Python installation, so it seems he doesn't have a good understanding of
Python and Unicode. (OK, I can forgive him that; I think only a handful of
people really understand it completely.)

The first fix given in the FAQ doesn't work. The second fix works, but it is
not a good idea to change the default encoding for your whole Python install.
There is a hack you can use to change the default encoding for just one
program. In your program, put

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

The reload() is needed because setdefaultencoding is removed from the sys
module at startup. This seems to fix the problem you are having.

Kent
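
P.S. Here is a minimal sketch of what the traceback boils down to, in case it helps. In Python 2, str() on a unicode object encodes it with the default codec, which is ascii unless you change it, and that fails on u'\xbb'. The variable names are made up for illustration, and I use an explicit encode("ascii") to stand in for the implicit one so that the snippet also runs on Python 3. The last step shows the alternative to changing the default encoding: name the codec yourself wherever you need bytes.

```python
# Sketch only: title, ascii_failed, error_position and utf8_bytes are
# illustrative names, not part of BeautifulSoup.

# The link text from the original post; u'\xbb' is the right-pointing
# double angle quotation mark.
title = u"Matt Croydon::Postneo 2.0 \xbb Blog Archive"

# Encoding with the ascii codec is what str() on a unicode object does
# implicitly in Python 2, so this reproduces the error in the traceback.
try:
    title.encode("ascii")
    ascii_failed = False
    error_position = None
except UnicodeEncodeError as exc:
    ascii_failed = True
    error_position = exc.start  # index of the first unencodable character

# Naming the codec explicitly avoids relying on the default encoding.
utf8_bytes = title.encode("utf-8")

print("ascii failed: %s at position %s" % (ascii_failed, error_position))
print(repr(utf8_bytes))
```

As for why "print x" worked: as far as I know, print encodes unicode with the output stream's own encoding when one is set, not with the default encoding, so it can succeed where str() fails.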