Package: python-beautifulsoup
Version: 3.0.4-1
Severity: normal

BeautifulSoup seems to use the content-type correctly to parse entities in the text of an HTML string, but not when they occur inside attribute strings.

The following program produces the error:

#---- cut here ----
#!/usr/bin/python

from BeautifulSoup import BeautifulSoup

input1 = '''
<html>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<h1>Here is a Latin-1 entity: &reg;</h1>
</html>
'''

print BeautifulSoup(input1)

input2 = '''
<html>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="Description" content="Here is a Latin-1 entity: &reg;" />
</html>
'''

print BeautifulSoup(input2)
#---- cut here ----

Here's what it produces on my system:

<html>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<h1>Here is a Latin-1 entity: &reg;</h1>
</html>

Traceback (most recent call last):
  File "./bug.py", line 21, in <module>
    print BeautifulSoup(input2)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1282, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 946, in __init__
    self._feed()
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 971, in _feed
    SGMLParser.feed(self, markup)
  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/sgmllib.py", line 133, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/sgmllib.py", line 291, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "/usr/lib/python2.5/sgmllib.py", line 340, in finish_starttag
    self.handle_starttag(tag, method, attrs)
  File "/usr/lib/python2.5/sgmllib.py", line 376, in handle_starttag
    method(attrs)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1372, in start_meta
    self._feed(self.declaredHTMLEncoding)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 971, in _feed
    SGMLParser.feed(self, markup)
  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/sgmllib.py", line 133, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/sgmllib.py", line 285, in parse_starttag
    self._convert_ref, attrvalue)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'stable'), (400, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.24-1-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-beautifulsoup depends on:
ii  python                        2.5.2-1    An interactive high-level object-o
ii  python-support                0.7.7      automated rebuilding support for P

python-beautifulsoup recommends no packages.

-- no debconf information
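
The final exception can be reproduced without BeautifulSoup. The sketch below is only my reading of the traceback, not something confirmed from the sgmllib source: I am assuming the entity in the attribute ends up as the single byte 0xAE, and that this byte is then decoded as ASCII instead of with the declared charset. It is written in Python 3 syntax, where the decode step has to be explicit; `try_decode` is a hypothetical helper and not part of BeautifulSoup or sgmllib.

```python
from html.entities import name2codepoint

# The named entity "reg" maps to code point 0xAE, which is also its
# single-byte encoding in ISO-8859-1 (Latin-1).
codepoint = name2codepoint["reg"]
raw = bytes([codepoint])


def try_decode(data, encoding):
    """Return the decoded text, or the exception if decoding fails.

    Hypothetical helper, used here only to compare two codecs.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# Decoding as ASCII fails, because 0xAE is outside the 7-bit range.
ascii_result = try_decode(raw, "ascii")

# Decoding with the charset declared in the <meta> tag succeeds.
latin1_result = try_decode(raw, "iso-8859-1")

print(repr(ascii_result))
print(repr(latin1_result))
```

If that assumption holds, the byte would need to be decoded with the declared charset (iso-8859-1 here) before it is combined with the rest of the attribute value.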