-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm continuing to try to run webcheck on the Debian website; it now
fails with a crash in beautifulsoup:

webcheck:   http://www.slf.ch/
[...]
 File "/usr/share/webcheck/parsers/html/beautifulsoup.py", line 60, in parse
   base = myurllib.normalizeurl(htmlunescape(base['href']).strip())
 File "/var/lib/python-support/python2.4/BeautifulSoup.py", line 419, in __getitem__
   return self._getAttrMap()[key]
KeyError: 'href'

This occurs after running it on http://www.nl.debian.org for a while;
resuming with webcheck -c does work, though, and webcheck no longer
crashes after that...
[...]
Versions of packages webcheck recommends:
ii  python-beautifulsoup          3.0.1-2    error-tolerant HTML parser for Pyt

This is a bug in the version of BeautifulSoup in etch (3.0.1-2 has the problem, 3.0.4-1 does not). Maybe I should change to a versioned Recommends (or maybe a Conflicts with older versions).

I could include some workaround code, which would simplify backports, but I don't think that is worth the effort.

Anyway, the problem is that the base tag is expected to have an href attribute (the used find is supposed
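
For what it's worth, a workaround would amount to something like the sketch below. This is not webcheck's actual code: the helper name base_href is made up for illustration, and plain dicts stand in for BeautifulSoup tags, assuming only that a tag raises KeyError for a missing attribute as the traceback above shows.

```python
def base_href(base_tag, default=None):
    """Return the stripped href of a <base> tag, or default if there is none.

    base_tag is anything that supports tag['href'] and raises KeyError
    when the attribute is missing (hypothetical stand-in for a parsed tag).
    """
    if base_tag is None:
        # The document has no <base> tag at all.
        return default
    try:
        href = base_tag['href']
    except KeyError:
        # A <base> tag without href, e.g. <base target="_blank">.
        return default
    href = href.strip()
    # Treat an empty href the same as a missing one.
    return href or default


# Plain dicts stand in for parsed tags here.
print(base_href({'href': ' http://www.example.org/ '}))
print(base_href({'target': '_blank'}))
```

The caller would then only normalize the URL when the helper returns something other than the default.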

By the way, if you're crawling a very big website (like Debian's) I would highly recommend using Python 2.5 instead of 2.4. Sets perform much better in Python 2.5.

- -- - -- arthur - [EMAIL PROTECTED] - http://people.debian.org/~adejong --
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGnNtDVYan35+NCKcRAhICAJ9zdpi+9cJQxlZu+1QZehnLLlg1NgCgxBgC
1OTmUlqfoZU5YMN8lY+LgDg=
=7xdl
-----END PGP SIGNATURE-----

