Package: urlwatch
Version: 1.11-1
Severity: normal
--- Please enter the report below this line. ---
I use the attached hooks.py script to convert all HTML data from the
watched URLs to plain text.
urlwatch is run as a cron job:
#Checks URLs for changes -- see ~/.urlwatch/urls.txt
5,55 * * * * [my username] urlwatch
When there are no changes to the watched URLs, I get an email with the
following error message:
Traceback (most recent call last):
  File "/usr/bin/urlwatch", line 232, in <module>
    data = job.retrieve(timestamp, filter, headers)
  File "/usr/share/urlwatch/urlwatch/handler.py", line 111, in retrieve
    content_unicode = content.decode(encoding, 'ignore')
LookupError: unknown encoding:
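The blank name after "unknown encoding:" suggests that the encoding string
passed to decode() was empty. The following is only a minimal sketch of that
failure and of one possible guard, not urlwatch's actual code. The helper
name safe_decode and its fallback parameter are hypothetical, and the sketch
is written for Python 3 for convenience, whereas this urlwatch version runs
on Python 2.

```python
import codecs


def safe_decode(content, encoding, fallback="utf-8"):
    """Decode bytes, using a fallback codec when the name is empty or unknown.

    Hypothetical helper for illustration; not part of urlwatch.
    """
    if not encoding:
        # An empty or missing encoding name is what appears to trigger
        # "LookupError: unknown encoding:" in the traceback above.
        encoding = fallback
    else:
        try:
            codecs.lookup(encoding)
        except LookupError:
            # The name is set but no codec is registered under it.
            encoding = fallback
    return content.decode(encoding, "ignore")


if __name__ == "__main__":
    try:
        b"abc".decode("", "ignore")
    except LookupError as exc:
        print("unguarded decode failed:", exc)
    print(safe_decode(b"abc", ""))
```

Whether UTF-8 is the right fallback for every watched URL is an assumption;
a real fix might prefer to skip the job or report the URL instead.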
Thank you for your help.
--- System information. ---
Architecture: amd64
Kernel: Linux 3.2.0-4-amd64
Debian Release: 7.0
500 testing security.debian.org
500 testing mirror.csclub.uwaterloo.ca
500 testing debian.osuosl.org
--- Package information. ---
Depends              (Version) | Installed
==============================-+-============
python                (>= 2.4) | 2.7.3~rc2-1
python-support     (>= 0.90.0) | 1.0.15

Recommends           (Version) | Installed
==============================-+-===========
python-vobject                 | 0.8.1c-4
python-utidylib                | 0.2-8
lynx                           | 2.8.8dev.12-2

Suggests       (Version) | Installed
========================-+-===========
html2text                |
#
# Hooks file for urlwatch
#
# Adapted from the example file provided in
# /usr/share/doc/urlwatch/hooks.py.example by urlwatch 1.11-1

# Needed for regular expression substitutions
# import re

# Additional modules installed with urlwatch
# from urlwatch import ical2txt
from urlwatch import html2txt

def filter(url, data):
    return html2txt.html2text(data, method='lynx')