Package: html2text
Version: 1.3.2a-15
Severity: normal

Dear Maintainer,

The following minimal HTML file causes an "Input recoding failed" error:

$ cat sample.htm
<html><body>
<table BORDER="1">
<tr><td>&nbsp;</td></tr>
</table>
</body></html>
$ html2text sample.htm
Input recoding failed due to invalid input sequence. Unconverted part of
text follows.
�|

$

Removing or replacing the non-breaking space (&nbsp;), or setting the
border to 0, allows html2text to process the file correctly.
Placing one or more characters after the non-breaking space also avoids
the error, but the first character after the non-breaking space is then
missing from the output.
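
For convenience, the cases described above can be generated with a small
script along the following lines. This is only a sketch: the file names and
the write_sample helper are made up for illustration and are not part of
html2text, and it assumes a POSIX shell with mktemp available. It runs
html2text on each sample only if the command is found.

```shell
#!/bin/sh
# Sketch: write the failing sample plus the three variants from the report,
# then run html2text on each one if it is installed.
# NOTE: file names and write_sample are illustrative assumptions.
set -eu

dir=$(mktemp -d)

# write_sample FILE BORDER CELL
# Emit a one-cell table with the given BORDER value and cell content.
write_sample() {
    cat > "$1" <<EOF
<html><body>
<table BORDER="$2">
<tr><td>$3</td></tr>
</table>
</body></html>
EOF
}

write_sample "$dir/failing.htm"  1 '&nbsp;'    # original failing case
write_sample "$dir/border0.htm"  0 '&nbsp;'    # border set to 0
write_sample "$dir/no_nbsp.htm"  1 'x'         # non-breaking space replaced
write_sample "$dir/trailing.htm" 1 '&nbsp;ab'  # characters after the nbsp

if command -v html2text >/dev/null 2>&1; then
    for f in "$dir"/*.htm; do
        echo "== $f"
        # Report the exit status instead of aborting on failure.
        html2text "$f" || echo "(html2text exited with status $?)"
    done
else
    echo "html2text not found; samples written to $dir"
fi
```

Only failing.htm is expected to trigger the recoding error; with trailing.htm
the thing to look for is whether the "a" is missing from the output.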

I was also able to reproduce the failure on Squeeze, so this is not a new bug.

-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages html2text depends on:
ii  libc6       2.13-35
ii  libgcc1     1:4.7.1-7
ii  libstdc++6  4.7.1-7

html2text recommends no packages.

Versions of packages html2text suggests:
ii  curl  7.26.0-1
ii  wget  1.13.4-3

-- no debconf information
