package html2text
tags 545695 + confirmed wontfix
thanks

Hello Jens,

Jens Seidel wrote:
> Package: html2text
> Version: 1.3.2a-14
> Severity: normal
> 
> Hi,
> 
[...]
> As you can see the problem is the vertical column separator | which
> probably interrupts two bytes of the last multibyte character and makes
> the file not UTF-8 conform.
Probably. html2text definitely lacks proper multibyte support. Many parts
of the core know nothing about encodings.
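
To illustrate the kind of failure described above, here is a minimal sketch
(not html2text's actual code; the sample string, the width value and the helper
name are assumptions made up for the example). It shows how counting a column
width in bytes rather than characters can cut a two-byte UTF-8 character in
half before the '|' separator is appended:

```python
# Hypothetical cell content, similar to the umlauts in the report.
text = "öäü öäü"
encoded = text.encode("utf-8")  # each umlaut takes two bytes in UTF-8

width = 5  # hypothetical column width

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Encoding-unaware cut: the width is applied to bytes, so the cut lands
# between the two bytes of the third umlaut.
naive_cell = encoded[:width] + b"|"

# Encoding-aware cut: the width is applied to characters, then encoded.
safe_cell = text[:width].encode("utf-8") + b"|"

print(is_valid_utf8(naive_cell))
print(is_valid_utf8(safe_cell))
```

The byte-based variant leaves a lone lead byte in front of the separator, which
is what makes the output invalid UTF-8.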

> I assumed it should be easy to reproduce but failed with another error:
> 
> $ html2text -width 10 test.html
> Input recoding failed due to invalid input sequence. Unconverted part of text 
> follows.
> #|ü
> |öäü ö#|ü
> |öäü ö#|ü
> |öäü ö#|ü
> |öäü ö#|ü
> |öäü ö#|ü
> |öäü____|
> 
> This error is wrong. test.html is a proper HTML file in latin1 encoding!
Indeed. However, even after some kind of fix, UTF-8 and tables seem to be
incompatible with each other in the current html2text. This bug is 'wontfix' for me.

I recommend trying other converters for batch processing, such as
'vilistextum' or 'lynx --dump'.

-- 
Eugene V. Lyubimkin aka JackYF, JID: jackyf.devel(maildog)gmail.com
C++/Perl developer, Debian Developer
