package html2text
tags 545695 + confirmed wontfix
thanks

Hello Jens,

Jens Seidel wrote:
> Package: html2text
> Version: 1.3.2a-14
> Severity: normal
>
> Hi,
> [...]
> As you can see the problem is the vertical column separator | which
> probably interrupts two bytes of the last multibyte character and makes
> the file not UTF-8 conform.

Probably. Html2text definitely lacks proper multibyte support. Many parts of
the core don't know anything about encodings.

> I assumed it should be easy to reproduce but failed with another error:
>
> $ html2text -width 10 test.html
> Input recoding failed due to invalid input sequence. Unconverted part of text
> follows.
> #|ü
> |öäü ö#|ü
> |öäü ö#|ü
> |öäü ö#|ü
> |öäü ö#|ü
> |öäü ö#|ü
> |öäü____|
>
> This error is wrong. test.html is a proper HTML file in latin1 encoding!

Indeed. However, even after some kind of fix, UTF-8 and tables seem to be
incompatible with each other in the current html2text. This bug is 'wontfix'
for me.

I recommend trying other converters for batch processing, such as
'vilistextum' or 'lynx --dump'.

-- 
Eugene V. Lyubimkin aka JackYF, JID: jackyf.devel(maildog)gmail.com
C++/Perl developer, Debian Developer
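
P.S. For illustration only, here is a minimal Python sketch of the failure
mode suspected in the report. It is not html2text's code, and the helper
names (cut_bytes, cut_chars, is_valid_utf8) are hypothetical. It assumes the
column width is counted in bytes: cutting a UTF-8 string at a byte boundary
can then split a two-byte character such as 'ü', while cutting by characters
cannot.

```python
# Hypothetical sketch, not taken from html2text: shows how truncating a
# UTF-8 byte string at a fixed byte width can split a multibyte character.

def cut_bytes(text: str, width: int) -> bytes:
    """Truncate the UTF-8 encoding of text to at most width bytes."""
    return text.encode("utf-8")[:width]

def cut_chars(text: str, width: int) -> bytes:
    """Truncate text to at most width code points, then encode as UTF-8."""
    return text[:width].encode("utf-8")

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

cell = "öäü"                 # each character occupies two bytes in UTF-8
broken = cut_bytes(cell, 5)  # cut falls between the two bytes of 'ü'
intact = cut_chars(cell, 2)  # cut falls on a character boundary

print(is_valid_utf8(broken))
print(is_valid_utf8(intact))
```

A width counted in characters (or, better, in display columns) would avoid
the split, but that would need encoding awareness throughout the table
layout code, which is what is missing today.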