Eugene V. Lyubimkin wrote:
> Utility html2text, version 1.3.2a-6, with "utf8" patch was just
> uploaded to experimental. The patch allows to process UTF-8 files
> when '-utf8' option supplied. Input should be in UTF-8 and output will
> be in UTF-8 too.
>
> Please test this functionality - I believe that UTF-8 support is a
> good feature, especially for processing non-English documents.

Mmm, the way it is done looks wrong to me: there is no reason why the
input and output charsets should be related at all.

For the input, html2text should recognize the meta http-equiv tag. That
should work for a lot of pages; otherwise an input-charset option can be
provided.

For the output, the current locale's charset should be used (as returned
by nl_langinfo(CODESET) after calling setlocale(LC_CTYPE, "")). That
should work in almost all cases; otherwise an output-charset option can
be provided.

Yes, that means conversions. But without them you cannot put a "utf-8
support" sticker on it, only "limited utf-8 support".

Samuel
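
To make the proposal above concrete, here is a rough sketch in Python rather than in html2text's own code. It is only an illustration under assumptions: the function names, the regular expression, the 4096-byte scan window and the iso-8859-1 fallback are all hypothetical and are not taken from html2text or from the patch under discussion. The input charset comes from an explicit override or the meta tag, the output charset comes from an explicit override or the locale, and the two are chosen independently.

```python
import codecs
import locale
import re

# Hypothetical pattern: finds "charset=..." inside a <meta ...> tag.
META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)', re.IGNORECASE)


def detect_input_charset(html_bytes, override=None, default="iso-8859-1"):
    """Pick the input charset: option first, then the meta tag, then a default."""
    if override:
        return override
    # Assumption: the declaration sits near the top of the document.
    match = META_CHARSET_RE.search(html_bytes[:4096])
    if match:
        name = match.group(1).decode("ascii")
        try:
            codecs.lookup(name)  # accept only charsets we can actually decode
            return name
        except LookupError:
            pass
    return default  # assumed fallback, not something html2text defines


def detect_output_charset(override=None):
    """Pick the output charset: option first, then the current locale."""
    if override:
        return override
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # adopt the user's environment
    except locale.Error:
        pass  # unsupported locale settings: stay in the "C" locale
    return locale.nl_langinfo(locale.CODESET) or "ascii"


def transcode(html_bytes, input_charset=None, output_charset=None):
    """Decode from the input charset and re-encode in the output charset."""
    src = detect_input_charset(html_bytes, input_charset)
    dst = detect_output_charset(output_charset)
    text = html_bytes.decode(src, errors="replace")
    # Characters the output charset cannot represent become "?".
    return text.encode(dst, errors="replace")
```

With something like this, a KOI8-R page that declares its charset in a meta tag would come out as UTF-8 on a UTF-8 terminal and as ISO-8859-1 (with replacement characters) on a Latin-1 one, without the user passing any option.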