Bug#659920: rss2email incorrectly chooses big5 character encoding

James Cloos Sat, 18 Feb 2012 10:45:22 -0800

>>>>> "EM" == Etienne Millon <etienne.mil...@gmail.com> writes:


EM> Can you try to put the following line in your config.py ?

EM> CHARSET_LIST='US-ASCII', 'UTF-8', 'BIG5', 'ISO-2022-JP', 'ISO-8859-1'

Ahh.  It had been so long since I set my current config up, that either
I had forgotten about r2e's config.py or my setup predated CHARSET_LIST...

I used »CHARSET_LIST='US-ASCII', 'ISO-8859-1', 'UTF-8'« because those
will cover all of the feeds which I monitor.

There is little reason to have 8859-* after utf-8; it would never fall
through to it.  But having it ahead of utf-8 can have benefit.
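
The ordering argument can be sketched in a few lines of Python. This is a minimal illustration, assuming the selection is a simple first-match loop over the list; pick_charset is a hypothetical helper name, not a function taken from r2e itself.

```python
def pick_charset(text, charset_list):
    """Return the first charset in charset_list able to encode text, else None."""
    for charset in charset_list:
        try:
            text.encode(charset)
        except (UnicodeError, LookupError):
            # Either the text does not fit, or the codec name is unknown.
            continue
        return charset
    return None


my_list = ('US-ASCII', 'ISO-8859-1', 'UTF-8')
print(pick_charset('plain text', my_list))   # US-ASCII
print(pick_charset('café', my_list))         # ISO-8859-1
print(pick_charset('日本語', my_list))        # UTF-8

# With utf-8 ahead of 8859-1, the latter is never reached:
print(pick_charset('café', ('US-ASCII', 'UTF-8', 'ISO-8859-1')))  # UTF-8
```

Since utf-8 can encode any text, anything listed after it is dead weight under this kind of loop.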

Something in my chain forces r2e's 8859-1 to cte:qp and its utf-8 to
cte:b64.  Given that the former can be read w/o decoding, it is useful
to permit its use.
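
One candidate for that "something" is Python's standard email package, whose defaults pair these charsets with exactly those transfer encodings. Whether it is the layer responsible in this particular chain is an assumption; the sketch below only shows the library defaults, and transfer_encoding is a hypothetical helper name.

```python
from email.mime.text import MIMEText


def transfer_encoding(text, charset):
    """Report the Content-Transfer-Encoding that MIMEText picks for a charset."""
    return MIMEText(text, 'plain', charset)['Content-Transfer-Encoding']


print(transfer_encoding('café', 'iso-8859-1'))  # quoted-printable
print(transfer_encoding('café', 'utf-8'))       # base64
```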

I don't know where the CJ encodings should fall in the default set.
Having them ahead of utf-8 causes harm for non-asian-language feeds,
but having utf-8 first will unify zh and jp text.  That is, if the
reader's MUA is configured to prefer a jp font for ideographs, then
zh text in utf-8 will be rendered with that jp font.  And vice versa
if their MUA prefers a zh_{CN,TW,HK} font.  I don't know how much
harm that would do.  I usually can recognize zh_CN vs zh_TW vs jp
vs ko text, but cannot actually read any of them....

If the character-set matching limited big5 and iso-2022-jp to text
containing characters which are not used outside Asia, though, they
could remain before utf-8.
That means that neutral chars -- like the quotes -- should not match
the CJK character sets.  Only matching characters which have width
property W in unicode's EastAsianWidth.txt should do the trick.
(The quotes have A, presumably for Ambiguous.)
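
A minimal sketch of that width-based guard, using Python's unicodedata module. The names CJK_CHARSETS, has_wide_chars and pick_charset_guarded are hypothetical, and the first-match loop is an assumption about how the list is consulted, not r2e's actual code.

```python
import unicodedata

# Hypothetical set of charsets that should only be tried for East Asian text.
CJK_CHARSETS = {'BIG5', 'ISO-2022-JP'}


def has_wide_chars(text):
    """True if any character has East Asian Width property W."""
    return any(unicodedata.east_asian_width(ch) == 'W' for ch in text)


def pick_charset_guarded(text, charset_list):
    """First-match selection that skips CJK charsets for text with no W chars."""
    wide = has_wide_chars(text)
    for charset in charset_list:
        if charset.upper() in CJK_CHARSETS and not wide:
            continue
        try:
            text.encode(charset)
        except (UnicodeError, LookupError):
            continue
        return charset
    return None


cjk_first = ('US-ASCII', 'BIG5', 'ISO-2022-JP', 'ISO-8859-1', 'UTF-8')
print(unicodedata.east_asian_width('\u201c'))                # A
print(pick_charset_guarded('\u201cquoted\u201d', cjk_first))  # UTF-8
print(pick_charset_guarded('中文', cjk_first))                # BIG5
```

With the guard in place, text whose only non-ASCII characters are curly quotes skips past the CJK charsets, while text containing ideographs can still match them.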

Explicitly configuring it, though, does fix things for me.

Thanks.

-JimC
-- 
James Cloos <cl...@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6


