Bug#414314: [Alpine-alpha] Re: Bug#414314: alpine: please include iso-8859-2 in the default posting-character-set

Mark Crispin Thu, 15 Mar 2007 14:13:37 -0800

Thanks for letting us know about this issue.

Asheesh's analysis is correct. Without linguistic analysis (which is probably more than we want to get into!) or user guidance, guessing the correct character set is a matter of trial and error of "which one of these character sets can do a lossless conversion of the Unicode source text."

Alpine offers the ability to provide user guidance by setting the posting-character-set to the user's preferred character set.

That list for trial and error is arbitrary, and is not intended to snub anyone. US-ASCII at one end, and UTF-8 at the other end are both no-brainers. ISO-8859-15 is for our Euro users, ISO-8859-1 is for legacy (note that most non-Unicode 8-bit stuff in North America is 8859-1), and ISO-2022-JP and KOI8-R are for the long-time Japanese and Russian user communities.
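
To make the trial-and-error idea concrete, here is a minimal sketch in Python. It is not Alpine's code (Alpine is written in C); the function name pick_charset and the order of the middle entries are my own assumptions for illustration. Only the charset names, and US-ASCII and UTF-8 being at the two ends, come from the description above.

```python
# Hypothetical candidate list: US-ASCII first and UTF-8 last, as described
# above; the order of the entries in between is an assumption.
CANDIDATES = [
    "us-ascii",
    "iso-8859-15",
    "iso-8859-1",
    "iso-2022-jp",
    "koi8-r",
    "utf-8",
]


def pick_charset(text, candidates=CANDIDATES):
    """Return the first candidate that can encode text without loss."""
    for name in candidates:
        try:
            # The default "strict" error handler raises if any character
            # has no representation in this charset.
            text.encode(name)
        except UnicodeEncodeError:
            continue
        return name
    # UTF-8 can represent any Unicode text, so use it as the final fallback.
    return "utf-8"


print(pick_charset("plain text"))  # pure ASCII
print(pick_charset("café"))        # fits a Latin charset
print(pick_charset("Łódź"))        # no ISO-8859-2 in the list
```

With this list, Polish text such as "Łódź" falls all the way through to UTF-8, since no candidate before it contains those letters. That is the situation the bug report is about.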

We probably could add to that list if there is a sufficient constituency; but as Asheesh says, the overlap between some ISO 8859 variants is such that a "wrong", albeit lossless, encoding can still be chosen.
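
The overlap problem can be seen with a small Python sketch. The helper name lossless_charsets is hypothetical and this is only an illustration, not anything Alpine does internally: it reports every charset in a list that can encode a given string without loss.

```python
def lossless_charsets(text, names):
    """Return the charsets from names that can encode text without loss."""
    usable = []
    for name in names:
        try:
            text.encode(name)  # strict mode: raises on unrepresentable characters
        except UnicodeEncodeError:
            continue
        usable.append(name)
    return usable


variants = ["iso-8859-1", "iso-8859-2", "iso-8859-15"]

# "ö" exists in all three variants, so any of them is a lossless choice
# and whichever comes first in the trial list wins.
print(lossless_charsets("Köln", variants))

# "Ł" and "ź" exist only in ISO-8859-2 among these three.
print(lossless_charsets("Łódź", variants))
```

So a message from an ISO-8859-2 user that happens to contain only characters shared with ISO-8859-1 would still be labeled with whichever variant appears earlier in the list.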

It's difficult to justify adding yet another Latin script variant charset while neglecting entire non-Latin scripts, e.g., Arabic, Chinese (simplified and traditional), Greek, Hebrew, Korean, Thai, etc. However, the more that is added to the list, the slower the overall process for everyone, especially with the larger scripts.

There is one other possibility; we could have some mechanism to make a note of original source charset and try that charset specially. This would be similar to how Pine worked (but hopefully without all the incredible complexity and kluginess that the old Pine i18n code had!). Determining widths of "ambiguous" class characters would also be aided by such a mechanism. But we're talking possible futures here, and there's no guarantee that we'll do it. [I'm not the person to convince.]

In my personal opinion, "posting-character-set=UTF-8" is the only right setting; thus messages are either US-ASCII or UTF-8. However, practical considerations dictate otherwise for now, because some people will flame if they receive a message in UTF-8 rather than the local character set. That's why "posting-character-set=ISO-8859-1" is in my own configuration, so I can't criticize others without facing a "pot, kettle, black" problem!

We are, however, interested in receiving feedback on this general issue. If a clear consensus evolves we would certainly consider it. Thanks!


-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.


