Re: [PATCH 02/14] Fix character encoding aliases for OS/2

KO Myung-Hun Wed, 24 Dec 2014 19:59:47 -0800


Daiki Ueno wrote:
> KO Myung-Hun <kom...@gmail.com> writes:
> 
>>> For example, it says:
>>>
>>>   If the output character set is ommited from the LANG variable, the
>>>   default codepage is ALWAYS taken from the operating system (e.g. the
>>>   codepage setting from locale.alias is always ignored, so "russian"
>>>   stays just for "ru_RU" and not for "ru_RU.ISO-8859-5"); you may want
>>>   to set it just if you want to override the active OS/2 codepage.
>>
>> This patch does not change the behavior of the OS/2 port of gettext.
>> It embeds the charset aliases into localcharset.c instead of using the
>> charset.alias file, and it fixes the problem that a locale is returned
>> instead of a charset when no charset is specified.
> 
> Thanks for checking, looks safe then.  One question is: where did you
> take the mapping data?  I found this chart:
> http://www.borgendale.com/locale.htm
>


I used the CODEPAGE and COUNTRY parts of the OS/2 command references.

> I'm not sure how authoritative it is, but there are some differences
> from yours: ar_AA, bg_BG, lt_LT, lv_LV, and zh_CN.
> 

Thanks to your check, I found that I had omitted ar_AA and the comments
listing the code pages. Thanks.

The other differences arise because cp915, cp921 and cp1381 are not
supported by libiconv. In these cases, I picked code pages from
config.charset.
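
As a rough illustration (not part of the patch), one way to check whether
a code page name is known to the iconv implementation at hand is to try
opening a converter for it. charset_is_supported is a hypothetical helper
name, and the answer depends on which iconv is installed, so treat this
as a sketch only:

```c
#include <iconv.h>

/* Hypothetical helper, for illustration only: return 1 if the installed
   iconv can open a converter from NAME to UTF-8, 0 otherwise.  */
static int
charset_is_supported (const char *name)
{
  iconv_t cd = iconv_open ("UTF-8", name);

  /* iconv_open reports failure by returning (iconv_t) -1.  */
  if (cd == (iconv_t) -1)
    return 0;

  iconv_close (cd);
  return 1;
}
```

Calling charset_is_supported ("CP915") against a given libiconv build
would show whether that name can be used in the table.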

Finally, I changed the code page for bg_BG from cp1251 to cp855, which is
listed as its "Etc" code page.

-- 
KO Myung-Hun

Using Mozilla SeaMonkey 2.7.2
Under OS/2 Warp 4 for Korean with FixPak #15
In VirtualBox v4.1.32 on Intel Core i7-3615QM 2.30GHz with 8GB RAM

Korean OS/2 User Community : http://www.ecomstation.co.kr

From c300b50a741bc01352ffdbe71738af6741ee0a95 Mon Sep 17 00:00:00 2001
From: KO Myung-Hun <k...@chollian.net>
Date: Thu, 23 Feb 2012 22:37:21 +0900
Subject: [PATCH] Fix character encoding aliases for OS/2

On OS/2, a charset is generally not specified. For example, LANG is set
just to ko_KR for Korean. So charset-to-charset mapping is not useful
in this case. Instead, use locale-to-charset mapping.

Also embed the aliases, as on Windows, to avoid the trouble of finding a
separate file.

* lib/config.charset: Remove os2* from case "$os" in
* lib/localcharset.c (get_charset_aliases): Use embedded encoding
aliases on OS/2.
---
 lib/config.charset |  4 +---
 lib/localcharset.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/lib/config.charset b/lib/config.charset
index 4e4c7ed..3e6c88f 100644
--- a/lib/config.charset
+++ b/lib/config.charset
@@ -348,12 +348,10 @@ case "$os" in
     #echo "sun_eu_greek ?" # what is this?
     echo "UTF-8 UTF-8"
     ;;
-  freebsd* | os2*)
+  freebsd*)
     # FreeBSD 4.2 doesn't have nl_langinfo(CODESET); therefore
     # localcharset.c falls back to using the full locale name
     # from the environment variables.
-    # Likewise for OS/2. OS/2 has XFree86 just like FreeBSD. Just
-    # reuse FreeBSD's locale data for OS/2.
     echo "C ASCII"
     echo "US-ASCII ASCII"
     for l in la_LN lt_LN; do
diff --git a/lib/localcharset.c b/lib/localcharset.c
index 1c17af0..3a4c87d 100644
--- a/lib/localcharset.c
+++ b/lib/localcharset.c
@@ -128,7 +128,7 @@ get_charset_aliases (void)
   cp = charset_aliases;
   if (cp == NULL)
     {
-#if !(defined DARWIN7 || defined VMS || defined WINDOWS_NATIVE || defined __CYGWIN__)
+#if !(defined DARWIN7 || defined VMS || defined WINDOWS_NATIVE || defined __CYGWIN__ || defined OS2)
       const char *dir;
       const char *base = "charset.alias";
       char *file_name;
@@ -342,6 +342,70 @@ get_charset_aliases (void)
            "CP54936" "\0" "GB18030" "\0"
            "CP65001" "\0" "UTF-8" "\0";
 # endif
+# if defined OS2
+      /* To avoid the troubles of installing a separate file in the same
+         directory as the DLL and of retrieving the DLL's directory at
+         runtime, simply inline the aliases here.  */
+
+      /* On OS/2, a charset is generally not specified. For example, LANG
+         is set just to ko_KR for Korean. So charset-to-charset mapping is
+         not useful in this case. Instead, use locale-to-charset mapping.  */
+
+                                            /* Pri,  Alt, Etc  */
+      cp = "ar_AA" "\0" "CP864" "\0"        /* 864,  850, 437  */
+           "bg_BG" "\0" "CP855" "\0"        /* 915,  850, 855  */
+           "ca_ES" "\0" "CP850" "\0"        /*                 */
+           "cs_SZ" "\0" "CP852" "\0"        /* 852,  850       */
+           "da_DK" "\0" "CP850" "\0"        /* 850,  865, 1004 */
+           "de_AT" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "de_CH" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "de_DE" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "el_GR" "\0" "CP869" "\0"        /* 869,  850, 813  */
+           "en_AU" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "en_CA" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "en_GB" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "en_IE" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "en_NZ" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "en_US" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "en_ZA" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "es_ES" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "es_LA" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "et_EE" "\0" "CP922" "\0"        /* 922,  850       */
+           "fi_FI" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "fr_BE" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "fr_CA" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "fr_CH" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "fr_FR" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "hr_HR" "\0" "CP852" "\0"        /* 852,  850       */
+           "hu_HU" "\0" "CP852" "\0"        /* 852,  850, 1004 */
+           "is_IS" "\0" "CP850" "\0"        /* 850,  861, 1004 */
+           "it_CH" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "it_IT" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "iw_IL" "\0" "CP862" "\0"        /* 862,  850, 437  */
+           "ja_JP" "\0" "CP943" "\0"        /* 943,  850, 942  */
+           "ko_KR" "\0" "CP949" "\0"        /* 949,  850, 944  */
+           "lt_LT" "\0" "ISO-8859-13" "\0"  /* 921,  850       */
+           "lv_LV" "\0" "ISO-8859-13" "\0"  /* 921,  850       */
+           "mk_MK" "\0" "CP855" "\0"        /* 855,  850, 915  */
+           "nl_BE" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "nl_NL" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "no_NO" "\0" "CP850" "\0"        /* 850,  865, 1004 */
+           "pl_PL" "\0" "CP852" "\0"        /* 852,  850       */
+           "pt_BR" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "pt_PT" "\0" "CP850" "\0"        /* 850,  860, 1004 */
+           "ro_RO" "\0" "CP852" "\0"        /* 852,  850, 1004 */
+           "ru_RU" "\0" "CP866" "\0"        /* 866,  850, 915  */
+           "sh_BA" "\0" "CP852" "\0"        /* 852,  850       */
+           "sk_SK" "\0" "CP852" "\0"        /* 852,  850       */
+           "sl_SI" "\0" "CP852" "\0"        /* 852,  850       */
+           "sq_AL" "\0" "CP850" "\0"        /* 850,  437       */
+           "sr_SP" "\0" "CP855" "\0"        /* 855,  850, 915  */
+           "sv_SE" "\0" "CP850" "\0"        /* 850,  437, 1004 */
+           "th_TH" "\0" "CP874" "\0"        /* 874,  850       */
+           "tr_TR" "\0" "CP857" "\0"        /* 857,  850, 1004 */
+           "zh_CN" "\0" "GB2312" "\0"       /* 1381, 850, 946  */
+           "zh_TW" "\0" "CP950" "\0";       /* 950,  850, 948  */
+# endif
 #endif
 
       charset_aliases = cp;
-- 
1.8.5.2
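
For readers unfamiliar with the layout, the table embedded by the patch
above is a run of "key" "\0" "value" "\0" pairs, and the implicit NUL that
ends the string literal acts as an empty key closing the list. The sketch
below shows one way such a table could be searched. It is an illustration
under assumptions: lookup_alias and os2_aliases_excerpt are hypothetical
names, the table is a three-entry excerpt, and this is not the lookup
code in localcharset.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt in the same layout as the patch: each entry is
   "locale\0charset\0", and the literal's implicit trailing NUL forms an
   empty key that terminates the list.  */
static const char os2_aliases_excerpt[] =
  "bg_BG" "\0" "CP855" "\0"
  "ko_KR" "\0" "CP949" "\0"
  "zh_CN" "\0" "GB2312" "\0";

/* Hypothetical helper: return the value paired with NAME in TABLE, or
   NAME itself when no entry matches.  */
static const char *
lookup_alias (const char *table, const char *name)
{
  const char *key = table;

  while (*key != '\0')
    {
      /* The value starts right after the key's terminating NUL.  */
      const char *value = key + strlen (key) + 1;

      if (strcmp (key, name) == 0)
        return value;

      /* Skip the value to reach the next key.  */
      key = value + strlen (value) + 1;
    }

  return name;
}
```

With this excerpt, lookup_alias (os2_aliases_excerpt, "ko_KR") yields
"CP949", and a name that is not in the table is returned unchanged.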
