Paul Eggert wrote:
> >    * The locale encoding is BIG5-HKSCS, e.g. on a glibc system the
> >      zh_HK.BIG5-HKSCS locale.
> > 
> >    * The input is one of the 4 characters in that encoding that map to
> >      a sequence of two Unicode characters:
> > 
> >         input         maps to
> >         -----         -------
> >       0x88 0x62    U+00CA U+0304
> >       0x88 0x64    U+00CA U+030C
> >       0x88 0xA3    U+00EA U+0304
> >       0x88 0xA5    U+00EA U+030C
>       ...
> 
> I looked into this some more and unfortunately don't understand the 
> above. Could you explain a bit more?

My statement is about BIG5-HKSCS.

> <http://www.nits.org.cn/index/article/4034> says that the official 
> mapping table for GB 18030-2022 and BMP is here:
> 
> http://www.nits.org.cn/cmsfile/download/134
> 
> and this contains the following (nonconsecutive) lines:
> 
>    5746       8862
>    5749       8864
>    57BC       88A3
>    57BE       88A5
> 
> which, if I understand things correctly, means the four two-byte 
> sequences that you mention should convert to the following four Unicode 
> characters:
> 
>    坆 U+5746 CJK IDEOGRAPH-5746
>    坉 U+5749 CJK IDEOGRAPH-5749
>    垼 U+57BC CJK IDEOGRAPH-57BC
>    垾 U+57BE CJK IDEOGRAPH-57BE
> 
> without mbrtoc23 having to return (size_t) -3.
> 
> Perhaps there was a problem with an earlier version of GB 18030 that has 
> been fixed in the 2022 edition?

You are looking at GB18030. GB18030 and BIG5-HKSCS are completely
unrelated.
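
To make the difference concrete, here is a minimal sketch that decodes the
same four byte pairs under both encodings. It assumes that CPython's built-in
"big5hkscs" and "gb18030" codecs are an acceptable stand-in for the glibc
converters (they are a separate implementation, so this only illustrates the
mapping tables, not the behaviour of mbrtoc32). The helper name code_points
is made up for this example.

```python
# Assumption: CPython's "big5hkscs" and "gb18030" codecs are used here only
# to illustrate the two mapping tables; this is not the glibc code path.
PAIRS = [b"\x88\x62", b"\x88\x64", b"\x88\xa3", b"\x88\xa5"]


def code_points(raw, encoding):
    """Decode raw bytes with the given codec; return code points as U+XXXX."""
    return ["U+%04X" % ord(ch) for ch in raw.decode(encoding)]


for raw in PAIRS:
    # Same bytes, two unrelated encodings, hence two unrelated results.
    print(raw.hex(), "BIG5-HKSCS:", code_points(raw, "big5hkscs"),
          "GB18030:", code_points(raw, "gb18030"))
```

Under BIG5-HKSCS each pair should come out as two code points (a base letter
plus a combining mark), which is the case where a converter that produces one
char32_t per call has to hand back the second one separately. Under GB18030
the same bytes decode to a single ideograph each.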

References:
https://www.haible.de/bruno/charsets/conversion-tables/Chinese.html
https://en.wikipedia.org/wiki/Big5#HKSCS
https://en.wikipedia.org/wiki/GB_18030

Bruno



