Paul Eggert wrote:
> > * The locale encoding is BIG5-HKSCS, e.g. on a glibc system the
> >   zh_HK.BIG5-HKSCS the locale.
> >
> > * The input is one of the 4 characters in that encoding that map to
> >   a sequence of two Unicode characters:
> >
> >      input        maps to
> >      -----        -------
> >      0x88 0x62    U+00CA U+0304
> >      0x88 0x64    U+00CA U+030C
> >      0x88 0xA3    U+00EA U+0304
> >      0x88 0xA5    U+00EA U+030C
> ...
>
> I looked into this some more and unfortunately don't understand the
> above. Could you explain a bit more?
My statement is about BIG5-HKSCS.

> <http://www.nits.org.cn/index/article/4034> says that the official
> mapping table for GB 18030-2022 and BMP is here:
>
> http://www.nits.org.cn/cmsfile/download/134
>
> and this contains the following (nonconsecutive) lines:
>
> 5746 8862
> 5749 8864
> 57BC 88A3
> 57BE 88A5
>
> which, if I understand things correctly, means the four two-byte
> sequences that you mention should convert to the following four Unicode
> characters:
>
> 坆 U+5746 CJK IDEOGRAPH-5746
> 坉 U+5749 CJK IDEOGRAPH-5749
> 垼 U+57BC CJK IDEOGRAPH-57BC
> 垾 U+57BE CJK IDEOGRAPH-57BE
>
> without mbrtoc23 having to return (size_t) -3.
>
> Perhaps there was a problem with an earlier version of GB 18030 that has
> been fixed in the 2022 edition?

You are looking at GB18030. GB18030 and BIG5-HKSCS are completely
unrelated encodings: the same byte pair, e.g. 0x88 0x62, is U+5746 in
GB18030 but U+00CA U+0304 in BIG5-HKSCS.

References:
https://www.haible.de/bruno/charsets/conversion-tables/Chinese.html
https://en.wikipedia.org/wiki/Big5#HKSCS
https://en.wikipedia.org/wiki/GB_18030

Bruno
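To see the difference concretely, here is a minimal sketch that decodes the same four byte pairs under both encodings. It assumes CPython's bundled `big5hkscs` and `gb18030` codecs as a stand-in for the C library; it does not exercise glibc's iconv or `mbrtoc32` itself. Following the tables quoted above, each pair should come out as two code points under BIG5-HKSCS (which is why `mbrtoc32` must return (size_t) -3 to deliver the second one) and as a single ideograph under GB18030.

```python
# Sketch: decode the same byte pairs with two unrelated Chinese encodings,
# using CPython's built-in "big5hkscs" and "gb18030" codecs (an assumption:
# these are Python's codec tables, not glibc's).
PAIRS = [b"\x88\x62", b"\x88\x64", b"\x88\xa3", b"\x88\xa5"]


def code_points(data: bytes, encoding: str) -> list[str]:
    """Decode DATA with ENCODING and return the result as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in data.decode(encoding)]


def main() -> None:
    for pair in PAIRS:
        big5 = " ".join(code_points(pair, "big5hkscs"))  # two code points
        gb = " ".join(code_points(pair, "gb18030"))       # one ideograph
        print(f"{pair.hex(' ')}   BIG5-HKSCS: {big5:14}   GB18030: {gb}")


if __name__ == "__main__":
    main()
```

In a C program, the BIG5-HKSCS column corresponds to two successive `mbrtoc32` results for one multibyte character: the first call consumes both bytes, the second consumes none and returns (size_t) -3.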