Paul Eggert wrote:
> I'd remove "or GB18030"; it's not that popular due to the fact that it's
> not ASCII-safe like UTF-8 is, its popularity is limited primarily to one
> country (admittedly a large one, but still), and even in China it's much
> less popular than UTF-8, judging at least from what's published on the
> Web. w3techs says that less than 0.003% of the world's websites use
> GB18030, which is less than even the 0.11% of websites that use its
> predecessor GB2312 (and is waaaay less than the 97.9% of websites that
> use UTF-8). Although there's no way to peer into Unix installations in
> China, I'd be surprised if GB18030 is all that popular on the world
> stage, even counting its use within China.

I don't think the percentage of GB18030 encoded texts on the web and the
percentage of computers using GB18030 as a locale encoding are necessarily
related. Publishing a text on the web is typically done through a program
that has an "Export as HTML" action, and you can assume that this action
will most often convert to UTF-8.

I was under the impression that GB18030 was used in China, because
  - Wikipedia [1] says "Since 1 May 2006, support for the mandatory subset
    is officially required for all software products sold in the PRC."
  - I heard some years ago that "If you want to sell software to a
    governmental institution in China, it must support GB18030".
  - Some software companies have/had integrated this requirement in their
    QA processes.
  - There was a new revision of GB18030 in 2022 [1].

But now I did a survey of which locale gets used when a user installs an
enterprise Linux distro with "Simplified Chinese" as installation language,
or a Chinese Linux distro outright.

The result is:
  * All of these distros put the user in a UTF-8 locale.
  * None of them has even an option(!) to put the user in a GB18030 locale.

In detail:

  * For enterprise Linux distros, I chose
    + Alma Linux 9.0 (RHEL 9.0 clone).
      - If, at installation time, I pick "Simplified Chinese (China)",
        after the installation, the environment variables are:
          LANG=zh_CN.UTF-8
      - If, at installation time, I pick "Traditional Chinese (Taiwan)",
        after the installation, the environment variables are:
          LANG=zh_TW.UTF-8
    + openSUSE 15.4 (which should be close to SLES 15.4).
      After installation, one can change the "primary language" [2] at
      YaST > System > Language.
      - Choosing "Chinese Simplified" offers a checkbox "Use UTF-8 Encoding"
        that is on by default.
        When I turn it off, after the installation, the environment
        variables are:
          LANG=en_US.UTF-8, LC_CTYPE=zh_CN
        The effective encoding is EUC-CN (or GBK?), not GB18030.
      - Choosing "Chinese Traditional" offers a checkbox "Use UTF-8 Encoding"
        that is on by default.
        When I turn it off, after the installation, the environment
        variables are:
          LANG=en_US.UTF-8, LC_CTYPE=zh_TW
        The effective encoding is CP950, not Big5 or Big5-2003.
      In both cases this combination of LANG and LC_CTYPE is not supported
      by glibc (because of the LC_COLLATE and other locale categories).

  * On distrowatch.com, I found two Chinese Linux distros.
    + deepin
      Here, after installation, the environment variables are:
        LANG=zh_CN.UTF-8, LANGUAGE=zh_CN
    + Ubuntu Kylin
      Here, after installation, the environment variables are:
        LANG=zh_CN.UTF-8, LANGUAGE=zh_CN:

  * Then, there is also IBM AIX. What Chinese locales does it have for zh_CN?
    On AIX 7.2, a GB18030 locale is officially supported [3]. But running
    "locale -a" on an AIX 7.2 machine reveals that it has
      zh_Hans_CN.UTF-8
      zh_Hans_SG.UTF-8
      zh_Hant_HK.UTF-8
      zh_Hant_TW.UTF-8
    but no locale with GB18030 encoding.

So, it seems that GB18030 is no longer important, even in the Chinese
market.

> Besides, we're better off not taking a stand in the GB18030 vs Big5 vs
> other-national-encoding controversies.

I don't see any controversies there. GB18030 was intended for users who
choose "Simplified Chinese", and Big5 was intended for users who choose
"Traditional Chinese". Big5 and its standardized variant Big5-2003 were
apparently obsoleted by Microsoft's CP950. And now, as a locale encoding,
UTF-8 is predominantly used for both groups of users.

There may be a controversy regarding whether "Simplified Chinese" or
"Traditional Chinese" is the proper choice in territories like Hong Kong
or Macao. But I have no concrete data on this, and it is irrelevant here.

Bruno

[1] https://en.wikipedia.org/wiki/GB_18030
[2] https://doc.opensuse.org/documentation/leap/startup/html/book-startup/cha-yast-lang.html
[3] https://www.ibm.com/docs/en/aix/7.2?topic=globalization-supported-languages-locales