Paul Eggert wrote:
> I'd remove "or GB18030"; it's not that popular due to the fact that it's 
> not ASCII-safe like UTF-8 is, its popularity is limited primarily to one 
> country (admittedly a large one, but still), and even in China it's much 
> less popular than UTF-8, judging at least from what's published on the 
> Web. w3techs says that less than 0.003% of the world's websites use 
> GB18030, which is less than even the 0.11% of websites that use its 
> predecessor GB2312 (and is waaaay less than the 97.9% of websites that 
> use UTF-8). Although there's no way to peer into Unix installations in 
> China, I'd be surprised if GB18030 is all that popular on the world 
> stage, even counting its use within China.
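
To make the ASCII-safety point concrete, here is a small Python sketch. It
assumes Python's built-in "gb18030" codec; the helper name ascii_range_bytes
and the sample character are made up for illustration.

```python
def ascii_range_bytes(text, encoding):
    """Return the bytes of text.encode(encoding) that lie in the ASCII range (< 0x80)."""
    return [b for b in text.encode(encoding) if b < 0x80]


if __name__ == "__main__":
    sample = "\u00df"  # LATIN SMALL LETTER SHARP S, a non-ASCII character
    for enc in ("utf-8", "gb18030"):
        encoded = sample.encode(enc)
        hits = ascii_range_bytes(sample, enc)
        # Show the raw bytes and how many of them look like ASCII.
        print(enc, encoded.hex(), "ASCII-range bytes:", len(hits))
```

In UTF-8, every byte of a multibyte sequence is >= 0x80, so a byte-wise
search for an ASCII character never matches inside a non-ASCII character.
In a four-byte GB18030 sequence, the second and fourth bytes are in the
range 0x30..0x39, which are the ASCII digits.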

I don't think the percentage of GB18030 encoded texts on the web and the
percentage of computers using GB18030 as a locale encoding are necessarily
related. Publishing a text on the web is typically done through a program
that has an "Export as HTML" action, and you can assume that this action
will most often convert to UTF-8.

I was under the impression that GB18030 was used in China, because
  - Wikipedia [1] says "Since 1 May 2006, support for the mandatory subset
    is officially required for all software products sold in the PRC."
  - I heard some years ago that "If you want to sell software to a
    governmental institution in China, it must support GB18030",
  - Some software companies have/had integrated this requirement in their
    QA processes.
  - There was a new revision of GB18030 in 2022 [1].

But now I did a survey of which locale gets used when a user installs an
enterprise Linux distro with "Simplified Chinese" as the installation
language, or a Chinese Linux distro outright. The result is:

  * All of these distros put the user in a UTF-8 locale.

  * None of them has even an option(!) to put the user in a GB18030 locale.

In detail:

  * For enterprise Linux distros, I chose

    + Alma Linux 9.0 (RHEL 9.0 clone).
      - If, at installation time, I pick "Simplified Chinese (China)",
        after the installation, the environment variables are:
          LANG=zh_CN.UTF-8
      - If, at installation time, I pick "Traditional Chinese (Taiwan)",
        after the installation, the environment variables are:
          LANG=zh_TW.UTF-8

    + openSUSE 15.4 (which should be close to SLES 15.4).
      After installation, one can change the "primary language" [2]
      at YaST > System > Language.
      - Choosing "Chinese Simplified" offers a checkbox "Use UTF-8 Encoding"
        that is on by default. When I turn it off, after the installation, the
        environment variables are:
          LANG=en_US.UTF-8, LC_CTYPE=zh_CN.
        The effective encoding is EUC-CN (or GBK?), not GB18030.
      - Choosing "Chinese Traditional" offers a checkbox "Use UTF-8 Encoding"
        that is on by default. When I turn it off, after the installation, the
        environment variables are:
          LANG=en_US.UTF-8, LC_CTYPE=zh_TW
        The effective encoding is CP950, not Big5 or Big5-2003.
      In both cases this combination of LANG and LC_CTYPE is not supported by
      glibc (because of the LC_COLLATE and other locale categories).

  * On distrowatch.com, I found two Chinese Linux distros.

    + deepin
      Here, after installation, the environment variables are:
        LANG=zh_CN.UTF-8, LANGUAGE=zh_CN

    + Ubuntu Kylin
      Here, after installation, the environment variables are:
        LANG=zh_CN.UTF-8, LANGUAGE=zh_CN:

  * Then, there is also IBM AIX. What Chinese locales does it have for zh_CN?
    On AIX 7.2, a GB18030 locale is officially supported [3]. But running
    "locale -a" on an AIX 7.2 machine reveals that it has
      zh_Hans_CN.UTF-8
      zh_Hans_SG.UTF-8
      zh_Hant_HK.UTF-8
      zh_Hant_TW.UTF-8
    but no locale with GB18030 encoding.
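
One way to check the effective encoding of a locale, as in the remarks above,
is sketched below in Python. It assumes a POSIX system where
nl_langinfo(CODESET) is available. The helper name effective_codeset and the
list of locale names are only illustrative; which locales exist, and what
they report, depends on the machine.

```python
import locale


def effective_codeset(locale_name):
    """Return the codeset the C library reports for LC_CTYPE=locale_name,
    or None if that locale is not available on this machine."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        # Restore the previous LC_CTYPE so the caller is not affected.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # Illustrative names only; adjust to what "locale -a" lists locally.
    for name in ("zh_CN", "zh_CN.UTF-8", "zh_CN.GB18030", "zh_TW", "zh_TW.UTF-8"):
        print(name, "->", effective_codeset(name))
```

A result of None only means the locale is not installed on the machine where
the script runs; it says nothing about whether the distro could provide it.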

So, it seems that GB18030 is no longer important, even in the Chinese market.

> Besides, we're better off not taking a stand in the GB18030 vs Big5 vs 
> other-national-encoding controversies.

I don't see any controversies there. GB18030 was intended for users who
choose "Simplified Chinese", and Big5 was intended for users who choose
"Traditional Chinese". Big5 and its standardized variant Big5-2003 were
apparently obsoleted by Microsoft's CP950. And now, as a locale encoding,
UTF-8 is predominantly used for both groups of users.

There may be a controversy regarding whether "Simplified Chinese" or
"Traditional Chinese" is the proper choice in territories like Hong Kong
or Macao. But I have no concrete data on this, and it is irrelevant here.

Bruno

[1] https://en.wikipedia.org/wiki/GB_18030
[2] https://doc.opensuse.org/documentation/leap/startup/html/book-startup/cha-yast-lang.html
[3] https://www.ibm.com/docs/en/aix/7.2?topic=globalization-supported-languages-locales



