Re: doc: New chapter "Strings and Characters"

Paul Eggert Mon, 15 May 2023 19:14:43 -0700

Thanks for writing that. A quick review:

Em dash (---) shouldn't have spaces around it; if you like those spaces (which I kind of do) please use en dash (--).

+It is important to realize that the majority of Unix installations
+nowadays use UTF-8 or GB18030 as locale encoding; therefore, the
+majority of users are using multibyte locales.

I'd remove "or GB18030"; it's not that popular because it's not ASCII-safe like UTF-8 is, its popularity is limited primarily to one country (admittedly a large one, but still), and even in China it's much less popular than UTF-8, judging at least from what's published on the Web. w3techs says that less than 0.003% of the world's websites use GB18030, which is less than even the 0.11% of websites that use its predecessor GB2312 (and is waaaay less than the 97.9% of websites that use UTF-8). Although there's no way to peer into Unix installations in China, I'd be surprised if GB18030 is all that popular on the world stage, even counting its use within China.

Besides, we're better off not taking a stand in the GB18030 vs Big5 vs other-national-encoding controversies. It's fine to use GB18030 as an example, but the current wording makes it look like the world is mostly just choosing between UTF-8 vs GB18030, which I doubt was intended.
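
To illustrate what "not ASCII-safe" means in practice, here is a minimal sketch. The helper count_byte and the two arrays are hypothetical names made up for this example, and the byte values for U+4E02 are assumed from the GB18030 two-byte table and from UTF-8. A byte-level scan for '@' finds a false match inside the GB18030 encoding but not inside the UTF-8 one:

```c
#include <stddef.h>

/* Count bytes equal to the ASCII character C in the N bytes at S.
   This is a byte-level scan that knows nothing about the encoding.  */
size_t
count_byte (const char *s, size_t n, char c)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (s[i] == c)
      count++;
  return count;
}

/* U+4E02 encoded both ways (byte values are an assumption of this
   sketch, not taken from the patch under review).  */
const char u4e02_gb18030[] = "\x81\x40";   /* trail byte is ASCII '@' */
const char u4e02_utf8[] = "\xe4\xb8\x82";  /* every byte is >= 0x80 */
```

Calling count_byte (u4e02_gb18030, 2, '@') reports one match even though the text contains no '@' character, whereas the same call on u4e02_utf8 with length 3 reports none.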

+The @posixheader{ctype.h} API, that was designed only with unibyte
+encodings in mind, is useless nowadays; it does not work in
+multibyte locales.

It's still useful, even in multibyte locales, when dealing with data that is inherently unibyte. Perhaps prepend "for general text processing" to the sentence. Similarly for the later occurrence of "useless and obsolete".
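
As a rough sketch of the inherently unibyte case, assume a hypothetical helper is_hex_field that validates a field of hexadecimal digits (say, a checksum in some protocol). The name and the scenario are made up for illustration. Here <ctype.h> does the job regardless of the locale encoding, since the data is defined byte by byte:

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if the N bytes at S are all hexadecimal digits and N > 0,
   otherwise return 0.  The field is unibyte by definition, so testing
   one byte at a time is correct even in a multibyte locale.  */
int
is_hex_field (const char *s, size_t n)
{
  if (n == 0)
    return 0;
  for (size_t i = 0; i < n; i++)
    /* Cast to unsigned char: passing a negative value other than EOF
       to a <ctype.h> function is undefined behavior.  */
    if (!isxdigit ((unsigned char) s[i]))
      return 0;
  return 1;
}
```

For example, is_hex_field ("deadBEEF", 8) returns 1, while the two UTF-8 bytes of a non-ASCII character are rejected. What this approach cannot do is classify the characters of general text, which is the case the chapter is warning about.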

+While UTF-8 is the most common multibyte encoding, GB18030 is there as
+well and will not go away within decades, because it is a Chinese
+government standard, last revised in 2022.

Again, let's not focus on GB18030 to the exclusion of other national encodings.

+For complex string processing, the provided strings functions may not be


strings -> string
