Given the strange behaviour of the "C" locale on Android, it is no wonder that I see a test failure:

FAIL: test-hard-locale
======================

The initial locale should not be hard!
FAIL test-hard-locale (exit status: 1)

This patch adjusts the function hard_locale in a conservative way
(i.e. hard_locale(category) returns 1 if the locale has UTF-8 encoding or
may behave in an unknown way in the future), and updates the unit test
accordingly. I don't expect coreutils malfunctions on Android due to this,
but we'll see.


2023-01-17  Bruno Haible  <br...@clisp.org>

	hard-locale: Port to Android ≥ 5.0.
	* lib/hard-locale.c: Include <stdlib.h>.
	(hard_locale): On Android, consider also MB_CUR_MAX, even if the
	locale's name is "C".
	* tests/test-hard-locale.c (test_one, main): Assume that on Android,
	even the "C" locale is hard.

diff --git a/lib/hard-locale.c b/lib/hard-locale.c
index 0a28552e75..c01fce5344 100644
--- a/lib/hard-locale.c
+++ b/lib/hard-locale.c
@@ -21,6 +21,7 @@
 #include "hard-locale.h"
 
 #include <locale.h>
+#include <stdlib.h>
 #include <string.h>
 
 bool
@@ -31,5 +32,16 @@ hard_locale (int category)
   if (setlocale_null_r (category, locale, sizeof (locale)))
     return false;
 
-  return !(strcmp (locale, "C") == 0 || strcmp (locale, "POSIX") == 0);
+  if (!(strcmp (locale, "C") == 0 || strcmp (locale, "POSIX") == 0))
+    return true;
+
+#if defined __ANDROID__
+  /* On Android 5.0 or newer, it is possible to set a locale that has the same
+     name as the "C" locale but in fact uses UTF-8 encoding.  Cf. test case 2 in
+     <https://lists.gnu.org/archive/html/bug-gnulib/2023-01/msg00141.html>.  */
+  if (MB_CUR_MAX > 1)
+    return true;
+#endif
+
+  return false;
 }
diff --git a/tests/test-hard-locale.c b/tests/test-hard-locale.c
index eb02f4f6e6..6f94e6c3ac 100644
--- a/tests/test-hard-locale.c
+++ b/tests/test-hard-locale.c
@@ -38,8 +38,10 @@ test_one (const char *name, int failure_bitmask)
       /* musl libc has special code for the C.UTF-8 locale; other than that,
          all locale names are accepted and all locales are trivial.
          OpenBSD returns the locale name that was set, but we don't know how it
-         behaves under the hood.  Likewise for Haiku.  */
-#if defined MUSL_LIBC || defined __OpenBSD__ || defined __HAIKU__
+         behaves under the hood.  Likewise for Haiku.
+         On Android >= 5.0, the "C" locale may have UTF-8 encoding, and we don't
+         know how it will behave in the future.  */
+#if defined MUSL_LIBC || defined __OpenBSD__ || defined __HAIKU__ || defined __ANDROID__
       expected = true;
 #else
       expected = !all_trivial;
@@ -57,12 +59,14 @@ test_one (const char *name, int failure_bitmask)
 
       /* On NetBSD 7.0, some locales such as de_DE.ISO8859-1 and de_DE.UTF-8
          have the LC_COLLATE category set to "C".
-         Similarly, on musl libc, with the C.UTF-8 locale.  */
+         Similarly, on musl libc, with the C.UTF-8 locale.
+         On Android >= 5.0, the "C" locale may have UTF-8 encoding, and we don't
+         know how it will behave in the future.  */
 #if defined __NetBSD__
       expected = false;
 #elif defined MUSL_LIBC
       expected = strcmp (name, "C.UTF-8") != 0;
-#elif (defined __OpenBSD__ && HAVE_DUPLOCALE) || defined __HAIKU__ /* OpenBSD >= 6.2, Haiku */
+#elif (defined __OpenBSD__ && HAVE_DUPLOCALE) || defined __HAIKU__ || defined __ANDROID__ /* OpenBSD >= 6.2, Haiku, Android */
       expected = true;
 #else
       expected = !all_trivial;
@@ -86,12 +90,16 @@ main ()
 {
   int fail = 0;
 
-  /* The initial locale is the "C" or "POSIX" locale.  */
+  /* The initial locale is the "C" or "POSIX" locale.
+     On Android >= 5.0, it is equivalent to the "C.UTF-8" locale, cf.
+     <https://lists.gnu.org/archive/html/bug-gnulib/2023-01/msg00141.html>.  */
+#if ! defined __ANDROID__
   if (hard_locale (LC_CTYPE) || hard_locale (LC_COLLATE))
     {
       fprintf (stderr, "The initial locale should not be hard!\n");
       fail |= 1;
     }
+#endif
 
   all_trivial = (setlocale (LC_ALL, "foobar") != NULL);
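
For readers who want to try the decision logic outside of gnulib, here is a
minimal standalone sketch of it. It is not the gnulib code: the names
name_is_trivial and sketch_hard_locale are hypothetical, and it queries the
locale name with plain setlocale (category, NULL) instead of gnulib's
setlocale_null_r, so unlike the real function it is not safe to call while
another thread changes the locale.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if NAME is one of the two names that
   traditionally denote the trivial locale.  */
static bool
name_is_trivial (const char *name)
{
  assert (name != NULL);
  return strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0;
}

/* Hypothetical stand-in for hard_locale, following the patched logic.
   Assumption: setlocale (category, NULL) is an acceptable substitute for
   setlocale_null_r in a single-threaded experiment.  */
static bool
sketch_hard_locale (int category)
{
  const char *name = setlocale (category, NULL);

  /* Mirror the "cannot determine the name" case: treat it as not hard.  */
  if (name == NULL)
    return false;

  if (!name_is_trivial (name))
    return true;

#if defined __ANDROID__
  /* The name says "C", but a multibyte encoding is in use.  */
  if (MB_CUR_MAX > 1)
    return true;
#endif

  return false;
}
```

On a non-Android platform, calling sketch_hard_locale (LC_CTYPE) before any
setlocale call that changes the locale is expected to return false, since the
initial locale is named "C" and the MB_CUR_MAX check is compiled out.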