In 2016, Paul opened this glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19932
Meanwhile, POSIX:2018 has clarified that conversion from a single 'char' to a
wchar_t in the "C" locale always succeeds.

In Gnulib, a workaround for this issue was added to the 'mbrtowc' module in
2016. But this is not the only function affected by this issue.

I'm adding tests for the behaviour in the "C" locale of the functions
  btowc
  mbrlen
  mbsrtowcs
  mbsnrtowcs
  mbstowcs
and I see these test failures:

test-btowc.c:76: assertion 'wc == c || wc == 0xDF00 + c' failed
Aborted
FAIL test-btowc3.sh (exit status: 1)

test-mbsrtowcs.c:327: assertion 'ret == 1' failed
Aborted
FAIL test-mbsrtowcs5.sh (exit status: 1)

test-mbsnrtowcs.c:327: assertion 'ret == 1' failed
Aborted
FAIL test-mbsnrtowcs5.sh (exit status: 1)

test-mbstowcs.c:226: assertion 'ret == 1' failed
Aborted
FAIL test-mbstowcs5.sh (exit status: 1)

So, these 4 functions need a workaround as well.

While testing it, I found another bug, only for the "C" locale, in mingw:
In mingw, mbrtowc converts (char) 0x80 to (wchar_t) 0x80, but btowc converts
(char) 0x80 to (wchar_t) 0x20ac (at least in an installation with windows-1252
default system encoding).

The attached patches fix all of this.


2023-03-30  Bruno Haible  <br...@clisp.org>

	mbstowcs: Add tests.
	* tests/test-mbstowcs1.sh: New file, based on tests/test-mbsrtowcs1.sh.
	* tests/test-mbstowcs2.sh: New file, based on tests/test-mbsrtowcs2.sh.
	* tests/test-mbstowcs3.sh: New file, based on tests/test-mbsrtowcs3.sh.
	* tests/test-mbstowcs4.sh: New file, based on tests/test-mbsrtowcs4.sh.
	* tests/test-mbstowcs5.sh: New file, based on tests/test-mbsrtowcs5.sh.
	* tests/test-mbstowcs.c: New file, based on tests/test-mbsrtowcs.c.
	* modules/mbstowcs-tests: New file, based on modules/mbsrtowcs-tests.

	mbstowcs: New module.
	* lib/stdlib.in.h (mbstowcs): New declaration.
	* lib/mbstowcs.c: New file, based on lib/mbstoc32s.c.
	* m4/mbstowcs.m4: New file.
	* m4/stdlib_h.m4 (gl_STDLIB_H): Test whether mbstowcs is declared.
	(gl_STDLIB_H_REQUIRE_DEFAULTS): Initialize GNULIB_MBSTOWCS.
	(gl_STDLIB_H_DEFAULTS): Initialize REPLACE_MBSTOWCS.
	* modules/stdlib (Makefile.am): Substitute GNULIB_MBSTOWCS,
	REPLACE_MBSTOWCS.
	* modules/mbstowcs: New file.
	* tests/test-stdlib-c++.cc (mbstowcs): Check signature.
	* doc/posix-functions/mbstowcs.texi: Mention the C locale behaviour bug
	and the new module.

2023-03-30  Bruno Haible  <br...@clisp.org>

	mbsnrtowcs: Fix behaviour in the C locale.
	* m4/mbsnrtowcs.m4 (gl_FUNC_MBSNRTOWCS): Invoke gl_MBRTOWC_C_LOCALE. If
	mbrtowc is buggy in the C locale, override also mbsnrtowcs.
	* modules/mbsnrtowcs (Files): Add m4/mbrtowc.m4.
	* tests/test-mbsnrtowcs.c (main): Add a test of the C locale, based on
	tests/test-mbsrtowcs.c.
	* tests/test-mbsnrtowcs5.sh: New file, based on tests/test-mbrtowc5.sh.
	* modules/mbsnrtowcs-tests (Files): Add it.
	(Makefile.am): Test it.
	* doc/posix-functions/mbsnrtowcs.texi: Mention the C locale behaviour
	bug.

2023-03-30  Bruno Haible  <br...@clisp.org>

	mbsrtowcs: Fix behaviour in the C locale.
	* m4/mbsrtowcs.m4 (gl_FUNC_MBSRTOWCS): Invoke gl_MBRTOWC_C_LOCALE. If
	mbrtowc is buggy in the C locale, override also mbsrtowcs.
	* modules/mbsrtowcs (Files): Add m4/mbrtowc.m4.
	* tests/test-mbsrtowcs.c (main): Add a test of the C locale, based on
	tests/test-mbrtowc.c.
	* tests/test-mbsrtowcs5.sh: New file, based on tests/test-mbrtowc5.sh.
	* modules/mbsrtowcs-tests (Files): Add it.
	(Makefile.am): Test it.
	* doc/posix-functions/mbsrtowcs.texi: Mention the C locale behaviour
	bug.

2023-03-30  Bruno Haible  <br...@clisp.org>

	mbrlen: Add tests.
	* tests/test-mbrlen1.sh: New file, based on tests/test-mbrtowc1.sh.
	* tests/test-mbrlen2.sh: New file, based on tests/test-mbrtowc2.sh.
	* tests/test-mbrlen3.sh: New file, based on tests/test-mbrtowc3.sh.
	* tests/test-mbrlen4.sh: New file, based on tests/test-mbrtowc4.sh.
	* tests/test-mbrlen5.sh: New file, based on tests/test-mbrtowc5.sh.
	* tests/test-mbrlen.c: New file, based on tests/test-mbrtowc.c.
	* tests/test-mbrlen-w32-1.sh: New file, based on
	tests/test-mbrtowc-w32-1.sh.
	* tests/test-mbrlen-w32-2.sh: New file, based on
	tests/test-mbrtowc-w32-2.sh.
	* tests/test-mbrlen-w32-3.sh: New file, based on
	tests/test-mbrtowc-w32-3.sh.
	* tests/test-mbrlen-w32-4.sh: New file, based on
	tests/test-mbrtowc-w32-4.sh.
	* tests/test-mbrlen-w32-5.sh: New file, based on
	tests/test-mbrtowc-w32-5.sh.
	* tests/test-mbrlen-w32-6.sh: New file, based on
	tests/test-mbrtowc-w32-6.sh.
	* tests/test-mbrlen-w32-7.sh: New file, based on
	tests/test-mbrtowc-w32-7.sh.
	* tests/test-mbrlen-w32.c: New file, based on tests/test-mbrtowc-w32.c.
	* modules/mbrlen-tests: New file, based on modules/mbrtowc-tests.
	* doc/posix-functions/mbrlen.texi: Update.

2023-03-30  Bruno Haible  <br...@clisp.org>

	btowc: Fix behaviour in the C locale.
	* lib/btowc.c: Include <string.h>
	(btowc): Use mbrtowc instead of mbtowc when possible.
	* m4/btowc.m4 (gl_FUNC_BTOWC): Test for the mingw bug in the C locale.
	Invoke gl_MBRTOWC_C_LOCALE. If mbrtowc is buggy in the C locale,
	override also btowc.
	(gl_PREREQ_BTOWC): Test whether mbrtowc exists.
	* modules/btowc (Files): Add m4/mbrtowc.m4.
	(Depends-on): Add mbrtowc.
	* tests/test-btowc.c (main): Add a test of the C locale, based on
	tests/test-mbrtowc.c.
	* tests/test-btowc3.sh: New file, based on tests/test-mbrtowc5.sh.
	* modules/btowc-tests (Files): Add it.
	(Makefile.am): Test it.
	* doc/posix-functions/btowc.texi: Mention the two C locale behaviour
	bugs and that they are worked around.

2023-03-30  Bruno Haible  <br...@clisp.org>

	mbrtowc tests: Add comment.
	* tests/test-mbrtowc.c: Add comment.
	* tests/test-mbrtowc5.sh: Use symmetric coding style.
	* doc/posix-functions/mbrtowc.texi: Update.

2023-03-30  Bruno Haible  <br...@clisp.org>

	stdlib tests: Check behaviour of C locale.
	* tests/test-stdlib.c (main): Check MB_CUR_MAX.

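For anyone who wants to probe a platform without building the Gnulib test suite, the two "C" locale problems described in this mail (an error result for bytes >= 0x80, and btowc disagreeing with mbrtowc as seen on mingw) can be checked with a small helper along the following lines. This is only a sketch and is not part of the attached patches; the function name count_problems and its range parameters are made up for illustration. It relies on a program starting in the "C" locale unless it calls setlocale().

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of Gnulib.
   For each byte value c in [first, last], convert it once with mbrtowc()
   and once with btowc() in the current locale, and count the values for
   which either conversion fails or the two results differ.  */
static int
count_problems (int first, int last)
{
  int problems = 0;

  for (int c = first; c <= last; c++)
    {
      char buf[1];
      wchar_t wc = 0;
      mbstate_t state;

      buf[0] = (char) c;
      /* Start from the initial conversion state for every byte, because
         the state is unspecified after a failed conversion.  */
      memset (&state, 0, sizeof state);

      size_t ret = mbrtowc (&wc, buf, 1, &state);
      wint_t w = btowc (c);

      /* POSIX:2018 expects ret == 1 and w != WEOF for nonzero bytes in
         the POSIX locale; the mingw issue shows up as w != wc.  */
      if (ret != 1 || w == WEOF || w != (wint_t) wc)
        problems++;
    }
  return problems;
}
```

Calling count_problems (0x80, 0xFF) from a program that never calls setlocale() gives the number of affected byte values; a nonzero result suggests that the platform needs one of the workarounds from this series. The byte 0 is left out on purpose, since mbrtowc returns 0 for it.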
From 4993fb36f8c9d1a18b33285a941ae3ed2aa54c59 Mon Sep 17 00:00:00 2001 From: Bruno Haible <br...@clisp.org> Date: Thu, 30 Mar 2023 12:25:05 +0200 Subject: [PATCH 1/8] stdlib tests: Check behaviour of C locale. * tests/test-stdlib.c (main): Check MB_CUR_MAX. --- ChangeLog | 5 +++++ tests/test-stdlib.c | 13 ++++++++++++- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/ChangeLog b/ChangeLog index fd39860f9b..ba75036c75 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,8 @@ +2023-03-30 Bruno Haible <br...@clisp.org> + + stdlib tests: Check behaviour of C locale. + * tests/test-stdlib.c (main): Check MB_CUR_MAX. + 2023-03-30 Bruno Haible <br...@clisp.org> string-desc tests: Fix "make distcheck" failure. diff --git a/tests/test-stdlib.c b/tests/test-stdlib.c index ceccd4d233..0d3984701a 100644 --- a/tests/test-stdlib.c +++ b/tests/test-stdlib.c @@ -45,8 +45,19 @@ static_assert (sizeof NULL == sizeof (void *)); int main (void) { - if (test_sys_wait_macros ()) + /* POSIX:2018 says: + "In the POSIX locale the value of MB_CUR_MAX shall be 1." */ + /* On Android ≥ 5.0, the default locale is the "C.UTF-8" locale, not the + "C" locale. Furthermore, when you attempt to set the "C" or "POSIX" + locale via setlocale(), what you get is a "C" locale with UTF-8 encoding, + that is, effectively the "C.UTF-8" locale. */ +#ifndef __ANDROID__ + if (MB_CUR_MAX != 1) return 1; +#endif + + if (test_sys_wait_macros ()) + return 2; return exitcode; } -- 2.34.1
>From 60745587a2e0073df7fe4cb1ec462f423c8f2bb2 Mon Sep 17 00:00:00 2001 From: Bruno Haible <br...@clisp.org> Date: Thu, 30 Mar 2023 13:20:29 +0200 Subject: [PATCH 2/8] mbrtowc tests: Add comment. * tests/test-mbrtowc.c: Add comment. * tests/test-mbrtowc5.sh: Use symmetric coding style. * doc/posix-functions/mbrtowc.texi: Update. --- ChangeLog | 7 +++++++ doc/posix-functions/mbrtowc.texi | 2 +- tests/test-mbrtowc.c | 2 ++ tests/test-mbrtowc5.sh | 7 +++++-- 4 files changed, 15 insertions(+), 3 deletions(-) diff --git a/ChangeLog b/ChangeLog index ba75036c75..a118bfc950 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,10 @@ +2023-03-30 Bruno Haible <br...@clisp.org> + + mbrtowc tests: Add comment. + * tests/test-mbrtowc.c: Add comment. + * tests/test-mbrtowc5.sh: Use symmetric coding style. + * doc/posix-functions/mbrtowc.texi: Update. + 2023-03-30 Bruno Haible <br...@clisp.org> stdlib tests: Check behaviour of C locale. diff --git a/doc/posix-functions/mbrtowc.texi b/doc/posix-functions/mbrtowc.texi index d663d11bd8..bcba3f4c53 100644 --- a/doc/posix-functions/mbrtowc.texi +++ b/doc/posix-functions/mbrtowc.texi @@ -14,7 +14,7 @@ @item In the C or POSIX locales, this function can return @code{(size_t) -1} and set @code{errno} to @code{EILSEQ}: -glibc 2.23. +glibc 2.35. @item This function returns 0 instead of @code{(size_t) -2} when the input is empty: diff --git a/tests/test-mbrtowc.c b/tests/test-mbrtowc.c index 1fdf039c42..cf43011cad 100644 --- a/tests/test-mbrtowc.c +++ b/tests/test-mbrtowc.c @@ -367,6 +367,8 @@ main (int argc, char *argv[]) wc = (wchar_t) 0xBADFACE; ret = mbrtowc (&wc, buf, 1, &state); + /* POSIX:2018 says: "In the POSIX locale an [EILSEQ] error + cannot occur since all byte values are valid characters." */ ASSERT (ret == 1); if (c < 0x80) /* c is an ASCII character. 
*/ diff --git a/tests/test-mbrtowc5.sh b/tests/test-mbrtowc5.sh index 490496de2b..4c6c6fe868 100755 --- a/tests/test-mbrtowc5.sh +++ b/tests/test-mbrtowc5.sh @@ -1,6 +1,9 @@ #!/bin/sh + # Test whether the POSIX locale has encoding errors. LC_ALL=C \ -${CHECKER} ./test-mbrtowc${EXEEXT} 5 || exit +${CHECKER} ./test-mbrtowc${EXEEXT} 5 || exit 1 LC_ALL=POSIX \ -${CHECKER} ./test-mbrtowc${EXEEXT} 5 +${CHECKER} ./test-mbrtowc${EXEEXT} 5 || exit 1 + +exit 0 -- 2.34.1
>From 11fe27f76c63c14b6f88c815b656f3f72d37ea41 Mon Sep 17 00:00:00 2001 From: Bruno Haible <br...@clisp.org> Date: Thu, 30 Mar 2023 13:25:20 +0200 Subject: [PATCH 3/8] btowc: Fix behaviour in the C locale. * lib/btowc.c: Include <string.h> (btowc): Use mbrtowc instead of mbtowc when possible. * m4/btowc.m4 (gl_FUNC_BTOWC): Test for the mingw bug in the C locale. Invoke gl_MBRTOWC_C_LOCALE. If mbrtowc is buggy in the C locale, override also btowc. (gl_PREREQ_BTOWC): Test whether mbrtowc exists. * modules/btowc (Files): Add m4/mbrtowc.m4. (Depends-on): Add mbrtowc. * tests/test-btowc.c (main): Add a test of the C locale, based on tests/test-mbrtowc.c. * tests/test-btowc3.sh: New file, based on tests/test-mbrtowc5.sh. * modules/btowc-tests (Files): Add it. (Makefile.am): Test it. * doc/posix-functions/btowc.texi: Mention the two C locale behaviour bugs and that they are worked around. --- ChangeLog | 19 ++++++++++++ doc/posix-functions/btowc.texi | 12 ++++--- lib/btowc.c | 8 +++++ m4/btowc.m4 | 57 +++++++++++++++++++++++++++++++++- modules/btowc | 2 ++ modules/btowc-tests | 3 +- tests/test-btowc.c | 19 ++++++++++++ tests/test-btowc3.sh | 9 ++++++ 8 files changed, 123 insertions(+), 6 deletions(-) create mode 100755 tests/test-btowc3.sh diff --git a/ChangeLog b/ChangeLog index a118bfc950..a4af96ed5e 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,22 @@ +2023-03-30 Bruno Haible <br...@clisp.org> + + btowc: Fix behaviour in the C locale. + * lib/btowc.c: Include <string.h> + (btowc): Use mbrtowc instead of mbtowc when possible. + * m4/btowc.m4 (gl_FUNC_BTOWC): Test for the mingw bug in the C locale. + Invoke gl_MBRTOWC_C_LOCALE. If mbrtowc is buggy in the C locale, + override also btowc. + (gl_PREREQ_BTOWC): Test whether mbrtowc exists. + * modules/btowc (Files): Add m4/mbrtowc.m4. + (Depends-on): Add mbrtowc. + * tests/test-btowc.c (main): Add a test of the C locale, based on + tests/test-mbrtowc.c. + * tests/test-btowc3.sh: New file, based on tests/test-mbrtowc5.sh. 
+ * modules/btowc-tests (Files): Add it. + (Makefile.am): Test it. + * doc/posix-functions/btowc.texi: Mention the two C locale behaviour + bugs and that they are worked around. + 2023-03-30 Bruno Haible <br...@clisp.org> mbrtowc tests: Add comment. diff --git a/doc/posix-functions/btowc.texi b/doc/posix-functions/btowc.texi index fa1ea9b503..f4ca5d450a 100644 --- a/doc/posix-functions/btowc.texi +++ b/doc/posix-functions/btowc.texi @@ -17,6 +17,14 @@ @item This function does not return WEOF for an EOF argument on some platforms: IRIX 6.5. +@item +In the C or POSIX locales, this function is not consistent with +Gnulib's @code{mbrtowc} and can return @code{WEOF}: +glibc 2.35, MirOS BSD #10. +@item +In the C or POSIX locales, this function is not consistent with @code{mbrtowc} +on some platforms: +mingw. @end itemize Portability problems not fixed by Gnulib: @@ -27,8 +35,4 @@ However, the Gnulib function @code{btoc32}, provided by Gnulib module @code{btoc32}, operates on 32-bit wide characters and therefore does not have this limitation. -@item -In the C or POSIX locales, this function is not consistent with -Gnulib's @code{mbrtowc} and can return @code{WEOF}: -glibc 2.23, MirOS BSD #10. @end itemize diff --git a/lib/btowc.c b/lib/btowc.c index caadbd7608..4defbdda72 100644 --- a/lib/btowc.c +++ b/lib/btowc.c @@ -22,6 +22,7 @@ #include <stdio.h> #include <stdlib.h> +#include <string.h> wint_t btowc (int c) @@ -32,7 +33,14 @@ btowc (int c) wchar_t wc; buf[0] = c; +#if HAVE_MBRTOWC + mbstate_t state; + memset (&state, 0, sizeof (mbstate_t)); + size_t ret = mbrtowc (&wc, buf, 1, &state); + if (!(ret == (size_t)(-1) || ret == (size_t)(-2))) +#else if (mbtowc (&wc, buf, 1) >= 0) +#endif return wc; } return WEOF; diff --git a/m4/btowc.m4 b/m4/btowc.m4 index 77218a7d1c..1cd100a2d7 100644 --- a/m4/btowc.m4 +++ b/m4/btowc.m4 @@ -1,4 +1,4 @@ -# btowc.m4 serial 12 +# btowc.m4 serial 13 dnl Copyright (C) 2008-2023 Free Software Foundation, Inc. 
dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, @@ -88,6 +88,49 @@ AC_DEFUN([gl_FUNC_BTOWC] fi ]) + dnl On mingw, in the C locale, btowc is inconsistent with mbrtowc: + dnl mbrtowc avoids calling MultiByteToWideChar when MB_CUR_MAX is 1 and + dnl ___lc_codepage_func() is 0, but btowc is lacking this special case. + AC_CHECK_FUNCS_ONCE([mbrtowc]) + AC_CACHE_CHECK([whether btowc is consistent with mbrtowc in the C locale], + [gl_cv_func_btowc_consistent], + [ + AC_RUN_IFELSE( + [AC_LANG_SOURCE([[ +#include <stdlib.h> +#include <string.h> +#include <wchar.h> +int main () +{ +#if HAVE_MBRTOWC + wint_t wc1 = btowc (0x80); + wchar_t wc2 = (wchar_t) 0xbadface; + char buf[1] = { 0x80 }; + mbstate_t state; + memset (&state, 0, sizeof (mbstate_t)); + if (mbrtowc (&wc2, buf, 1, &state) != 1 || wc1 != wc2) + return 1; +#endif + return 0; +}]])], + [gl_cv_func_btowc_consistent=yes], + [gl_cv_func_btowc_consistent=no], + [case "$host_os" in + # Guess no on mingw. + mingw*) AC_EGREP_CPP([Problem], [ +#ifdef __MINGW32__ + Problem +#endif + ], + [gl_cv_func_btowc_consistent="guessing no"], + [gl_cv_func_btowc_consistent="guessing yes"]) + ;; + # Guess yes otherwise. + *) gl_cv_func_btowc_consistent="guessing yes" ;; + esac + ]) + ]) + case "$gl_cv_func_btowc_nul" in *yes) ;; *) REPLACE_BTOWC=1 ;; @@ -96,10 +139,22 @@ AC_DEFUN([gl_FUNC_BTOWC] *yes) ;; *) REPLACE_BTOWC=1 ;; esac + case "$gl_cv_func_btowc_consistent" in + *yes) ;; + *) REPLACE_BTOWC=1 ;; + esac + if test $REPLACE_BTOWC = 0; then + gl_MBRTOWC_C_LOCALE + case "$gl_cv_func_mbrtowc_C_locale_sans_EILSEQ" in + *yes) ;; + *) REPLACE_BTOWC=1 ;; + esac + fi fi ]) # Prerequisites of lib/btowc.c. 
AC_DEFUN([gl_PREREQ_BTOWC], [ : + AC_CHECK_FUNCS_ONCE([mbrtowc]) ]) diff --git a/modules/btowc b/modules/btowc index 80d786cfa6..4788b3ec13 100644 --- a/modules/btowc +++ b/modules/btowc @@ -4,11 +4,13 @@ btowc() function: convert unibyte character to wide character. Files: lib/btowc.c m4/btowc.m4 +m4/mbrtowc.m4 m4/locale-fr.m4 Depends-on: wchar mbtowc [test $HAVE_BTOWC = 0 || test $REPLACE_BTOWC = 1] +mbrtowc [test $HAVE_BTOWC = 0 || test $REPLACE_BTOWC = 1] configure.ac: gl_FUNC_BTOWC diff --git a/modules/btowc-tests b/modules/btowc-tests index 6bd3258520..59d33eb00b 100644 --- a/modules/btowc-tests +++ b/modules/btowc-tests @@ -1,6 +1,7 @@ Files: tests/test-btowc1.sh tests/test-btowc2.sh +tests/test-btowc3.sh tests/test-btowc.c tests/signature.h tests/macros.h @@ -15,7 +16,7 @@ gt_LOCALE_FR gt_LOCALE_FR_UTF8 Makefile.am: -TESTS += test-btowc1.sh test-btowc2.sh +TESTS += test-btowc1.sh test-btowc2.sh test-btowc3.sh TESTS_ENVIRONMENT += LOCALE_FR='@LOCALE_FR@' LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' check_PROGRAMS += test-btowc test_btowc_LDADD = $(LDADD) $(SETLOCALE_LIB) diff --git a/tests/test-btowc.c b/tests/test-btowc.c index 849db470f4..e5918c36e9 100644 --- a/tests/test-btowc.c +++ b/tests/test-btowc.c @@ -57,6 +57,25 @@ main (int argc, char *argv[]) for (c = 0x80; c < 0x100; c++) ASSERT (btowc (c) == WEOF); return 0; + + case '3': + /* C or POSIX locale. */ + for (c = 0; c < 0x100; c++) + if (c != 0) + { + /* We are testing all nonnull bytes. */ + wint_t wc = btowc (c); + /* POSIX:2018 says: "In the POSIX locale, btowc() shall not return + WEOF if c has a value in the range 0 to 255 inclusive." */ + if (c < 0x80) + /* c is an ASCII character. */ + ASSERT (wc == c); + else + /* On most platforms, the bytes 0x80..0xFF map to U+0080..U+00FF. + But on musl libc, the bytes 0x80..0xFF map to U+DF80..U+DFFF. 
*/ + ASSERT (wc == c || wc == 0xDF00 + c); + } + return 0; } return 1; diff --git a/tests/test-btowc3.sh b/tests/test-btowc3.sh new file mode 100755 index 0000000000..ee9e143c1c --- /dev/null +++ b/tests/test-btowc3.sh @@ -0,0 +1,9 @@ +#!/bin/sh + +# Test whether the POSIX locale has encoding errors. +LC_ALL=C \ +${CHECKER} ./test-btowc${EXEEXT} 3 || exit 1 +LC_ALL=POSIX \ +${CHECKER} ./test-btowc${EXEEXT} 3 || exit 1 + +exit 0 -- 2.34.1
From 1ab07af585358746e7fcc0176ab1716db31ca902 Mon Sep 17 00:00:00 2001 From: Bruno Haible <br...@clisp.org> Date: Thu, 30 Mar 2023 17:55:31 +0200 Subject: [PATCH 4/8] mbrlen: Add tests. * tests/test-mbrlen1.sh: New file, based on tests/test-mbrtowc1.sh. * tests/test-mbrlen2.sh: New file, based on tests/test-mbrtowc2.sh. * tests/test-mbrlen3.sh: New file, based on tests/test-mbrtowc3.sh. * tests/test-mbrlen4.sh: New file, based on tests/test-mbrtowc4.sh. * tests/test-mbrlen5.sh: New file, based on tests/test-mbrtowc5.sh. * tests/test-mbrlen.c: New file, based on tests/test-mbrtowc.c. * tests/test-mbrlen-w32-1.sh: New file, based on tests/test-mbrtowc-w32-1.sh. * tests/test-mbrlen-w32-2.sh: New file, based on tests/test-mbrtowc-w32-2.sh. * tests/test-mbrlen-w32-3.sh: New file, based on tests/test-mbrtowc-w32-3.sh. * tests/test-mbrlen-w32-4.sh: New file, based on tests/test-mbrtowc-w32-4.sh. * tests/test-mbrlen-w32-5.sh: New file, based on tests/test-mbrtowc-w32-5.sh. * tests/test-mbrlen-w32-6.sh: New file, based on tests/test-mbrtowc-w32-6.sh. * tests/test-mbrlen-w32-7.sh: New file, based on tests/test-mbrtowc-w32-7.sh. * tests/test-mbrlen-w32.c: New file, based on tests/test-mbrtowc-w32.c. * modules/mbrlen-tests: New file, based on modules/mbrtowc-tests. * doc/posix-functions/mbrlen.texi: Update. 
--- ChangeLog | 27 ++ doc/posix-functions/mbrlen.texi | 2 +- modules/mbrlen-tests | 48 +++ tests/test-mbrlen-w32-1.sh | 4 + tests/test-mbrlen-w32-2.sh | 4 + tests/test-mbrlen-w32-3.sh | 4 + tests/test-mbrlen-w32-4.sh | 4 + tests/test-mbrlen-w32-5.sh | 4 + tests/test-mbrlen-w32-6.sh | 4 + tests/test-mbrlen-w32-7.sh | 4 + tests/test-mbrlen-w32.c | 565 ++++++++++++++++++++++++++++++++ tests/test-mbrlen.c | 297 +++++++++++++++++ tests/test-mbrlen1.sh | 15 + tests/test-mbrlen2.sh | 15 + tests/test-mbrlen3.sh | 15 + tests/test-mbrlen4.sh | 15 + tests/test-mbrlen5.sh | 9 + 17 files changed, 1035 insertions(+), 1 deletion(-) create mode 100644 modules/mbrlen-tests create mode 100755 tests/test-mbrlen-w32-1.sh create mode 100755 tests/test-mbrlen-w32-2.sh create mode 100755 tests/test-mbrlen-w32-3.sh create mode 100755 tests/test-mbrlen-w32-4.sh create mode 100755 tests/test-mbrlen-w32-5.sh create mode 100755 tests/test-mbrlen-w32-6.sh create mode 100755 tests/test-mbrlen-w32-7.sh create mode 100644 tests/test-mbrlen-w32.c create mode 100644 tests/test-mbrlen.c create mode 100755 tests/test-mbrlen1.sh create mode 100755 tests/test-mbrlen2.sh create mode 100755 tests/test-mbrlen3.sh create mode 100755 tests/test-mbrlen4.sh create mode 100755 tests/test-mbrlen5.sh diff --git a/ChangeLog b/ChangeLog index a4af96ed5e..fba5f444f6 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,30 @@ +2023-03-30 Bruno Haible <br...@clisp.org> + + mbrlen: Add tests. + * tests/test-mbrlen1.sh: New file, based on tests/test-mbrtowc1.sh. + * tests/test-mbrlen2.sh: New file, based on tests/test-mbrtowc2.sh. + * tests/test-mbrlen3.sh: New file, based on tests/test-mbrtowc3.sh. + * tests/test-mbrlen4.sh: New file, based on tests/test-mbrtowc4.sh. + * tests/test-mbrlen5.sh: New file, based on tests/test-mbrtowc5.sh. + * tests/test-mbrlen.c: New file, based on tests/test-mbrtowc.c. + * tests/test-mbrlen-w32-1.sh: New file, based on + tests/test-mbrtowc-w32-1.sh. 
+ * tests/test-mbrlen-w32-2.sh: New file, based on + tests/test-mbrtowc-w32-2.sh. + * tests/test-mbrlen-w32-3.sh: New file, based on + tests/test-mbrtowc-w32-3.sh. + * tests/test-mbrlen-w32-4.sh: New file, based on + tests/test-mbrtowc-w32-4.sh. + * tests/test-mbrlen-w32-5.sh: New file, based on + tests/test-mbrtowc-w32-5.sh. + * tests/test-mbrlen-w32-6.sh: New file, based on + tests/test-mbrtowc-w32-6.sh. + * tests/test-mbrlen-w32-7.sh: New file, based on + tests/test-mbrtowc-w32-7.sh. + * tests/test-mbrlen-w32.c: New file, based on tests/test-mbrtowc-w32.c. + * modules/mbrlen-tests: New file, based on modules/mbrtowc-tests. + * doc/posix-functions/mbrlen.texi: Update. + 2023-03-30 Bruno Haible <br...@clisp.org> btowc: Fix behaviour in the C locale. diff --git a/doc/posix-functions/mbrlen.texi b/doc/posix-functions/mbrlen.texi index 7d0db72db5..0c55ddae21 100644 --- a/doc/posix-functions/mbrlen.texi +++ b/doc/posix-functions/mbrlen.texi @@ -14,7 +14,7 @@ @item In the C or POSIX locales, this function can return @code{(size_t) -1} and set @code{errno} to @code{EILSEQ}: -glibc 2.23. +glibc 2.35. 
@item This function returns 0 instead of @code{(size_t) -2} when the input is empty: diff --git a/modules/mbrlen-tests b/modules/mbrlen-tests new file mode 100644 index 0000000000..bd2091bc4d --- /dev/null +++ b/modules/mbrlen-tests @@ -0,0 +1,48 @@ +Files: +tests/test-mbrlen1.sh +tests/test-mbrlen2.sh +tests/test-mbrlen3.sh +tests/test-mbrlen4.sh +tests/test-mbrlen5.sh +tests/test-mbrlen.c +tests/test-mbrlen-w32-1.sh +tests/test-mbrlen-w32-2.sh +tests/test-mbrlen-w32-3.sh +tests/test-mbrlen-w32-4.sh +tests/test-mbrlen-w32-5.sh +tests/test-mbrlen-w32-6.sh +tests/test-mbrlen-w32-7.sh +tests/test-mbrlen-w32.c +tests/signature.h +tests/macros.h +m4/locale-fr.m4 +m4/locale-ja.m4 +m4/locale-zh.m4 +m4/codeset.m4 + +Depends-on: +mbsinit +wctob +setlocale +localcharset + +configure.ac: +gt_LOCALE_FR +gt_LOCALE_FR_UTF8 +gt_LOCALE_JA +gt_LOCALE_ZH_CN + +Makefile.am: +TESTS += \ + test-mbrlen1.sh test-mbrlen2.sh test-mbrlen3.sh test-mbrlen4.sh \ + test-mbrlen5.sh \ + test-mbrlen-w32-1.sh test-mbrlen-w32-2.sh test-mbrlen-w32-3.sh \ + test-mbrlen-w32-4.sh test-mbrlen-w32-5.sh test-mbrlen-w32-6.sh \ + test-mbrlen-w32-7.sh +TESTS_ENVIRONMENT += \ + LOCALE_FR='@LOCALE_FR@' \ + LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \ + LOCALE_JA='@LOCALE_JA@' \ + LOCALE_ZH_CN='@LOCALE_ZH_CN@' +check_PROGRAMS += test-mbrlen test-mbrlen-w32 +test_mbrlen_LDADD = $(LDADD) $(SETLOCALE_LIB) $(MBRTOWC_LIB) diff --git a/tests/test-mbrlen-w32-1.sh b/tests/test-mbrlen-w32-1.sh new file mode 100755 index 0000000000..ccf4c538cb --- /dev/null +++ b/tests/test-mbrlen-w32-1.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP1252 locale. +${CHECKER} ./test-mbrlen-w32${EXEEXT} French_France 1252 diff --git a/tests/test-mbrlen-w32-2.sh b/tests/test-mbrlen-w32-2.sh new file mode 100755 index 0000000000..22c5388d2a --- /dev/null +++ b/tests/test-mbrlen-w32-2.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP1256 locale. 
+${CHECKER} ./test-mbrlen-w32${EXEEXT} "Arabic_Saudi Arabia" 1256 diff --git a/tests/test-mbrlen-w32-3.sh b/tests/test-mbrlen-w32-3.sh new file mode 100755 index 0000000000..12c2857b9e --- /dev/null +++ b/tests/test-mbrlen-w32-3.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP932 locale. +${CHECKER} ./test-mbrlen-w32${EXEEXT} Japanese_Japan 932 diff --git a/tests/test-mbrlen-w32-4.sh b/tests/test-mbrlen-w32-4.sh new file mode 100755 index 0000000000..0746b240e7 --- /dev/null +++ b/tests/test-mbrlen-w32-4.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP950 locale. +${CHECKER} ./test-mbrlen-w32${EXEEXT} Chinese_Taiwan 950 diff --git a/tests/test-mbrlen-w32-5.sh b/tests/test-mbrlen-w32-5.sh new file mode 100755 index 0000000000..eb2cec24df --- /dev/null +++ b/tests/test-mbrlen-w32-5.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP936 locale. +${CHECKER} ./test-mbrlen-w32${EXEEXT} Chinese_China 936 diff --git a/tests/test-mbrlen-w32-6.sh b/tests/test-mbrlen-w32-6.sh new file mode 100755 index 0000000000..d586b7dd2a --- /dev/null +++ b/tests/test-mbrlen-w32-6.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a GB18030 locale. +${CHECKER} ./test-mbrlen-w32${EXEEXT} Chinese_China 54936 diff --git a/tests/test-mbrlen-w32-7.sh b/tests/test-mbrlen-w32-7.sh new file mode 100755 index 0000000000..dbf82d4abd --- /dev/null +++ b/tests/test-mbrlen-w32-7.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test some UTF-8 locales. +${CHECKER} ./test-mbrlen-w32${EXEEXT} French_France Japanese_Japan Chinese_Taiwan Chinese_China 65001 diff --git a/tests/test-mbrlen-w32.c b/tests/test-mbrlen-w32.c new file mode 100644 index 0000000000..2916cf1095 --- /dev/null +++ b/tests/test-mbrlen-w32.c @@ -0,0 +1,565 @@ +/* Test of conversion of multibyte character to wide character. + Copyright (C) 2008-2023 Free Software Foundation, Inc. 
+ + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <https://www.gnu.org/licenses/>. */ + +#include <config.h> + +#include <wchar.h> + +#include <errno.h> +#include <locale.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#include "localcharset.h" +#include "macros.h" + +#if defined _WIN32 && !defined __CYGWIN__ + +static int +test_one_locale (const char *name, int codepage) +{ + mbstate_t state; + size_t ret; + +# if 1 + /* Portable code to set the locale. */ + { + char name_with_codepage[1024]; + + sprintf (name_with_codepage, "%s.%d", name, codepage); + + /* Set the locale. */ + if (setlocale (LC_ALL, name_with_codepage) == NULL) + return 77; + } +# else + /* Hacky way to set a locale.codepage combination that setlocale() refuses + to set. */ + { + /* Codepage of the current locale, set with setlocale(). + Not necessarily the same as GetACP(). */ + extern __declspec(dllimport) unsigned int __lc_codepage; + + /* Set the locale. */ + if (setlocale (LC_ALL, name) == NULL) + return 77; + + /* Clobber the codepage and MB_CUR_MAX, both set by setlocale(). */ + __lc_codepage = codepage; + switch (codepage) + { + case 1252: + case 1256: + MB_CUR_MAX = 1; + break; + case 932: + case 950: + case 936: + MB_CUR_MAX = 2; + break; + case 54936: + case 65001: + MB_CUR_MAX = 4; + break; + } + + /* Test whether the codepage is really available. 
*/ + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrlen (" ", 1, &state) == (size_t)(-1)) + return 77; + } +# endif + + /* Test zero-length input. */ + { + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("x", 0, &state); + /* gnulib's implementation returns (size_t)(-2). + The AIX 5.1 implementation returns (size_t)(-1). + glibc's implementation returns 0. */ + ASSERT (ret == (size_t)(-2) || ret == (size_t)(-1) || ret == 0); + ASSERT (mbsinit (&state)); + } + + /* Test NUL byte input. */ + { + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("", 1, &state); + ASSERT (ret == 0); + ASSERT (mbsinit (&state)); + } + + /* Test single-byte input. */ + { + int c; + char buf[1]; + + memset (&state, '\0', sizeof (mbstate_t)); + for (c = 0; c < 0x100; c++) + switch (c) + { + case '\t': case '\v': case '\f': + case ' ': case '!': case '"': case '#': case '%': + case '&': case '\'': case '(': case ')': case '*': + case '+': case ',': case '-': case '.': case '/': + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + case ':': case ';': case '<': case '=': case '>': + case '?': + case 'A': case 'B': case 'C': case 'D': case 'E': + case 'F': case 'G': case 'H': case 'I': case 'J': + case 'K': case 'L': case 'M': case 'N': case 'O': + case 'P': case 'Q': case 'R': case 'S': case 'T': + case 'U': case 'V': case 'W': case 'X': case 'Y': + case 'Z': + case '[': case '\\': case ']': case '^': case '_': + case 'a': case 'b': case 'c': case 'd': case 'e': + case 'f': case 'g': case 'h': case 'i': case 'j': + case 'k': case 'l': case 'm': case 'n': case 'o': + case 'p': case 'q': case 'r': case 's': case 't': + case 'u': case 'v': case 'w': case 'x': case 'y': + case 'z': case '{': case '|': case '}': case '~': + /* c is in the ISO C "basic character set". 
*/ + buf[0] = c; + ret = mbrlen (buf, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + break; + } + } + + /* Test special calling convention, passing a NULL pointer. */ + { + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen (NULL, 5, &state); + ASSERT (ret == 0); + ASSERT (mbsinit (&state)); + } + + switch (codepage) + { + case 1252: + /* Locale encoding is CP1252, an extension of ISO-8859-1. */ + { + char input[] = "B\374\337er"; /* "Büßer" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + + ret = mbrlen (input + 2, 3, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + ret = mbrlen (input + 3, 2, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + + ret = mbrlen (input + 4, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + } + return 0; + + case 1256: + /* Locale encoding is CP1256, not the same as ISO-8859-6. */ + { + char input[] = "x\302\341\346y"; /* "xآلوy" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + + ret = mbrlen (input + 2, 3, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + ret = mbrlen (input + 3, 2, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + + ret = mbrlen (input + 4, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + } + return 0; + + case 932: + /* Locale encoding is CP932, similar to Shift_JIS. 
*/ + { + char input[] = "<\223\372\226\173\214\352>"; /* "<日本語>" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 2, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + input[2] = '\0'; + + ret = mbrlen (input + 3, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (!mbsinit (&state)); + input[3] = '\0'; + + ret = mbrlen (input + 4, 4, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[4] = '\0'; + + ret = mbrlen (input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[5] = '\0'; + input[6] = '\0'; + + ret = mbrlen (input + 7, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\377", 1, &state); /* 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == (size_t)-2); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\225\377", 2, &state); /* 0x95 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == 2); + } + return 0; + + case 950: + /* Locale encoding is CP950, similar to Big5. 
*/ + { + char input[] = "<\244\351\245\273\273\171>"; /* "<日本語>" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 2, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + input[2] = '\0'; + + ret = mbrlen (input + 3, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (!mbsinit (&state)); + input[3] = '\0'; + + ret = mbrlen (input + 4, 4, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[4] = '\0'; + + ret = mbrlen (input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[5] = '\0'; + input[6] = '\0'; + + ret = mbrlen (input + 7, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\377", 1, &state); /* 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == (size_t)-2); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\225\377", 2, &state); /* 0x95 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == 2); + } + return 0; + + case 936: + /* Locale encoding is CP936 = GBK, an extension of GB2312. 
*/ + { + char input[] = "<\310\325\261\276\325\132>"; /* "<日本語>" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 2, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + input[2] = '\0'; + + ret = mbrlen (input + 3, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (!mbsinit (&state)); + input[3] = '\0'; + + ret = mbrlen (input + 4, 4, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[4] = '\0'; + + ret = mbrlen (input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[5] = '\0'; + input[6] = '\0'; + + ret = mbrlen (input + 7, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\377", 1, &state); /* 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == (size_t)-2); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\225\377", 2, &state); /* 0x95 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == 2); + } + return 0; + + case 54936: + /* Locale encoding is CP54936 = GB18030. 
*/ + if (strcmp (locale_charset (), "GB18030") != 0) + return 77; + { + char input[] = "B\250\271\201\060\211\070er"; /* "Büßer" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (!mbsinit (&state)); + input[1] = '\0'; + + ret = mbrlen (input + 2, 7, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + ret = mbrlen (input + 3, 6, &state); + ASSERT (ret == 4); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + input[4] = '\0'; + input[5] = '\0'; + input[6] = '\0'; + + ret = mbrlen (input + 7, 2, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[7] = '\0'; + + ret = mbrlen (input + 8, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\377", 1, &state); /* 0xFF */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\225\377", 2, &state); /* 0x95 0xFF */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\201\045", 2, &state); /* 0x81 0x25 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\201\060\377", 3, &state); /* 0x81 0x30 0xFF */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\201\060\377\064", 4, &state); /* 0x81 0x30 0xFF 0x34 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\201\060\211\072", 4, &state); /* 0x81 0x30 0x89 0x3A */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + } + return 0; + + case 65001: + /* Locale encoding is CP65001 = UTF-8. 
*/ + if (strcmp (locale_charset (), "UTF-8") != 0) + return 77; + { + char input[] = "B\303\274\303\237er"; /* "Büßer" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (!mbsinit (&state)); + input[1] = '\0'; + + ret = mbrlen (input + 2, 5, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + ret = mbrlen (input + 3, 4, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + input[4] = '\0'; + + ret = mbrlen (input + 5, 2, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[5] = '\0'; + + ret = mbrlen (input + 6, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\377", 1, &state); /* 0xFF */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\303\300", 2, &state); /* 0xC3 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\343\300", 2, &state); /* 0xE3 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\343\300\200", 3, &state); /* 0xE3 0xC0 0x80 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\343\200\300", 3, &state); /* 0xE3 0x80 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\363\300", 2, &state); /* 0xF3 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\363\300\200\200", 4, &state); /* 0xF3 0xC0 0x80 0x80 */ + ASSERT (ret == (size_t)-1); + 
ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\363\200\300", 3, &state); /* 0xF3 0x80 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\363\200\300\200", 4, &state); /* 0xF3 0x80 0xC0 0x80 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("\363\200\200\300", 4, &state); /* 0xF3 0x80 0x80 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + } + return 0; + + default: + return 1; + } +} + +int +main (int argc, char *argv[]) +{ + int codepage = atoi (argv[argc - 1]); + int result; + int i; + + result = 77; + for (i = 1; i < argc - 1; i++) + { + int ret = test_one_locale (argv[i], codepage); + + if (ret != 77) + result = ret; + } + + if (result == 77) + { + fprintf (stderr, "Skipping test: found no locale with codepage %d\n", + codepage); + } + return result; +} + +#else + +int +main (int argc, char *argv[]) +{ + fputs ("Skipping test: not a native Windows system\n", stderr); + return 77; +} + +#endif diff --git a/tests/test-mbrlen.c b/tests/test-mbrlen.c new file mode 100644 index 0000000000..005d0c801d --- /dev/null +++ b/tests/test-mbrlen.c @@ -0,0 +1,297 @@ +/* Test of conversion of multibyte character to wide character. + Copyright (C) 2008-2023 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. 
If not, see <https://www.gnu.org/licenses/>. */ + +/* Written by Bruno Haible <br...@clisp.org>, 2023. */ + +#include <config.h> + +#include <wchar.h> + +#include "signature.h" +SIGNATURE_CHECK (mbrlen, size_t, (char const *, size_t, mbstate_t *)); + +#include <locale.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#include "macros.h" + +int +main (int argc, char *argv[]) +{ + mbstate_t state; + size_t ret; + + /* configure should already have checked that the locale is supported. */ + if (setlocale (LC_ALL, "") == NULL) + return 1; + + /* Test zero-length input. */ + { + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("x", 0, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (mbsinit (&state)); + } + + /* Test NUL byte input. */ + { + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen ("", 1, &state); + ASSERT (ret == 0); + ASSERT (mbsinit (&state)); + } + + /* Test single-byte input. */ + { + int c; + char buf[1]; + + memset (&state, '\0', sizeof (mbstate_t)); + for (c = 0; c < 0x100; c++) + switch (c) + { + case '\t': case '\v': case '\f': + case ' ': case '!': case '"': case '#': case '%': + case '&': case '\'': case '(': case ')': case '*': + case '+': case ',': case '-': case '.': case '/': + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + case ':': case ';': case '<': case '=': case '>': + case '?': + case 'A': case 'B': case 'C': case 'D': case 'E': + case 'F': case 'G': case 'H': case 'I': case 'J': + case 'K': case 'L': case 'M': case 'N': case 'O': + case 'P': case 'Q': case 'R': case 'S': case 'T': + case 'U': case 'V': case 'W': case 'X': case 'Y': + case 'Z': + case '[': case '\\': case ']': case '^': case '_': + case 'a': case 'b': case 'c': case 'd': case 'e': + case 'f': case 'g': case 'h': case 'i': case 'j': + case 'k': case 'l': case 'm': case 'n': case 'o': + case 'p': case 'q': case 'r': case 's': case 't': + case 'u': case 'v': case 'w': case 'x': 
case 'y': + case 'z': case '{': case '|': case '}': case '~': + /* c is in the ISO C "basic character set". */ + ASSERT (c < 0x80); + /* c is an ASCII character. */ + buf[0] = c; + + ret = mbrlen (buf, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + + break; + default: + break; + } + } + + /* Test special calling convention, passing a NULL pointer. */ + { + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrlen (NULL, 5, &state); + ASSERT (ret == 0); + ASSERT (mbsinit (&state)); + } + +#ifdef __ANDROID__ + /* On Android ≥ 5.0, the default locale is the "C.UTF-8" locale, not the + "C" locale. Furthermore, when you attempt to set the "C" or "POSIX" + locale via setlocale(), what you get is a "C" locale with UTF-8 encoding, + that is, effectively the "C.UTF-8" locale. */ + if (argc > 1 && strcmp (argv[1], "5") == 0 && MB_CUR_MAX > 1) + argv[1] = "2"; +#endif + + if (argc > 1) + switch (argv[1][0]) + { + case '1': + /* Locale encoding is ISO-8859-1 or ISO-8859-15. */ + { + char input[] = "B\374\337er"; /* "Büßer" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + + ret = mbrlen (input + 2, 3, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + ret = mbrlen (input + 3, 2, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + + ret = mbrlen (input + 4, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + } + return 0; + + case '2': + /* Locale encoding is UTF-8. 
*/ + { + char input[] = "B\303\274\303\237er"; /* "Büßer" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (!mbsinit (&state)); + input[1] = '\0'; + + ret = mbrlen (input + 2, 5, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + ret = mbrlen (input + 3, 4, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + input[4] = '\0'; + + ret = mbrlen (input + 5, 2, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[5] = '\0'; + + ret = mbrlen (input + 6, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + } + return 0; + + case '3': + /* Locale encoding is EUC-JP. */ + { + char input[] = "<\306\374\313\334\270\354>"; /* "<日本語>" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 2, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + input[2] = '\0'; + + ret = mbrlen (input + 3, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (!mbsinit (&state)); + input[3] = '\0'; + + ret = mbrlen (input + 4, 4, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[4] = '\0'; + + ret = mbrlen (input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + input[5] = '\0'; + input[6] = '\0'; + + ret = mbrlen (input + 7, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + } + return 0; + + case '4': + /* Locale encoding is GB18030. 
*/ + { + char input[] = "B\250\271\201\060\211\070er"; /* "Büßer" */ + memset (&state, '\0', sizeof (mbstate_t)); + + ret = mbrlen (input, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + ret = mbrlen (input + 1, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (!mbsinit (&state)); + input[1] = '\0'; + + ret = mbrlen (input + 2, 7, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + ret = mbrlen (input + 3, 6, &state); + ASSERT (ret == 4); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + input[4] = '\0'; + input[5] = '\0'; + input[6] = '\0'; + + ret = mbrlen (input + 7, 2, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + input[7] = '\0'; + + ret = mbrlen (input + 8, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + } + return 0; + + case '5': + /* C or POSIX locale. */ + { + int c; + char buf[1]; + + memset (&state, '\0', sizeof (mbstate_t)); + for (c = 0; c < 0x100; c++) + if (c != 0) + { + /* We are testing all nonnull bytes. */ + buf[0] = c; + + ret = mbrlen (buf, 1, &state); + /* POSIX:2018 says: "In the POSIX locale an [EILSEQ] error + cannot occur since all byte values are valid characters." */ + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + } + } + return 0; + } + + return 1; +} diff --git a/tests/test-mbrlen1.sh b/tests/test-mbrlen1.sh new file mode 100755 index 0000000000..f07bc37e96 --- /dev/null +++ b/tests/test-mbrlen1.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test in an ISO-8859-1 or ISO-8859-15 locale. 
+: "${LOCALE_FR=fr_FR}" +if test $LOCALE_FR = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no traditional french locale is installed" + else + echo "Skipping test: no traditional french locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_FR \ +${CHECKER} ./test-mbrlen${EXEEXT} 1 diff --git a/tests/test-mbrlen2.sh b/tests/test-mbrlen2.sh new file mode 100755 index 0000000000..108394a60e --- /dev/null +++ b/tests/test-mbrlen2.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test whether a specific UTF-8 locale is installed. +: "${LOCALE_FR_UTF8=fr_FR.UTF-8}" +if test $LOCALE_FR_UTF8 = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no french Unicode locale is installed" + else + echo "Skipping test: no french Unicode locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_FR_UTF8 \ +${CHECKER} ./test-mbrlen${EXEEXT} 2 diff --git a/tests/test-mbrlen3.sh b/tests/test-mbrlen3.sh new file mode 100755 index 0000000000..57199a0ccc --- /dev/null +++ b/tests/test-mbrlen3.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test whether a specific EUC-JP locale is installed. +: "${LOCALE_JA=ja_JP}" +if test $LOCALE_JA = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no traditional japanese locale is installed" + else + echo "Skipping test: no traditional japanese locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_JA \ +${CHECKER} ./test-mbrlen${EXEEXT} 3 diff --git a/tests/test-mbrlen4.sh b/tests/test-mbrlen4.sh new file mode 100755 index 0000000000..8f83d4d025 --- /dev/null +++ b/tests/test-mbrlen4.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test whether a specific GB18030 locale is installed. 
+: "${LOCALE_ZH_CN=zh_CN.GB18030}" +if test $LOCALE_ZH_CN = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no transitional chinese locale is installed" + else + echo "Skipping test: no transitional chinese locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_ZH_CN \ +${CHECKER} ./test-mbrlen${EXEEXT} 4 diff --git a/tests/test-mbrlen5.sh b/tests/test-mbrlen5.sh new file mode 100755 index 0000000000..9b14cb1e1c --- /dev/null +++ b/tests/test-mbrlen5.sh @@ -0,0 +1,9 @@ +#!/bin/sh + +# Test whether the POSIX locale has encoding errors. +LC_ALL=C \ +${CHECKER} ./test-mbrlen${EXEEXT} 5 || exit 1 +LC_ALL=POSIX \ +${CHECKER} ./test-mbrlen${EXEEXT} 5 || exit 1 + +exit 0 -- 2.34.1
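The C-locale case of the mbrlen test above can be condensed into a small stand-alone check. This is only a sketch under assumptions, not code from the patch: the helper name count_mbrlen_failures is hypothetical, and the caller is assumed to have selected the locale with setlocale() beforehand. It counts the bytes in a range for which mbrlen() does not return 1, which is what POSIX:2018 rules out for the "C" and "POSIX" locales.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of the patch.
   Returns the number of bytes c, FIRST <= c <= LAST, for which
   mbrlen() does not report a complete single-byte character in the
   current locale.  */
static int
count_mbrlen_failures (int first, int last)
{
  int failures = 0;
  int c;

  /* The NUL byte is excluded, because mbrlen() returns 0 for it.  */
  assert (first >= 1 && last <= 0xFF && first <= last);

  for (c = first; c <= last; c++)
    {
      mbstate_t state;
      char buf[1];
      size_t ret;

      /* Start from the initial conversion state for every byte, so
         that a failure on one byte cannot affect the next one.  */
      memset (&state, '\0', sizeof (mbstate_t));
      buf[0] = (char) c;

      ret = mbrlen (buf, 1, &state);
      if (ret != 1)
        failures++;
    }
  return failures;
}
```

Calling count_mbrlen_failures (1, 0xFF) after setlocale (LC_ALL, "C") gives a quick indication of whether a platform shows the behaviour that test-mbrlen5.sh checks for; a nonzero result would correspond to the EILSEQ case described in the documentation changes.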
From 63861afac8004becc3907993cd16b59a5bca5195 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Thu, 30 Mar 2023 23:12:50 +0200
Subject: [PATCH 5/8] mbsrtowcs: Fix behaviour in the C locale.

* m4/mbsrtowcs.m4 (gl_FUNC_MBSRTOWCS): Invoke gl_MBRTOWC_C_LOCALE. If
mbrtowc is buggy in the C locale, override also mbsrtowcs.
* modules/mbsrtowcs (Files): Add m4/mbrtowc.m4.
* tests/test-mbsrtowcs.c (main): Add a test of the C locale, based on
tests/test-mbrtowc.c.
* tests/test-mbsrtowcs5.sh: New file, based on tests/test-mbrtowc5.sh.
* modules/mbsrtowcs-tests (Files): Add it.
(Makefile.am): Test it.
* doc/posix-functions/mbsrtowcs.texi: Mention the C locale behaviour
bug.
---
 ChangeLog                          | 14 +++++++
 doc/posix-functions/mbsrtowcs.texi |  4 ++
 m4/mbsrtowcs.m4                    |  9 +++-
 modules/mbsrtowcs                  |  1 +
 modules/mbsrtowcs-tests            |  5 ++-
 tests/test-mbsrtowcs.c             | 66 ++++++++++++++++++++++++++++++
 tests/test-mbsrtowcs5.sh           |  9 ++++
 7 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100755 tests/test-mbsrtowcs5.sh

diff --git a/ChangeLog b/ChangeLog
index fba5f444f6..2fe1679061 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2023-03-30 Bruno Haible <br...@clisp.org>
+
+	mbsrtowcs: Fix behaviour in the C locale.
+	* m4/mbsrtowcs.m4 (gl_FUNC_MBSRTOWCS): Invoke gl_MBRTOWC_C_LOCALE. If
+	mbrtowc is buggy in the C locale, override also mbsrtowcs.
+	* modules/mbsrtowcs (Files): Add m4/mbrtowc.m4.
+	* tests/test-mbsrtowcs.c (main): Add a test of the C locale, based on
+	tests/test-mbrtowc.c.
+	* tests/test-mbsrtowcs5.sh: New file, based on tests/test-mbrtowc5.sh.
+	* modules/mbsrtowcs-tests (Files): Add it.
+	(Makefile.am): Test it.
+	* doc/posix-functions/mbsrtowcs.texi: Mention the C locale behaviour
+	bug.
+
 2023-03-30 Bruno Haible <br...@clisp.org>
 
 	mbrlen: Add tests.
diff --git a/doc/posix-functions/mbsrtowcs.texi b/doc/posix-functions/mbsrtowcs.texi index f27ec0f616..28c450ddfb 100644 --- a/doc/posix-functions/mbsrtowcs.texi +++ b/doc/posix-functions/mbsrtowcs.texi @@ -15,6 +15,10 @@ This function does not work on some platforms: HP-UX 11, Solaris 11 2010-11. @item +In the C or POSIX locales, this function can return @code{(size_t) -1} +and set @code{errno} to @code{EILSEQ}: +glibc 2.35. +@item This function does not work when the first argument is NULL on some platforms: mingw. @end itemize diff --git a/m4/mbsrtowcs.m4 b/m4/mbsrtowcs.m4 index f95af621dd..4f2e88c5a0 100644 --- a/m4/mbsrtowcs.m4 +++ b/m4/mbsrtowcs.m4 @@ -1,4 +1,4 @@ -# mbsrtowcs.m4 serial 14 +# mbsrtowcs.m4 serial 15 dnl Copyright (C) 2008-2023 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, @@ -32,6 +32,13 @@ AC_DEFUN([gl_FUNC_MBSRTOWCS] *yes) ;; *) REPLACE_MBSRTOWCS=1 ;; esac + if test $REPLACE_MBSRTOWCS = 0; then + gl_MBRTOWC_C_LOCALE + case "$gl_cv_func_mbrtowc_C_locale_sans_EILSEQ" in + *yes) ;; + *) REPLACE_MBSRTOWCS=1 ;; + esac + fi fi fi ]) diff --git a/modules/mbsrtowcs b/modules/mbsrtowcs index 4b5fa61cf4..70e7d0c36f 100644 --- a/modules/mbsrtowcs +++ b/modules/mbsrtowcs @@ -7,6 +7,7 @@ lib/mbsrtowcs-impl.h lib/mbsrtowcs-state.c m4/mbsrtowcs.m4 m4/mbstate_t.m4 +m4/mbrtowc.m4 m4/locale-fr.m4 m4/locale-ja.m4 m4/locale-zh.m4 diff --git a/modules/mbsrtowcs-tests b/modules/mbsrtowcs-tests index d7301057d9..36050c670d 100644 --- a/modules/mbsrtowcs-tests +++ b/modules/mbsrtowcs-tests @@ -3,6 +3,7 @@ tests/test-mbsrtowcs1.sh tests/test-mbsrtowcs2.sh tests/test-mbsrtowcs3.sh tests/test-mbsrtowcs4.sh +tests/test-mbsrtowcs5.sh tests/test-mbsrtowcs.c tests/signature.h tests/macros.h @@ -24,7 +25,9 @@ gt_LOCALE_JA gt_LOCALE_ZH_CN Makefile.am: -TESTS += test-mbsrtowcs1.sh test-mbsrtowcs2.sh test-mbsrtowcs3.sh test-mbsrtowcs4.sh +TESTS += \ + test-mbsrtowcs1.sh 
test-mbsrtowcs2.sh test-mbsrtowcs3.sh test-mbsrtowcs4.sh \ + test-mbsrtowcs5.sh TESTS_ENVIRONMENT += \ LOCALE_FR='@LOCALE_FR@' \ LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \ diff --git a/tests/test-mbsrtowcs.c b/tests/test-mbsrtowcs.c index d511690db9..7e8cc4a1ea 100644 --- a/tests/test-mbsrtowcs.c +++ b/tests/test-mbsrtowcs.c @@ -281,6 +281,72 @@ main (int argc, char *argv[]) } break; + case '5': + /* C or POSIX locale. */ + { + char input[] = "n/a"; + memset (&state, '\0', sizeof (mbstate_t)); + + src = input; + temp_state = state; + ret = mbsrtowcs (NULL, &src, unlimited ? BUFSIZE : 1, &temp_state); + ASSERT (ret == 3); + ASSERT (src == input); + ASSERT (mbsinit (&state)); + + src = input; + ret = mbsrtowcs (buf, &src, unlimited ? BUFSIZE : 1, &state); + ASSERT (ret == (unlimited ? 3 : 1)); + ASSERT (src == (unlimited ? NULL : input + 1)); + ASSERT (buf[0] == 'n'); + if (unlimited) + { + ASSERT (buf[1] == '/'); + ASSERT (buf[2] == 'a'); + ASSERT (buf[3] == 0); + ASSERT (buf[4] == (wchar_t) 0xBADFACE); + } + else + ASSERT (buf[1] == (wchar_t) 0xBADFACE); + ASSERT (mbsinit (&state)); + } + { + int c; + char input[2]; + + memset (&state, '\0', sizeof (mbstate_t)); + for (c = 0; c < 0x100; c++) + if (c != 0) + { + /* We are testing all nonnull bytes. */ + input[0] = c; + input[1] = '\0'; + + src = input; + ret = mbsrtowcs (NULL, &src, unlimited ? BUFSIZE : 1, &state); + ASSERT (ret == 1); + ASSERT (src == input); + ASSERT (mbsinit (&state)); + + buf[0] = buf[1] = (wchar_t) 0xBADFACE; + src = input; + ret = mbsrtowcs (buf, &src, unlimited ? BUFSIZE : 1, &state); + /* POSIX:2018 says: "In the POSIX locale an [EILSEQ] error + cannot occur since all byte values are valid characters." */ + ASSERT (ret == 1); + ASSERT (src == (unlimited ? NULL : input + 1)); + if (c < 0x80) + /* c is an ASCII character. */ + ASSERT (buf[0] == c); + else + /* On most platforms, the bytes 0x80..0xFF map to U+0080..U+00FF. + But on musl libc, the bytes 0x80..0xFF map to U+DF80..U+DFFF. 
*/ + ASSERT (buf[0] == (btowc (c) == 0xDF00 + c ? btowc (c) : c)); + ASSERT (mbsinit (&state)); + } + } + break; + default: return 1; } diff --git a/tests/test-mbsrtowcs5.sh b/tests/test-mbsrtowcs5.sh new file mode 100755 index 0000000000..96734a73bd --- /dev/null +++ b/tests/test-mbsrtowcs5.sh @@ -0,0 +1,9 @@ +#!/bin/sh + +# Test whether the POSIX locale has encoding errors. +LC_ALL=C \ +${CHECKER} ./test-mbsrtowcs${EXEEXT} 5 || exit 1 +LC_ALL=POSIX \ +${CHECKER} ./test-mbsrtowcs${EXEEXT} 5 || exit 1 + +exit 0 -- 2.34.1
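The new test case calls mbsrtowcs() first with a NULL destination, to obtain the length, and then with a real buffer. The following sketch shows the same two-pass pattern in isolation. It is an illustration under assumptions rather than part of the patch: the helper name convert_two_pass is hypothetical, and the caller is assumed to have selected the locale already.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of the patch.
   Converts the NUL-terminated multibyte string INPUT to a freshly
   allocated wide string.  Stores the number of wide characters
   (excluding the terminating null) in *LENGTHP.  Returns NULL if the
   input is invalid in the current locale or if memory is exhausted.  */
static wchar_t *
convert_two_pass (const char *input, size_t *lengthp)
{
  mbstate_t state;
  const char *src;
  size_t n;
  size_t ret;
  wchar_t *result;

  /* First pass: with a NULL destination, mbsrtowcs() only counts.
     The length argument is ignored and src is left unchanged.  */
  memset (&state, '\0', sizeof (mbstate_t));
  src = input;
  n = mbsrtowcs (NULL, &src, 0, &state);
  if (n == (size_t) -1)
    return NULL;

  result = malloc ((n + 1) * sizeof (wchar_t));
  if (result == NULL)
    return NULL;

  /* Second pass: convert for real, with room for the terminating
     null wide character.  On complete conversion, src becomes NULL.  */
  memset (&state, '\0', sizeof (mbstate_t));
  src = input;
  ret = mbsrtowcs (result, &src, n + 1, &state);
  if (ret != n || src != NULL)
    {
      free (result);
      return NULL;
    }

  *lengthp = n;
  return result;
}
```

On a platform with the bug described in doc/posix-functions/mbsrtowcs.texi, the first pass would be expected to return (size_t) -1 for input that contains bytes in the range 0x80..0xFF, even in the "C" locale; with the override from this patch it should not.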
From 8906a5101bf62b719ddafa376fd4e0805e35617b Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Thu, 30 Mar 2023 23:15:52 +0200
Subject: [PATCH 6/8] mbsnrtowcs: Fix behaviour in the C locale.

* m4/mbsnrtowcs.m4 (gl_FUNC_MBSNRTOWCS): Invoke gl_MBRTOWC_C_LOCALE. If
mbrtowc is buggy in the C locale, override also mbsnrtowcs.
* modules/mbsnrtowcs (Files): Add m4/mbrtowc.m4.
* tests/test-mbsnrtowcs.c (main): Add a test of the C locale, based on
tests/test-mbsrtowcs.c.
* tests/test-mbsnrtowcs5.sh: New file, based on tests/test-mbrtowc5.sh.
* modules/mbsnrtowcs-tests (Files): Add it.
(Makefile.am): Test it.
* doc/posix-functions/mbsnrtowcs.texi: Mention the C locale behaviour
bug.
---
 ChangeLog                           | 14 ++++++
 doc/posix-functions/mbsnrtowcs.texi |  4 ++
 m4/mbsnrtowcs.m4                    |  9 +++-
 modules/mbsnrtowcs                  |  1 +
 modules/mbsnrtowcs-tests            |  5 ++-
 tests/test-mbsnrtowcs.c             | 66 +++++++++++++++++++++++++++++
 tests/test-mbsnrtowcs5.sh           |  9 ++++
 7 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100755 tests/test-mbsnrtowcs5.sh

diff --git a/ChangeLog b/ChangeLog
index 2fe1679061..4afba58a3c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2023-03-30 Bruno Haible <br...@clisp.org>
+
+	mbsnrtowcs: Fix behaviour in the C locale.
+	* m4/mbsnrtowcs.m4 (gl_FUNC_MBSNRTOWCS): Invoke gl_MBRTOWC_C_LOCALE. If
+	mbrtowc is buggy in the C locale, override also mbsnrtowcs.
+	* modules/mbsnrtowcs (Files): Add m4/mbrtowc.m4.
+	* tests/test-mbsnrtowcs.c (main): Add a test of the C locale, based on
+	tests/test-mbsrtowcs.c.
+	* tests/test-mbsnrtowcs5.sh: New file, based on tests/test-mbrtowc5.sh.
+	* modules/mbsnrtowcs-tests (Files): Add it.
+	(Makefile.am): Test it.
+	* doc/posix-functions/mbsnrtowcs.texi: Mention the C locale behaviour
+	bug.
+
 2023-03-30 Bruno Haible <br...@clisp.org>
 
 	mbsrtowcs: Fix behaviour in the C locale.
diff --git a/doc/posix-functions/mbsnrtowcs.texi b/doc/posix-functions/mbsnrtowcs.texi index c6defd2c20..3a420c673a 100644 --- a/doc/posix-functions/mbsnrtowcs.texi +++ b/doc/posix-functions/mbsnrtowcs.texi @@ -14,6 +14,10 @@ @item This function produces invalid wide characters on some platforms: Solaris 11.4. +@item +In the C or POSIX locales, this function can return @code{(size_t) -1} +and set @code{errno} to @code{EILSEQ}: +glibc 2.35. @end itemize Portability problems not fixed by Gnulib: diff --git a/m4/mbsnrtowcs.m4 b/m4/mbsnrtowcs.m4 index 34dcf30e63..7cab5f79ed 100644 --- a/m4/mbsnrtowcs.m4 +++ b/m4/mbsnrtowcs.m4 @@ -1,4 +1,4 @@ -# mbsnrtowcs.m4 serial 8 +# mbsnrtowcs.m4 serial 9 dnl Copyright (C) 2008, 2010-2023 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, @@ -29,6 +29,13 @@ AC_DEFUN([gl_FUNC_MBSNRTOWCS] *yes) ;; *) REPLACE_MBSNRTOWCS=1 ;; esac + if test $REPLACE_MBSNRTOWCS = 0; then + gl_MBRTOWC_C_LOCALE + case "$gl_cv_func_mbrtowc_C_locale_sans_EILSEQ" in + *yes) ;; + *) REPLACE_MBSNRTOWCS=1 ;; + esac + fi fi fi ]) diff --git a/modules/mbsnrtowcs b/modules/mbsnrtowcs index 5b66cc2b8f..10260246ea 100644 --- a/modules/mbsnrtowcs +++ b/modules/mbsnrtowcs @@ -7,6 +7,7 @@ lib/mbsnrtowcs-impl.h lib/mbsrtowcs-state.c m4/mbsnrtowcs.m4 m4/mbstate_t.m4 +m4/mbrtowc.m4 Depends-on: wchar diff --git a/modules/mbsnrtowcs-tests b/modules/mbsnrtowcs-tests index c7c528448f..7c5de260ba 100644 --- a/modules/mbsnrtowcs-tests +++ b/modules/mbsnrtowcs-tests @@ -3,6 +3,7 @@ tests/test-mbsnrtowcs1.sh tests/test-mbsnrtowcs2.sh tests/test-mbsnrtowcs3.sh tests/test-mbsnrtowcs4.sh +tests/test-mbsnrtowcs5.sh tests/test-mbsnrtowcs.c tests/signature.h tests/macros.h @@ -24,7 +25,9 @@ gt_LOCALE_JA gt_LOCALE_ZH_CN Makefile.am: -TESTS += test-mbsnrtowcs1.sh test-mbsnrtowcs2.sh test-mbsnrtowcs3.sh test-mbsnrtowcs4.sh +TESTS += \ + test-mbsnrtowcs1.sh test-mbsnrtowcs2.sh 
test-mbsnrtowcs3.sh \ + test-mbsnrtowcs4.sh test-mbsnrtowcs5.sh TESTS_ENVIRONMENT += \ LOCALE_FR='@LOCALE_FR@' \ LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \ diff --git a/tests/test-mbsnrtowcs.c b/tests/test-mbsnrtowcs.c index e3456b4e2b..2d0e6a7521 100644 --- a/tests/test-mbsnrtowcs.c +++ b/tests/test-mbsnrtowcs.c @@ -281,6 +281,72 @@ main (int argc, char *argv[]) } break; + case '5': + /* C or POSIX locale. */ + { + char input[] = "n/a"; + memset (&state, '\0', sizeof (mbstate_t)); + + src = input; + temp_state = state; + ret = mbsnrtowcs (NULL, &src, 4, unlimited ? BUFSIZE : 1, &temp_state); + ASSERT (ret == 3); + ASSERT (src == input); + ASSERT (mbsinit (&state)); + + src = input; + ret = mbsnrtowcs (buf, &src, 4, unlimited ? BUFSIZE : 1, &state); + ASSERT (ret == (unlimited ? 3 : 1)); + ASSERT (src == (unlimited ? NULL : input + 1)); + ASSERT (buf[0] == 'n'); + if (unlimited) + { + ASSERT (buf[1] == '/'); + ASSERT (buf[2] == 'a'); + ASSERT (buf[3] == 0); + ASSERT (buf[4] == (wchar_t) 0xBADFACE); + } + else + ASSERT (buf[1] == (wchar_t) 0xBADFACE); + ASSERT (mbsinit (&state)); + } + { + int c; + char input[2]; + + memset (&state, '\0', sizeof (mbstate_t)); + for (c = 0; c < 0x100; c++) + if (c != 0) + { + /* We are testing all nonnull bytes. */ + input[0] = c; + input[1] = '\0'; + + src = input; + ret = mbsnrtowcs (NULL, &src, 2, unlimited ? BUFSIZE : 1, &state); + ASSERT (ret == 1); + ASSERT (src == input); + ASSERT (mbsinit (&state)); + + buf[0] = buf[1] = (wchar_t) 0xBADFACE; + src = input; + ret = mbsnrtowcs (buf, &src, 2, unlimited ? BUFSIZE : 1, &state); + /* POSIX:2018 says: "In the POSIX locale an [EILSEQ] error + cannot occur since all byte values are valid characters." */ + ASSERT (ret == 1); + ASSERT (src == (unlimited ? NULL : input + 1)); + if (c < 0x80) + /* c is an ASCII character. */ + ASSERT (buf[0] == c); + else + /* On most platforms, the bytes 0x80..0xFF map to U+0080..U+00FF. + But on musl libc, the bytes 0x80..0xFF map to U+DF80..U+DFFF. 
*/ + ASSERT (buf[0] == (btowc (c) == 0xDF00 + c ? btowc (c) : c)); + ASSERT (mbsinit (&state)); + } + } + break; + default: return 1; } diff --git a/tests/test-mbsnrtowcs5.sh b/tests/test-mbsnrtowcs5.sh new file mode 100755 index 0000000000..b121fb1a69 --- /dev/null +++ b/tests/test-mbsnrtowcs5.sh @@ -0,0 +1,9 @@ +#!/bin/sh + +# Test whether the POSIX locale has encoding errors. +LC_ALL=C \ +${CHECKER} ./test-mbsnrtowcs${EXEEXT} 5 || exit 1 +LC_ALL=POSIX \ +${CHECKER} ./test-mbsnrtowcs${EXEEXT} 5 || exit 1 + +exit 0 -- 2.34.1
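Both C-locale tests accept two possible mappings for the bytes 0x80..0xFF: the identity mapping to U+0080..U+00FF, and the mapping to U+DF80..U+DFFF that the test comments attribute to musl libc. The expression used in the assertions can be factored out as follows. This is a sketch only: the helper name expected_c_locale_wc is hypothetical and does not appear in the patch, and the "C" locale is assumed to be in effect.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper, not part of the patch.
   Returns the wide character that the tests accept as the conversion
   result of the single byte C (0 < C < 0x100) in the "C" locale.  */
static wint_t
expected_c_locale_wc (int c)
{
  wint_t wc;

  assert (c > 0 && c < 0x100);

  if (c < 0x80)
    /* c is an ASCII character; it maps to itself.  */
    return (wint_t) c;

  /* If btowc() reports the 0xDF00 + c mapping, accept that value.
     In every other case, including btowc() returning WEOF, the tests
     expect the identity mapping.  */
  wc = btowc (c);
  if (wc == (wint_t) (0xDF00 + c))
    return wc;
  return (wint_t) c;
}
```

Keeping the fallback to the identity mapping means that the expected value does not depend on btowc() succeeding, which matters because btowc() is one of the functions reported as failing in the "C" locale at the top of this thread.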
From 7595a8817e03c1366817e69aa74263efa2ca4979 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Fri, 31 Mar 2023 00:27:20 +0200
Subject: [PATCH 7/8] mbstowcs: New module.

* lib/stdlib.in.h (mbstowcs): New declaration.
* lib/mbstowcs.c: New file, based on lib/mbstoc32s.c.
* m4/mbstowcs.m4: New file.
* m4/stdlib_h.m4 (gl_STDLIB_H): Test whether mbstowcs is declared.
(gl_STDLIB_H_REQUIRE_DEFAULTS): Initialize GNULIB_MBSTOWCS.
(gl_STDLIB_H_DEFAULTS): Initialize REPLACE_MBSTOWCS.
* modules/stdlib (Makefile.am): Substitute GNULIB_MBSTOWCS,
REPLACE_MBSTOWCS.
* modules/mbstowcs: New file.
* tests/test-stdlib-c++.cc (mbstowcs): Check signature.
* doc/posix-functions/mbstowcs.texi: Mention the C locale behaviour bug
and the new module.
---
 ChangeLog                         | 16 +++++++++++++++
 doc/posix-functions/mbstowcs.texi |  6 +++++-
 lib/mbstowcs.c                    | 33 +++++++++++++++++++++++++++++++
 lib/stdlib.in.h                   | 30 ++++++++++++++++++++++++++++
 m4/mbstowcs.m4                    | 21 ++++++++++++++++++++
 m4/stdlib_h.m4                    |  8 +++++---
 modules/mbstowcs                  | 33 +++++++++++++++++++++++++++++++
 modules/stdlib                    |  2 ++
 tests/test-stdlib-c++.cc          |  5 +++++
 9 files changed, 150 insertions(+), 4 deletions(-)
 create mode 100644 lib/mbstowcs.c
 create mode 100644 m4/mbstowcs.m4
 create mode 100644 modules/mbstowcs

diff --git a/ChangeLog b/ChangeLog
index 4afba58a3c..bc7372479c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2023-03-30 Bruno Haible <br...@clisp.org>
+
+	mbstowcs: New module.
+	* lib/stdlib.in.h (mbstowcs): New declaration.
+	* lib/mbstowcs.c: New file, based on lib/mbstoc32s.c.
+	* m4/mbstowcs.m4: New file.
+	* m4/stdlib_h.m4 (gl_STDLIB_H): Test whether mbstowcs is declared.
+	(gl_STDLIB_H_REQUIRE_DEFAULTS): Initialize GNULIB_MBSTOWCS.
+	(gl_STDLIB_H_DEFAULTS): Initialize REPLACE_MBSTOWCS.
+	* modules/stdlib (Makefile.am): Substitute GNULIB_MBSTOWCS,
+	REPLACE_MBSTOWCS.
+	* modules/mbstowcs: New file.
+	* tests/test-stdlib-c++.cc (mbstowcs): Check signature.
+ * doc/posix-functions/mbstowcs.texi: Mention the C locale behaviour bug + and the new module. + 2023-03-30 Bruno Haible <br...@clisp.org> mbsnrtowcs: Fix behaviour in the C locale. diff --git a/doc/posix-functions/mbstowcs.texi b/doc/posix-functions/mbstowcs.texi index 5d7dc4ca62..a695edc2af 100644 --- a/doc/posix-functions/mbstowcs.texi +++ b/doc/posix-functions/mbstowcs.texi @@ -4,10 +4,14 @@ POSIX specification:@* @url{https://pubs.opengroup.org/onlinepubs/9699919799/functions/mbstowcs.html} -Gnulib module: --- +Gnulib module: mbstowcs Portability problems fixed by Gnulib: @itemize +@item +In the C or POSIX locales, this function can return @code{(size_t) -1} +and set @code{errno} to @code{EILSEQ}: +glibc 2.35. @end itemize Portability problems not fixed by Gnulib: diff --git a/lib/mbstowcs.c b/lib/mbstowcs.c new file mode 100644 index 0000000000..e32d9acf88 --- /dev/null +++ b/lib/mbstowcs.c @@ -0,0 +1,33 @@ +/* Convert string to wide string. + Copyright (C) 2020-2023 Free Software Foundation, Inc. + Written by Bruno Haible <br...@clisp.org>, 2020. + + This file is free software: you can redistribute it and/or modify + it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of the + License, or (at your option) any later version. + + This file is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public License + along with this program. If not, see <https://www.gnu.org/licenses/>. */ + +#include <config.h> + +/* Specification. 
*/ +#include <stdlib.h> + +#include <string.h> +#include <wchar.h> + +size_t +mbstowcs (wchar_t *dest, const char *src, size_t len) +{ + mbstate_t state; + + memset (&state, '\0', sizeof (mbstate_t)); + return mbsrtowcs (dest, &src, len, &state); +} diff --git a/lib/stdlib.in.h b/lib/stdlib.in.h index a91f4e23d6..4ecfc96a6f 100644 --- a/lib/stdlib.in.h +++ b/lib/stdlib.in.h @@ -589,6 +589,36 @@ _GL_WARN_ON_USE (malloc, "malloc is not POSIX compliant everywhere - " # endif #endif +/* Convert a string to a wide string. */ +#if @GNULIB_MBSTOWCS@ +# if @REPLACE_MBSTOWCS@ +# if !(defined __cplusplus && defined GNULIB_NAMESPACE) +# undef mbstowcs +# define mbstowcs rpl_mbstowcs +# endif +_GL_FUNCDECL_RPL (mbstowcs, size_t, + (wchar_t *restrict dest, const char *restrict src, + size_t len) + _GL_ARG_NONNULL ((2))); +_GL_CXXALIAS_RPL (mbstowcs, size_t, + (wchar_t *restrict dest, const char *restrict src, + size_t len)); +# else +_GL_CXXALIAS_SYS (mbstowcs, size_t, + (wchar_t *restrict dest, const char *restrict src, + size_t len)); +# endif +# if __GLIBC__ >= 2 +_GL_CXXALIASWARN (mbstowcs); +# endif +#elif defined GNULIB_POSIXCHECK +# undef mbstowcs +# if HAVE_RAW_DECL_MBSTOWCS +_GL_WARN_ON_USE (mbstowcs, "mbstowcs is unportable - " + "use gnulib module mbstowcs for portability"); +# endif +#endif + /* Convert a multibyte character to a wide character. */ #if @GNULIB_MBTOWC@ # if @REPLACE_MBTOWC@ diff --git a/m4/mbstowcs.m4 b/m4/mbstowcs.m4 new file mode 100644 index 0000000000..c66e804f80 --- /dev/null +++ b/m4/mbstowcs.m4 @@ -0,0 +1,21 @@ +# mbstowcs.m4 serial 1 +dnl Copyright (C) 2023 Free Software Foundation, Inc. +dnl This file is free software; the Free Software Foundation +dnl gives unlimited permission to copy and/or distribute it, +dnl with or without modifications, as long as this notice is preserved. 
+ +AC_DEFUN([gl_FUNC_MBSTOWCS], +[ + AC_REQUIRE([gl_STDLIB_H_DEFAULTS]) + + gl_MBRTOWC_C_LOCALE + case "$gl_cv_func_mbrtowc_C_locale_sans_EILSEQ" in + *yes) ;; + *) REPLACE_MBSTOWCS=1 ;; + esac +]) + +# Prerequisites of lib/mbstowcs.c. +AC_DEFUN([gl_PREREQ_MBSTOWCS], [ + : +]) diff --git a/m4/stdlib_h.m4 b/m4/stdlib_h.m4 index 249ef65722..ac28ed9efc 100644 --- a/m4/stdlib_h.m4 +++ b/m4/stdlib_h.m4 @@ -1,4 +1,4 @@ -# stdlib_h.m4 serial 71 +# stdlib_h.m4 serial 72 dnl Copyright (C) 2007-2023 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, @@ -24,8 +24,8 @@ AC_DEFUN_ONCE([gl_STDLIB_H] #endif ]], [_Exit aligned_alloc atoll canonicalize_file_name free getloadavg getprogname getsubopt grantpt - initstate initstate_r mbtowc mkdtemp mkostemp mkostemps mkstemp mkstemps - posix_memalign posix_openpt ptsname ptsname_r qsort_r + initstate initstate_r mbstowcs mbtowc mkdtemp mkostemp mkostemps mkstemp + mkstemps posix_memalign posix_openpt ptsname ptsname_r qsort_r random random_r reallocarray realpath rpmatch secure_getenv setenv setstate setstate_r srandom srandom_r strtod strtol strtold strtoll strtoul strtoull unlockpt unsetenv]) @@ -78,6 +78,7 @@ AC_DEFUN([gl_STDLIB_H_REQUIRE_DEFAULTS] gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_GRANTPT]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MALLOC_GNU]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MALLOC_POSIX]) + gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MBSTOWCS]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MBTOWC]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MKDTEMP]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MKOSTEMP]) @@ -180,6 +181,7 @@ AC_DEFUN([gl_STDLIB_H_DEFAULTS] REPLACE_INITSTATE=0; AC_SUBST([REPLACE_INITSTATE]) REPLACE_MALLOC_FOR_MALLOC_GNU=0; AC_SUBST([REPLACE_MALLOC_FOR_MALLOC_GNU]) REPLACE_MALLOC_FOR_MALLOC_POSIX=0; AC_SUBST([REPLACE_MALLOC_FOR_MALLOC_POSIX]) + REPLACE_MBSTOWCS=0; AC_SUBST([REPLACE_MBSTOWCS]) 
REPLACE_MBTOWC=0; AC_SUBST([REPLACE_MBTOWC]) REPLACE_MKOSTEMP=0; AC_SUBST([REPLACE_MKOSTEMP]) REPLACE_MKOSTEMPS=0; AC_SUBST([REPLACE_MKOSTEMPS]) diff --git a/modules/mbstowcs b/modules/mbstowcs new file mode 100644 index 0000000000..44ba43d977 --- /dev/null +++ b/modules/mbstowcs @@ -0,0 +1,33 @@ +Description: +mbstowcs() function: convert string to wide string. + +Files: +lib/mbstowcs.c +m4/mbstowcs.m4 +m4/mbrtowc.m4 + +Depends-on: +stdlib +mbsrtowcs [test $REPLACE_MBSTOWCS = 1] + +configure.ac: +gl_FUNC_MBSTOWCS +gl_CONDITIONAL([GL_COND_OBJ_MBSTOWCS], [test $REPLACE_MBSTOWCS = 1]) +AM_COND_IF([GL_COND_OBJ_MBSTOWCS], [ + gl_PREREQ_MBSTOWCS +]) +gl_STDLIB_MODULE_INDICATOR([mbstowcs]) + +Makefile.am: +if GL_COND_OBJ_MBSTOWCS +lib_SOURCES += mbstowcs.c +endif + +Include: +<stdlib.h> + +License: +LGPLv2+ + +Maintainer: +all diff --git a/modules/stdlib b/modules/stdlib index bafeb214ee..efea327794 100644 --- a/modules/stdlib +++ b/modules/stdlib @@ -47,6 +47,7 @@ stdlib.h: stdlib.in.h $(top_builddir)/config.status $(CXXDEFS_H) \ -e 's/@''GNULIB_GRANTPT''@/$(GNULIB_GRANTPT)/g' \ -e 's/@''GNULIB_MALLOC_GNU''@/$(GNULIB_MALLOC_GNU)/g' \ -e 's/@''GNULIB_MALLOC_POSIX''@/$(GNULIB_MALLOC_POSIX)/g' \ + -e 's/@''GNULIB_MBSTOWCS''@/$(GNULIB_MBSTOWCS)/g' \ -e 's/@''GNULIB_MBTOWC''@/$(GNULIB_MBTOWC)/g' \ -e 's/@''GNULIB_MKDTEMP''@/$(GNULIB_MKDTEMP)/g' \ -e 's/@''GNULIB_MKOSTEMP''@/$(GNULIB_MKOSTEMP)/g' \ @@ -140,6 +141,7 @@ stdlib.h: stdlib.in.h $(top_builddir)/config.status $(CXXDEFS_H) \ -e 's|@''REPLACE_INITSTATE''@|$(REPLACE_INITSTATE)|g' \ -e 's|@''REPLACE_MALLOC_FOR_MALLOC_GNU''@|$(REPLACE_MALLOC_FOR_MALLOC_GNU)|g' \ -e 's|@''REPLACE_MALLOC_FOR_MALLOC_POSIX''@|$(REPLACE_MALLOC_FOR_MALLOC_POSIX)|g' \ + -e 's|@''REPLACE_MBSTOWCS''@|$(REPLACE_MBSTOWCS)|g' \ -e 's|@''REPLACE_MBTOWC''@|$(REPLACE_MBTOWC)|g' \ -e 's|@''REPLACE_MKOSTEMP''@|$(REPLACE_MKOSTEMP)|g' \ -e 's|@''REPLACE_MKOSTEMPS''@|$(REPLACE_MKOSTEMPS)|g' \ diff --git a/tests/test-stdlib-c++.cc b/tests/test-stdlib-c++.cc 
index 184437c5fd..7683afd154 100644 --- a/tests/test-stdlib-c++.cc +++ b/tests/test-stdlib-c++.cc @@ -73,6 +73,11 @@ SIGNATURE_CHECK (GNULIB_NAMESPACE::mbtowc, int, (wchar_t *, const char *, size_t)); #endif +#if GNULIB_TEST_MBSTOWCS +SIGNATURE_CHECK (GNULIB_NAMESPACE::mbstowcs, size_t, + (wchar_t *, const char *, size_t)); +#endif + #if GNULIB_TEST_MKDTEMP SIGNATURE_CHECK (GNULIB_NAMESPACE::mkdtemp, char *, (char *)); #endif -- 2.34.1
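For readers skimming patch 7/8: the replacement in lib/mbstowcs.c delegates to mbsrtowcs, starting from a freshly zeroed conversion state. A minimal standalone sketch of that idea follows. The name sketch_mbstowcs is hypothetical, chosen only to avoid clashing with the libc symbol (the patch itself uses rpl_mbstowcs via the usual Gnulib macros). The sketch calls whatever mbsrtowcs the platform provides, so on an affected libc it would presumably show the same C-locale behaviour unless a fixed mbsrtowcs is linked in.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical name, for illustration only; not part of the patch.  */
size_t
sketch_mbstowcs (wchar_t *dest, const char *src, size_t len)
{
  mbstate_t state;

  /* The patch declares the source argument as nonnull.  */
  assert (src != NULL);

  /* mbstowcs always starts in the initial conversion state.  */
  memset (&state, '\0', sizeof (mbstate_t));

  /* mbsrtowcs updates the local copy of src; the caller's pointer
     is not modified.  */
  return mbsrtowcs (dest, &src, len, &state);
}
```

Because the function owns its mbstate_t, each call is independent of any earlier conversion, which matches the stateless interface of mbstowcs.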
From e5b2b726e2720cb6c13fde84afd1514aca9f03d6 Mon Sep 17 00:00:00 2001 From: Bruno Haible <br...@clisp.org> Date: Fri, 31 Mar 2023 00:30:39 +0200 Subject: [PATCH 8/8] mbstowcs: Add tests. * tests/test-mbstowcs1.sh: New file, based on tests/test-mbsrtowcs1.sh. * tests/test-mbstowcs2.sh: New file, based on tests/test-mbsrtowcs2.sh. * tests/test-mbstowcs3.sh: New file, based on tests/test-mbsrtowcs3.sh. * tests/test-mbstowcs4.sh: New file, based on tests/test-mbsrtowcs4.sh. * tests/test-mbstowcs5.sh: New file, based on tests/test-mbsrtowcs5.sh. * tests/test-mbstowcs.c: New file, based on tests/test-mbsrtowcs.c. * modules/mbstowcs-tests: New file, based on modules/mbsrtowcs-tests. --- ChangeLog | 9 ++ modules/mbstowcs-tests | 37 ++++++ tests/test-mbstowcs.c | 254 ++++++++++++++++++++++++++++++++++++++++ tests/test-mbstowcs1.sh | 15 +++ tests/test-mbstowcs2.sh | 15 +++ tests/test-mbstowcs3.sh | 15 +++ tests/test-mbstowcs4.sh | 15 +++ tests/test-mbstowcs5.sh | 9 ++ 8 files changed, 369 insertions(+) create mode 100644 modules/mbstowcs-tests create mode 100644 tests/test-mbstowcs.c create mode 100755 tests/test-mbstowcs1.sh create mode 100755 tests/test-mbstowcs2.sh create mode 100755 tests/test-mbstowcs3.sh create mode 100755 tests/test-mbstowcs4.sh create mode 100755 tests/test-mbstowcs5.sh diff --git a/ChangeLog b/ChangeLog index bc7372479c..e3d94c501e 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,14 @@ 2023-03-30 Bruno Haible <br...@clisp.org> + mbstowcs: Add tests. + * tests/test-mbstowcs1.sh: New file, based on tests/test-mbsrtowcs1.sh. + * tests/test-mbstowcs2.sh: New file, based on tests/test-mbsrtowcs2.sh. + * tests/test-mbstowcs3.sh: New file, based on tests/test-mbsrtowcs3.sh. + * tests/test-mbstowcs4.sh: New file, based on tests/test-mbsrtowcs4.sh. + * tests/test-mbstowcs5.sh: New file, based on tests/test-mbsrtowcs5.sh. + * tests/test-mbstowcs.c: New file, based on tests/test-mbsrtowcs.c. 
+ * modules/mbstowcs-tests: New file, based on modules/mbsrtowcs-tests. + mbstowcs: New module. * lib/stdlib.in.h (mbstowcs): New declaration. * lib/mbstowcs.c: New file, based on lib/mbstoc32s.c. diff --git a/modules/mbstowcs-tests b/modules/mbstowcs-tests new file mode 100644 index 0000000000..a780be2c4e --- /dev/null +++ b/modules/mbstowcs-tests @@ -0,0 +1,37 @@ +Files: +tests/test-mbstowcs1.sh +tests/test-mbstowcs2.sh +tests/test-mbstowcs3.sh +tests/test-mbstowcs4.sh +tests/test-mbstowcs5.sh +tests/test-mbstowcs.c +tests/signature.h +tests/macros.h +m4/locale-fr.m4 +m4/locale-ja.m4 +m4/locale-zh.m4 +m4/codeset.m4 + +Depends-on: +mbrtowc +mbsinit +wctob +setlocale + +configure.ac: +gt_LOCALE_FR +gt_LOCALE_FR_UTF8 +gt_LOCALE_JA +gt_LOCALE_ZH_CN + +Makefile.am: +TESTS += \ + test-mbstowcs1.sh test-mbstowcs2.sh test-mbstowcs3.sh test-mbstowcs4.sh \ + test-mbstowcs5.sh +TESTS_ENVIRONMENT += \ + LOCALE_FR='@LOCALE_FR@' \ + LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \ + LOCALE_JA='@LOCALE_JA@' \ + LOCALE_ZH_CN='@LOCALE_ZH_CN@' +check_PROGRAMS += test-mbstowcs +test_mbstowcs_LDADD = $(LDADD) $(SETLOCALE_LIB) $(MBRTOWC_LIB) diff --git a/tests/test-mbstowcs.c b/tests/test-mbstowcs.c new file mode 100644 index 0000000000..a33511a07d --- /dev/null +++ b/tests/test-mbstowcs.c @@ -0,0 +1,254 @@ +/* Test of conversion of string to wide string. + Copyright (C) 2008-2023 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. 
If not, see <https://www.gnu.org/licenses/>. */ + +/* Written by Bruno Haible <br...@clisp.org>, 2008. */ + +#include <config.h> + +#include <stdlib.h> + +#include "signature.h" +SIGNATURE_CHECK (mbstowcs, size_t, (wchar_t *, const char *, size_t)); + +#include <locale.h> +#include <stdio.h> +#include <string.h> +#include <wchar.h> + +#include "macros.h" + +int +main (int argc, char *argv[]) +{ + wchar_t wc; + size_t ret; + + /* configure should already have checked that the locale is supported. */ + if (setlocale (LC_ALL, "") == NULL) + return 1; + + /* Test NUL byte input. */ + { + const char *src; + + src = ""; + ret = mbstowcs (NULL, src, 0); + ASSERT (ret == 0); + + src = ""; + ret = mbstowcs (NULL, src, 1); + ASSERT (ret == 0); + + wc = (wchar_t) 0xBADFACE; + src = ""; + ret = mbstowcs (&wc, src, 0); + ASSERT (ret == 0); + ASSERT (wc == (wchar_t) 0xBADFACE); + + wc = (wchar_t) 0xBADFACE; + src = ""; + ret = mbstowcs (&wc, src, 1); + ASSERT (ret == 0); + ASSERT (wc == 0); + } + + if (argc > 1) + { + int unlimited; + + for (unlimited = 0; unlimited < 2; unlimited++) + { + #define BUFSIZE 10 + wchar_t buf[BUFSIZE]; + const char *src; + + { + size_t i; + for (i = 0; i < BUFSIZE; i++) + buf[i] = (wchar_t) 0xBADFACE; + } + + switch (argv[1][0]) + { + case '1': + /* Locale encoding is ISO-8859-1 or ISO-8859-15. */ + { + char input[] = "B\374\337er"; /* "Büßer" */ + + src = input + 1; + ret = mbstowcs (NULL, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == 4); + + src = input + 1; + ret = mbstowcs (buf, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == (unlimited ? 4 : 1)); + ASSERT (wctob (buf[0]) == (unsigned char) '\374'); + if (unlimited) + { + ASSERT (wctob (buf[1]) == (unsigned char) '\337'); + ASSERT (buf[2] == 'e'); + ASSERT (buf[3] == 'r'); + ASSERT (buf[4] == 0); + ASSERT (buf[5] == (wchar_t) 0xBADFACE); + } + else + ASSERT (buf[1] == (wchar_t) 0xBADFACE); + } + break; + + case '2': + /* Locale encoding is UTF-8. 
*/ + { + char input[] = "B\303\274\303\237er"; /* "Büßer" */ + + src = input + 1; + ret = mbstowcs (NULL, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == 4); + + src = input + 1; + ret = mbstowcs (buf, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == (unlimited ? 4 : 1)); + ASSERT (wctob (buf[0]) == EOF); + ASSERT (wctob (buf[1]) == EOF); + if (unlimited) + { + ASSERT (buf[2] == 'e'); + ASSERT (buf[3] == 'r'); + ASSERT (buf[4] == 0); + ASSERT (buf[5] == (wchar_t) 0xBADFACE); + } + else + ASSERT (buf[2] == (wchar_t) 0xBADFACE); + } + break; + + case '3': + /* Locale encoding is EUC-JP. */ + { + char input[] = "<\306\374\313\334\270\354>"; /* "<日本語>" */ + + src = input + 1; + ret = mbstowcs (NULL, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == 4); + + src = input + 1; + ret = mbstowcs (buf, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == (unlimited ? 4 : 1)); + ASSERT (wctob (buf[0]) == EOF); + ASSERT (wctob (buf[1]) == EOF); + ASSERT (wctob (buf[2]) == EOF); + if (unlimited) + { + ASSERT (buf[3] == '>'); + ASSERT (buf[4] == 0); + ASSERT (buf[5] == (wchar_t) 0xBADFACE); + } + else + ASSERT (buf[3] == (wchar_t) 0xBADFACE); + } + break; + + case '4': + /* Locale encoding is GB18030. */ + { + char input[] = "B\250\271\201\060\211\070er"; /* "Büßer" */ + + src = input + 1; + ret = mbstowcs (NULL, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == 4); + + src = input + 1; + ret = mbstowcs (buf, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == (unlimited ? 4 : 1)); + ASSERT (wctob (buf[0]) == EOF); + if (unlimited) + { + ASSERT (wctob (buf[1]) == EOF); + ASSERT (buf[2] == 'e'); + ASSERT (buf[3] == 'r'); + ASSERT (buf[4] == 0); + ASSERT (buf[5] == (wchar_t) 0xBADFACE); + } + else + ASSERT (buf[1] == (wchar_t) 0xBADFACE); + } + break; + + case '5': + /* C or POSIX locale. */ + { + char input[] = "n/a"; + + src = input; + ret = mbstowcs (NULL, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == 3); + + src = input; + ret = mbstowcs (buf, src, unlimited ? 
BUFSIZE : 1); + ASSERT (ret == (unlimited ? 3 : 1)); + ASSERT (buf[0] == 'n'); + if (unlimited) + { + ASSERT (buf[1] == '/'); + ASSERT (buf[2] == 'a'); + ASSERT (buf[3] == 0); + ASSERT (buf[4] == (wchar_t) 0xBADFACE); + } + else + ASSERT (buf[1] == (wchar_t) 0xBADFACE); + } + { + int c; + char input[2]; + + for (c = 0; c < 0x100; c++) + if (c != 0) + { + /* We are testing all nonnull bytes. */ + input[0] = c; + input[1] = '\0'; + + src = input; + ret = mbstowcs (NULL, src, unlimited ? BUFSIZE : 1); + ASSERT (ret == 1); + + buf[0] = buf[1] = (wchar_t) 0xBADFACE; + src = input; + ret = mbstowcs (buf, src, unlimited ? BUFSIZE : 1); + /* POSIX:2018 says: "In the POSIX locale an [EILSEQ] error + cannot occur since all byte values are valid characters." */ + ASSERT (ret == 1); + if (c < 0x80) + /* c is an ASCII character. */ + ASSERT (buf[0] == c); + else + /* On most platforms, the bytes 0x80..0xFF map to U+0080..U+00FF. + But on musl libc, the bytes 0x80..0xFF map to U+DF80..U+DFFF. */ + ASSERT (buf[0] == (btowc (c) == 0xDF00 + c ? btowc (c) : c)); + } + } + break; + + default: + return 1; + } + } + + return 0; + } + + return 1; +} diff --git a/tests/test-mbstowcs1.sh b/tests/test-mbstowcs1.sh new file mode 100755 index 0000000000..57f82d80fc --- /dev/null +++ b/tests/test-mbstowcs1.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test in an ISO-8859-1 or ISO-8859-15 locale. +: "${LOCALE_FR=fr_FR}" +if test $LOCALE_FR = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no traditional french locale is installed" + else + echo "Skipping test: no traditional french locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_FR \ +${CHECKER} ./test-mbstowcs${EXEEXT} 1 diff --git a/tests/test-mbstowcs2.sh b/tests/test-mbstowcs2.sh new file mode 100755 index 0000000000..311b9b5f27 --- /dev/null +++ b/tests/test-mbstowcs2.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test whether a specific UTF-8 locale is installed. 
+: "${LOCALE_FR_UTF8=fr_FR.UTF-8}" +if test $LOCALE_FR_UTF8 = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no french Unicode locale is installed" + else + echo "Skipping test: no french Unicode locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_FR_UTF8 \ +${CHECKER} ./test-mbstowcs${EXEEXT} 2 diff --git a/tests/test-mbstowcs3.sh b/tests/test-mbstowcs3.sh new file mode 100755 index 0000000000..ebaada5599 --- /dev/null +++ b/tests/test-mbstowcs3.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test whether a specific EUC-JP locale is installed. +: "${LOCALE_JA=ja_JP}" +if test $LOCALE_JA = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no traditional japanese locale is installed" + else + echo "Skipping test: no traditional japanese locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_JA \ +${CHECKER} ./test-mbstowcs${EXEEXT} 3 diff --git a/tests/test-mbstowcs4.sh b/tests/test-mbstowcs4.sh new file mode 100755 index 0000000000..86da562f02 --- /dev/null +++ b/tests/test-mbstowcs4.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test whether a specific GB18030 locale is installed. +: "${LOCALE_ZH_CN=zh_CN.GB18030}" +if test $LOCALE_ZH_CN = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no transitional chinese locale is installed" + else + echo "Skipping test: no transitional chinese locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_ZH_CN \ +${CHECKER} ./test-mbstowcs${EXEEXT} 4 diff --git a/tests/test-mbstowcs5.sh b/tests/test-mbstowcs5.sh new file mode 100755 index 0000000000..ae2477ed01 --- /dev/null +++ b/tests/test-mbstowcs5.sh @@ -0,0 +1,9 @@ +#!/bin/sh + +# Test whether the POSIX locale has encoding errors. +LC_ALL=C \ +${CHECKER} ./test-mbstowcs${EXEEXT} 5 || exit 1 +LC_ALL=POSIX \ +${CHECKER} ./test-mbstowcs${EXEEXT} 5 || exit 1 + +exit 0 -- 2.34.1
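To check by hand whether a given libc is affected, something like the following could be compiled outside the Gnulib test harness. It is only a sketch under stated assumptions: both helper names are made up for this note, and the accepted mappings (the byte value itself, or 0xDF00 + c as on musl libc) are taken from the comments in test-mbstowcs.c above.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return 1 if mbstowcs converts the single nonnull
   byte C to exactly one wide character that equals C or 0xDF00 + C,
   and 0 otherwise.  */
int
byte_converts_ok (int c)
{
  char input[2];
  wchar_t buf[2];
  size_t ret;

  assert (c > 0 && c < 0x100);
  input[0] = (char) c;
  input[1] = '\0';
  buf[0] = buf[1] = 0;

  ret = mbstowcs (buf, input, 2);
  if (ret != 1)
    return 0;
  if (c < 0x80)
    /* ASCII bytes must map to themselves.  */
    return buf[0] == (wchar_t) c;
  /* Accept either mapping mentioned in the test's comments.  */
  return buf[0] == (wchar_t) c || buf[0] == (wchar_t) (0xDF00 + c);
}

/* Hypothetical helper: switch to the "C" locale and count the bytes
   0x01..0xFF for which byte_converts_ok fails.  Returns -1 if the
   locale cannot be set.  */
int
count_c_locale_failures (void)
{
  int c;
  int failures = 0;

  if (setlocale (LC_ALL, "C") == NULL)
    return -1;
  for (c = 1; c < 0x100; c++)
    if (!byte_converts_ok (c))
      failures++;
  return failures;
}
```

A return value of 0 from count_c_locale_failures would mean every nonnull byte converts in the C locale, which is what test case 5 asserts. A nonzero count points at the kind of failure reported at the top of this thread.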