These two patches add a module 'mbrtoc16' that implements the ISO C 23 function 'mbrtoc16'. This function is not particularly useful for GNU programs, as it returns UTF-16 code units. Instead, I'm adding it in order to understand the (size_t)(-3) return value of mbrtoc32, mbrtoc16, mbrtoc8.
It turns out that glibc, musl libc, and Windows implement this return value convention as described in the standard, whereas Android implements it exactly the other way around. So, I'm now pretty confident that I have understood this part of ISO C 23 correctly :)

2023-06-28 Bruno Haible <br...@clisp.org>

mbrtoc16: Add tests.
* tests/test-mbrtoc16.c: New file, based on tests/test-mbrtoc32.c.
* tests/test-mbrtoc16-1.sh: New file, based on tests/test-mbrtoc32-1.sh.
* tests/test-mbrtoc16-2.sh: New file, based on tests/test-mbrtoc32-2.sh.
* tests/test-mbrtoc16-3.sh: New file, based on tests/test-mbrtoc32-3.sh.
* tests/test-mbrtoc16-4.sh: New file, based on tests/test-mbrtoc32-4.sh.
* tests/test-mbrtoc16-5.sh: New file, based on tests/test-mbrtoc32-5.sh.
* tests/test-mbrtoc16-w32.c: New file, based on tests/test-mbrtoc32-w32.c.
* tests/test-mbrtoc16-w32-1.sh: New file, based on tests/test-mbrtoc32-w32-1.sh.
* tests/test-mbrtoc16-w32-2.sh: New file, based on tests/test-mbrtoc32-w32-2.sh.
* tests/test-mbrtoc16-w32-3.sh: New file, based on tests/test-mbrtoc32-w32-3.sh.
* tests/test-mbrtoc16-w32-4.sh: New file, based on tests/test-mbrtoc32-w32-4.sh.
* tests/test-mbrtoc16-w32-5.sh: New file, based on tests/test-mbrtoc32-w32-5.sh.
* tests/test-mbrtoc16-w32-6.sh: New file, based on tests/test-mbrtoc32-w32-6.sh.
* tests/test-mbrtoc16-w32-7.sh: New file, based on tests/test-mbrtoc32-w32-7.sh.
* modules/mbrtoc16-tests: New file, based on modules/mbrtoc32-tests.

mbrtoc16: New module.
* lib/uchar.in.h (mbrtoc16): New declaration.
* lib/mbrtoc16.c: New file.
* m4/mbrtoc16.m4: New file, based on m4/mbrtoc32.m4.
* modules/mbrtoc16: New file.
* m4/uchar_h.m4 (gl_UCHAR_H): Test whether mbrtoc16 is declared.
(gl_UCHAR_H_REQUIRE_DEFAULTS): Initialize GNULIB_MBRTOC16.
(gl_UCHAR_H_DEFAULTS): Initialize HAVE_MBRTOC16, REPLACE_MBRTOC16.
* modules/uchar (Makefile.am): Substitute GNULIB_MBRTOC16, HAVE_MBRTOC16, REPLACE_MBRTOC16.
* doc/posix-functions/mbrtoc16.texi: Mention the mbrtoc16 module and the mbsinit related limitation.
* doc/posix-functions/mbsinit.texi: Mention the mbrtoc16 related limitation.
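
For illustration, here is a minimal sketch of a caller loop that copes with the (size_t)(-3) convention discussed above. It is not part of the patches. The helper name count_c16_units is hypothetical, and the sketch assumes a libc that follows the standard: a byte count is returned together with the first code unit (for example a high surrogate), and (size_t)(-3) is returned together with a pending low surrogate, without consuming any input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only.
   Converts the N bytes starting at S in the current locale and returns
   the number of char16_t units that mbrtoc16 produced, or (size_t) -1
   if the input contains an invalid or incomplete multibyte sequence.  */
size_t
count_c16_units (const char *s, size_t n)
{
  mbstate_t state;
  size_t count = 0;

  assert (s != NULL);
  memset (&state, '\0', sizeof (mbstate_t));

  for (;;)
    {
      char16_t c16;
      size_t ret = mbrtoc16 (&c16, s, n, &state);

      if (ret == (size_t) -3)
        {
          /* A pending unit (typically a low surrogate) was stored.
             No input bytes were consumed by this call.  */
          count++;
          continue;
        }
      if (n == 0)
        /* Input exhausted and no pending unit left.  */
        break;
      if (ret == (size_t) -1 || ret == (size_t) -2)
        /* Invalid or incomplete multibyte sequence.  */
        return (size_t) -1;
      if (ret == 0)
        /* The null character was converted; it occupies one byte.  */
        ret = 1;
      count++;
      s += ret;
      n -= ret;
    }
  return count;
}
```

Note that the loop calls mbrtoc16 once more even when n has reached 0. This is how a trailing low surrogate gets picked up, in line with the mbsinit related limitation documented in the first patch.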
From 5ae6f38b4a1e0cda9a7671764953ab8dc6326a1e Mon Sep 17 00:00:00 2001 From: Bruno Haible <br...@clisp.org> Date: Wed, 28 Jun 2023 18:28:23 +0200 Subject: [PATCH 1/2] mbrtoc16: New module. * lib/uchar.in.h (mbrtoc16): New declaration. * lib/mbrtoc16.c: New file. * m4/mbrtoc16.m4: New file, based on m4/mbrtoc32.m4. * modules/mbrtoc16: New file. * m4/uchar_h.m4 (gl_UCHAR_H): Test whether mbrtoc16 is declared. (gl_UCHAR_H_REQUIRE_DEFAULTS): Initialize GNULIB_MBRTOC16. (gl_UCHAR_H_DEFAULTS): Initialize HAVE_MBRTOC16, REPLACE_MBRTOC16. * modules/uchar (Makefile.am): Substitute GNULIB_MBRTOC16, HAVE_MBRTOC16, REPLACE_MBRTOC16. * doc/posix-functions/mbrtoc16.texi: Mention the mbrtoc16 module and the mbsinit related limitation. * doc/posix-functions/mbsinit.texi: Mention the mbrtoc16 related limitation. --- ChangeLog | 17 ++ doc/posix-functions/mbrtoc16.texi | 41 ++- doc/posix-functions/mbsinit.texi | 3 + lib/mbrtoc16.c | 214 ++++++++++++++++ lib/uchar.in.h | 32 +++ m4/mbrtoc16.m4 | 413 ++++++++++++++++++++++++++++++ m4/uchar_h.m4 | 7 +- modules/mbrtoc16 | 46 ++++ modules/uchar | 3 + 9 files changed, 771 insertions(+), 5 deletions(-) create mode 100644 lib/mbrtoc16.c create mode 100644 m4/mbrtoc16.m4 create mode 100644 modules/mbrtoc16 diff --git a/ChangeLog b/ChangeLog index 878e99cfcb..75ebb6d401 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,20 @@ +2023-06-28 Bruno Haible <br...@clisp.org> + + mbrtoc16: New module. + * lib/uchar.in.h (mbrtoc16): New declaration. + * lib/mbrtoc16.c: New file. + * m4/mbrtoc16.m4: New file, based on m4/mbrtoc32.m4. + * modules/mbrtoc16: New file. + * m4/uchar_h.m4 (gl_UCHAR_H): Test whether mbrtoc16 is declared. + (gl_UCHAR_H_REQUIRE_DEFAULTS): Initialize GNULIB_MBRTOC16. + (gl_UCHAR_H_DEFAULTS): Initialize HAVE_MBRTOC16, REPLACE_MBRTOC16. + * modules/uchar (Makefile.am): Substitute GNULIB_MBRTOC16, + HAVE_MBRTOC16, REPLACE_MBRTOC16. + * doc/posix-functions/mbrtoc16.texi: Mention the mbrtoc16 module and the + mbsinit related limitation. 
+ * doc/posix-functions/mbsinit.texi: Mention the mbrtoc16 related + limitation. + 2023-06-28 Bruno Haible <br...@clisp.org> c32*: Update comment. diff --git a/doc/posix-functions/mbrtoc16.texi b/doc/posix-functions/mbrtoc16.texi index f1fd3273a5..ed568d593f 100644 --- a/doc/posix-functions/mbrtoc16.texi +++ b/doc/posix-functions/mbrtoc16.texi @@ -2,15 +2,50 @@ @section @code{mbrtoc16} @findex mbrtoc16 -Gnulib module: --- +Gnulib module: mbrtoc16 Portability problems fixed by Gnulib: @itemize +@item +This function is missing on most non-glibc platforms: +glibc 2.15, macOS 11.1, FreeBSD 6.4, NetBSD 9.0, OpenBSD 6.7, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.3, Cygwin 2.9, mingw, MSVC 9, Android 4.4. +@item +This function may crash when the first argument is NULL on some platforms: +@c https://sourceware.org/bugzilla/show_bug.cgi?id=28898 +glibc 2.36. +@item +In the C or POSIX locales, this function can return @code{(size_t) -1} +and set @code{errno} to @code{EILSEQ}: +glibc 2.36. +@item +This function returns 0 instead of @code{(size_t) -2} when the input +is empty: +glibc 2.19, Android 11. +@item +This function returns the total number of bytes that make up the multibyte +character, not the number of bytes that were needed to complete the multibyte +character, on some platforms: +mingw. +@item +This function returns @code{(size_t) -3} instead of a byte count when it +has stored a high surrogate, and returns a byte count instead of +@code{(size_t) -3} when it has stored a low surrogate, on some platforms: +Android. +@item +This function does not recognize multibyte sequences that @code{mbrtowc} +recognizes on some platforms: +FreeBSD 13.2, Solaris 11.4, MSVC 14. +@c For MSVC this is because it assumes that the input is always UTF-8 encoded. 
+@c See https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/mbrtoc16-mbrtoc323 @end itemize Portability problems not fixed by Gnulib: @itemize @item -This function is missing on most non-glibc platforms: -glibc 2.15, macOS 11.1, FreeBSD 6.4, NetBSD 9.0, OpenBSD 6.7, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.3, Cygwin 2.9, mingw, MSVC 9, Android 4.4. +@c https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272218 +After @code{mbrtoc16} returns a @code{char16_t} value, @code{mbsinit} +cannot be used to determine whether the function is ready to return +another @code{char16_t} value. To do so, instead call @code{mbrtoc16} +again, with an appropriately incremented @code{const char *} argument +and an appropriately decremented @code{size_t} argument. @end itemize diff --git a/doc/posix-functions/mbsinit.texi b/doc/posix-functions/mbsinit.texi index 84658c6ffe..af33397a8a 100644 --- a/doc/posix-functions/mbsinit.texi +++ b/doc/posix-functions/mbsinit.texi @@ -18,4 +18,7 @@ Portability problems not fixed by Gnulib: @itemize +@item +@c https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272218 +This function is not useful after calls to @code{mbrtoc16} or @code{mbrtoc8}. @end itemize diff --git a/lib/mbrtoc16.c b/lib/mbrtoc16.c new file mode 100644 index 0000000000..eb73e7d447 --- /dev/null +++ b/lib/mbrtoc16.c @@ -0,0 +1,214 @@ +/* Convert multibyte character and return next 16-bit wide character. + Copyright (C) 2020-2023 Free Software Foundation, Inc. + + This file is free software: you can redistribute it and/or modify + it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of the + License, or (at your option) any later version. + + This file is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public License + along with this program. If not, see <https://www.gnu.org/licenses/>. */ + +/* Written by Bruno Haible <br...@clisp.org>, 2023. */ + +#include <config.h> + +/* Specification. */ +#include <uchar.h> + +#include <stdlib.h> +#include <wchar.h> + +/* We must find room for a two-bytes char16_t in an mbstate_t, without + interfering with the existing use of the mbstate_t in mbrtoc32. */ +static_assert (sizeof (mbstate_t) >= 4); + +#if GNULIB_defined_mbstate_t /* AIX, IRIX */ +/* mbstate_t has at least 4 bytes. They are used as coded in + gnulib/lib/mbrtowc.c. */ +# define SET_EXTRA_STATE(ps, c16) \ + (((char *)(ps))[0] = 8, \ + ((char *)(ps))[1] = (unsigned char) ((c16) >> 8), \ + ((char *)(ps))[2] = (unsigned char) ((c16) & 0xff)) +# define GET_EXTRA_STATE(ps) \ + (((char *)(ps))[0] == 8 \ + ? ((unsigned char) ((char *)(ps))[1] << 8) | (unsigned char) ((char *)(ps))[2] \ + : 0) +# define RESET_EXTRA_STATE(ps) \ + (((char *)(ps))[0] = 0) +#elif __GLIBC__ >= 2 +/* mbstate_t is defined in <bits/types/__mbstate_t.h>. + For more details, see glibc/iconv/skeleton.c. */ +# define SET_EXTRA_STATE(ps, c16) \ + ((ps)->__count |= (c16 << 16)) +# define GET_EXTRA_STATE(ps) \ + (((unsigned int) (ps)->__count) >> 16) +# define RESET_EXTRA_STATE(ps) \ + ((ps)->__count &= 0xffff) +#elif (defined __APPLE__ && defined __MACH__) || defined __FreeBSD__ || defined __NetBSD__ || defined __OpenBSD__ || defined __minix +/* macOS, FreeBSD, NetBSD, OpenBSD, Minix */ +/* On macOS, mbstate_t is defined in <machine/_types.h>. + It is an opaque aligned 128-byte struct, of which at most the first + 20 bytes are used (the members are at most: 2x wchar_t, 2x int, 4x char). + For more details, see the __mbsinit implementations in + Libc-<version>/locale/FreeBSD/ + {ascii,none,euc,mskanji,big5,gb2312,gbk,gb18030,utf8,utf2}.c. */ +/* On FreeBSD, mbstate_t is defined in src/sys/sys/_types.h. 
+ It is an opaque aligned 128-byte struct, of which at most the first + 20 bytes are used (the members are at most: 2x wchar_t, 2x int, 4x char). + For more details, see the __mbsinit implementations in + src/lib/libc/locale/ + {ascii,none,euc,mskanji,big5,gb2312,gbk,gb18030,utf8}.c. */ +/* On NetBSD, mbstate_t is defined in src/sys/sys/ansi.h. + It is an opaque aligned 128-byte struct, of which at most the first + 24 bytes are used (the members are at most: 3x int, 12x char). + For more details, see the *State types in + src/lib/libc/citrus/modules/citrus_*.c. */ +/* On OpenBSD, mbstate_t is defined in src/sys/sys/_types.h. + It is an opaque aligned 128-byte struct, of which at most the first + 12 bytes are used (the members are at most: 2x wchar_t, 1x int). + For more details, see src/lib/libc/citrus/citrus_*.c. */ +/* Minix has borrowed its mbstate_t type and mbrtowc implementation from the + BSDs. */ +# define SET_EXTRA_STATE(ps, c16) \ + (((unsigned short *)(ps))[16] = (c16)) +# define GET_EXTRA_STATE(ps) \ + (((unsigned short *)(ps))[16]) +# define RESET_EXTRA_STATE(ps) \ + (((unsigned short *)(ps))[16] = 0) +#elif defined __sun /* Solaris */ +/* On Solaris, mbstate_t is defined in <wchar_impl.h>. + It is an opaque aligned 24-byte or 32-byte struct, of which at most the first + 20 bytes are used (the members are at most: 2x wchar_t, 2x int, 4x char). + For more details, see the *State types in + illumos-gate/usr/src/lib/libc/port/locale/ + {none,euc,mskanji,big5,gb2312,gbk,gb18030,utf8}.c. */ +# define SET_EXTRA_STATE(ps, c16) \ + (((unsigned short *)(ps))[10] = (c16)) +# define GET_EXTRA_STATE(ps) \ + (((unsigned short *)(ps))[10]) +# define RESET_EXTRA_STATE(ps) \ + (((unsigned short *)(ps))[10] = 0) +#elif defined __CYGWIN__ +/* On Cygwin, mbstate_t is defined in <sys/_types.h>. + For more details, see newlib/libc/stdlib/mbtowc_r.c and + winsup/cygwin/strfuncs.cc. 
*/ +# define SET_EXTRA_STATE(ps, c16) \ + ((ps)->__count = 8, \ + (ps)->__value.__wch = (c16)) +# define GET_EXTRA_STATE(ps) \ + ((ps)->__count == 8 ? (ps)->__value.__wch : 0) +# define RESET_EXTRA_STATE(ps) \ + ((ps)->__count = 0) +#elif defined _WIN32 && !defined __CYGWIN__ /* Native Windows. */ +/* MSVC defines 'mbstate_t' as an aligned 8-byte struct. + On mingw, 'mbstate_t' is sometimes defined as 'int', sometimes defined + as an aligned 8-byte struct, of which the first 4 bytes matter. */ +# define SET_EXTRA_STATE(ps, c16) \ + (((char *)(ps))[3] = 4, \ + ((unsigned short *)(ps))[0] = (c16)) +# define GET_EXTRA_STATE(ps) \ + (((char *)(ps))[3] == 4 \ + ? ((unsigned short *)(ps))[0] \ + : 0) +# define RESET_EXTRA_STATE(ps) \ + (((char *)(ps))[3] = 0, \ + ((unsigned short *)(ps))[0] = 0) +#elif defined __ANDROID__ /* Android */ +/* Android defines 'mbstate_t' in <bits/mbstate_t.h>. + It is an opaque 4-byte or 8-byte struct. + For more details, see + bionic/libc/private/bionic_mbstate.h + bionic/libc/bionic/mbrtoc32.cpp + bionic/libc/bionic/mbrtoc16.cpp + */ +# define SET_EXTRA_STATE(ps, c16) \ + (((char *)(ps))[3] = 4, \ + ((char *)(ps))[0] = (unsigned char) ((c16) & 0xff), \ + ((char *)(ps))[1] = (unsigned char) ((c16) >> 8)) +# define GET_EXTRA_STATE(ps) \ + (((char *)(ps))[3] == 4 \ + ? ((unsigned char) ((char *)(ps))[1] << 8) | (unsigned char) ((char *)(ps))[0] \ + : 0) +# define RESET_EXTRA_STATE(ps) \ + (((char *)(ps))[0] = ((char *)(ps))[1] = ((char *)(ps))[2] = ((char *)(ps))[3] = 0) +#else +/* This is just a wild guess, for other platforms. It likely causes unit test + failures. 
*/ +# define SET_EXTRA_STATE(ps, c16) \ + (((char *)(ps))[1] = (unsigned char) ((c16) >> 8), \ + ((char *)(ps))[2] = (unsigned char) ((c16) & 0xff)) +# define GET_EXTRA_STATE(ps) \ + (((unsigned char) ((char *)(ps))[1] << 8) | (unsigned char) ((char *)(ps))[2]) +# define RESET_EXTRA_STATE(ps) \ + (((char *)(ps))[1] = ((char *)(ps))[2] = 0) +#endif + +static mbstate_t internal_state; + +size_t +mbrtoc16 (char16_t *pwc, const char *s, size_t n, mbstate_t *ps) +#undef mbrtoc16 +{ + /* It's simpler to handle the case s == NULL upfront, than to worry about + this case later, before every test of pwc and n. */ + if (s == NULL) + { + pwc = NULL; + s = ""; + n = 1; + } + + if (ps == NULL) + ps = &internal_state; + + if (GET_EXTRA_STATE (ps) == 0) + { + if (n == 0) + return (size_t) -2; + + char32_t c32; + size_t ret = mbrtoc32 (&c32, s, n, ps); + if (ret == (size_t)(-1) || ret == (size_t)(-2)) + ; + else if (ret == (size_t)(-3)) + { + /* When mbrtoc32 returns several char32_t values for a single + multibyte character, they are all in the Unicode BMP range. */ + if (c32 >= 0x10000) + abort (); + if (pwc != NULL) + *pwc = c32; + } + else if (c32 < 0x10000) + { + if (pwc != NULL) + *pwc = c32; + } + else + { + if (c32 >= 0x110000) + abort (); + /* Decompose a Unicode character into a high surrogate and a low + surrogate. */ + char16_t surr1 = 0xd800 + ((c32 - 0x10000) >> 10); + char16_t surr2 = 0xdc00 + ((c32 - 0x10000) & 0x3ff); + if (pwc != NULL) + *pwc = surr1; + SET_EXTRA_STATE (ps, surr2); + } + return ret; + } + else + { + if (pwc != NULL) + *pwc = GET_EXTRA_STATE (ps); + RESET_EXTRA_STATE (ps); + return (size_t)(-3); + } +} diff --git a/lib/uchar.in.h b/lib/uchar.in.h index acf730c8ca..3ead6f5bce 100644 --- a/lib/uchar.in.h +++ b/lib/uchar.in.h @@ -567,6 +567,38 @@ _GL_WARN_ON_USE (mbrtoc32, "mbrtoc32 is not portable - " #endif +/* Converts a multibyte character and returns the next 16-bit wide + character. 
*/ +#if @GNULIB_MBRTOC16@ +# if @REPLACE_MBRTOC16@ +# if !(defined __cplusplus && defined GNULIB_NAMESPACE) +# undef mbrtoc16 +# define mbrtoc16 rpl_mbrtoc16 +# endif +_GL_FUNCDECL_RPL (mbrtoc16, size_t, + (char16_t *pc, const char *s, size_t n, mbstate_t *ps)); +_GL_CXXALIAS_RPL (mbrtoc16, size_t, + (char16_t *pc, const char *s, size_t n, mbstate_t *ps)); +# else +# if !@HAVE_MBRTOC16@ +_GL_FUNCDECL_SYS (mbrtoc16, size_t, + (char16_t *pc, const char *s, size_t n, mbstate_t *ps)); +# endif +_GL_CXXALIAS_SYS (mbrtoc16, size_t, + (char16_t *pc, const char *s, size_t n, mbstate_t *ps)); +# endif +# if __GLIBC__ + (__GLIBC_MINOR__ >= 16) > 2 +_GL_CXXALIASWARN (mbrtoc16); +# endif +#elif defined GNULIB_POSIXCHECK +# undef mbrtoc16 +# if HAVE_RAW_DECL_MBRTOC16 +_GL_WARN_ON_USE (mbrtoc16, "mbrtoc16 is not portable - " + "use gnulib module mbrtoc16 for portability"); +# endif +#endif + + /* Convert a string to a 32-bit wide string. */ #if @GNULIB_MBSNRTOC32S@ # if _GL_WCHAR_T_IS_UCS4 && !defined IN_MBSNRTOC32S diff --git a/m4/mbrtoc16.m4 b/m4/mbrtoc16.m4 new file mode 100644 index 0000000000..ceb8802a57 --- /dev/null +++ b/m4/mbrtoc16.m4 @@ -0,0 +1,413 @@ +# mbrtoc16.m4 serial 1 +dnl Copyright (C) 2014-2023 Free Software Foundation, Inc. +dnl This file is free software; the Free Software Foundation +dnl gives unlimited permission to copy and/or distribute it, +dnl with or without modifications, as long as this notice is preserved. + +AC_DEFUN([gl_FUNC_MBRTOC16], +[ + AC_REQUIRE([gl_UCHAR_H_DEFAULTS]) + + AC_REQUIRE([AC_TYPE_MBSTATE_T]) + dnl Determine REPLACE_MBSTATE_T, from which GNULIB_defined_mbstate_t is + dnl determined. It describes how our overridden mbrtowc is implemented. + dnl We then implement mbrtoc16 accordingly. 
+ AC_REQUIRE([gl_MBSTATE_T_BROKEN]) + + AC_REQUIRE([gl_TYPE_CHAR16_T]) + AC_REQUIRE([gl_MBRTOC16_SANITYCHECK]) + + AC_REQUIRE([gl_CHECK_FUNC_MBRTOC16]) + if test $gl_cv_func_mbrtoc16 = no; then + HAVE_MBRTOC16=0 + else + if test $GNULIBHEADERS_OVERRIDE_CHAR16_T = 1 || test $REPLACE_MBSTATE_T = 1; then + REPLACE_MBRTOC16=1 + else + gl_MBRTOC16_NULL_DESTINATION + gl_MBRTOC16_RETVAL + gl_MBRTOC16_EMPTY_INPUT + gl_MBRTOC16_C_LOCALE + case "$gl_cv_func_mbrtoc16_null_destination" in + *yes) ;; + *) REPLACE_MBRTOC16=1 ;; + esac + case "$gl_cv_func_mbrtoc16_retval" in + *yes) ;; + *) REPLACE_MBRTOC16=1 ;; + esac + case "$gl_cv_func_mbrtoc16_empty_input" in + *yes) ;; + *) REPLACE_MBRTOC16=1 ;; + esac + case "$gl_cv_func_mbrtoc16_C_locale_sans_EILSEQ" in + *yes) ;; + *) REPLACE_MBRTOC16=1 ;; + esac + fi + if test $HAVE_WORKING_MBRTOC16 = 0; then + REPLACE_MBRTOC16=1 + fi + fi +]) + +AC_DEFUN([gl_CHECK_FUNC_MBRTOC16], +[ + dnl Cf. gl_CHECK_FUNCS_ANDROID + AC_CHECK_DECL([mbrtoc16], , , + [[#ifdef __HAIKU__ + #include <stdint.h> + #endif + #include <uchar.h> + ]]) + if test $ac_cv_have_decl_mbrtoc16 = yes; then + AC_CHECK_FUNCS([mbrtoc16]) + gl_cv_func_mbrtoc16="$ac_cv_func_mbrtoc16" + else + gl_cv_func_mbrtoc16=no + fi +]) + +AC_DEFUN([gl_MBRTOC16_NULL_DESTINATION], +[ + AC_REQUIRE([AC_PROG_CC]) + AC_REQUIRE([gl_TYPE_CHAR16_T]) + AC_REQUIRE([gt_LOCALE_FR_UTF8]) + AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles + AC_CACHE_CHECK([whether mbrtoc16 supports a NULL destination], + [gl_cv_func_mbrtoc16_null_destination], + [ + dnl Initial guess, used when cross-compiling or when no suitable locale + dnl is present. +changequote(,)dnl + case "$host_os" in + # Guess no on glibc systems. + *-gnu* | gnu*) gl_cv_func_mbrtoc16_null_destination="guessing no" ;; + # Guess yes otherwise. 
+ *) gl_cv_func_mbrtoc16_null_destination="guessing yes" ;; + esac +changequote([,])dnl + if test $LOCALE_FR_UTF8 != none; then + AC_RUN_IFELSE( + [AC_LANG_SOURCE([[ + #include <locale.h> + #include <string.h> + #ifdef __HAIKU__ + #include <stdint.h> + #endif + #include <uchar.h> + int + main (void) + { + if (setlocale (LC_ALL, "$LOCALE_FR_UTF8") == NULL) + return 1; + mbstate_t state; + size_t ret; + char input[] = "\360\237\230\213"; /* U+1F60B */ + memset (&state, '\0', sizeof (mbstate_t)); + ret = mbrtoc16 (NULL, input, 4, &state); + if (ret != 4) + return 2; + ret = mbrtoc16 (NULL, input + 4, 0, &state); + if (ret != (size_t)(-3)) + return 3; + return 0; + }]])], + [gl_cv_func_mbrtoc16_null_destination=yes], + [gl_cv_func_mbrtoc16_null_destination=no], + [:]) + fi + ]) +]) + +dnl Test whether mbrtoc16, when parsing the end of a multibyte character, +dnl correctly returns the number of bytes that were needed to complete the +dnl character (not the total number of bytes of the multibyte character). +dnl Also test whether mbrtoc16 returns a byte count when it has stored a +dnl high surrogate and (size_t) -3 when it has stored a low surrogate. +dnl Result is gl_cv_func_mbrtoc16_retval. + +AC_DEFUN([gl_MBRTOC16_RETVAL], +[ + AC_REQUIRE([AC_PROG_CC]) + AC_REQUIRE([AC_CANONICAL_HOST]) + AC_CACHE_CHECK([whether mbrtoc16 has a correct return value], + [gl_cv_func_mbrtoc16_retval], + [ + dnl Initial guess, used when cross-compiling or when no suitable locale + dnl is present. +changequote(,)dnl + case "$host_os" in + # Guess no on Android. + linux*-android*) gl_cv_func_mbrtoc16_retval="guessing no" ;; + # Guess no on native Windows. + mingw*) gl_cv_func_mbrtoc16_retval="guessing no" ;; + # Guess yes otherwise. 
+ *) gl_cv_func_mbrtoc16_retval="guessing yes" ;; + esac +changequote([,])dnl + AC_RUN_IFELSE( + [AC_LANG_SOURCE([[ +#include <locale.h> +#include <string.h> +#include <wchar.h> +#ifdef __HAIKU__ + #include <stdint.h> +#endif +#include <uchar.h> +int main () +{ + int result = 0; + int found_some_locale = 0; + /* This fails on Android. */ + if (setlocale (LC_ALL, "en_US.UTF-8") != NULL) + { + char input[] = "\360\237\230\213"; /* U+1F60B */ + mbstate_t state; + char16_t wc; + + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtoc16 (&wc, input, 4, &state) != 4 + || mbrtoc16 (&wc, input + 4, 0, &state) != (size_t)(-3)) + result |= 1; + found_some_locale = 1; + } + /* This fails on native Windows. */ + if (setlocale (LC_ALL, "Japanese_Japan.932") != NULL) + { + char input[] = "<\223\372\226\173\214\352>"; /* "<日本語>" */ + mbstate_t state; + char16_t wc; + + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtoc16 (&wc, input + 3, 1, &state) == (size_t)(-2)) + { + input[3] = '\0'; + if (mbrtoc16 (&wc, input + 4, 4, &state) != 1) + result |= 2; + } + found_some_locale = 1; + } + if (setlocale (LC_ALL, "Chinese_Taiwan.950") != NULL) + { + char input[] = "<\244\351\245\273\273\171>"; /* "<日本語>" */ + mbstate_t state; + char16_t wc; + + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtoc16 (&wc, input + 3, 1, &state) == (size_t)(-2)) + { + input[3] = '\0'; + if (mbrtoc16 (&wc, input + 4, 4, &state) != 1) + result |= 4; + } + found_some_locale = 1; + } + if (setlocale (LC_ALL, "Chinese_China.936") != NULL) + { + char input[] = "<\310\325\261\276\325\132>"; /* "<日本語>" */ + mbstate_t state; + char16_t wc; + + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtoc16 (&wc, input + 3, 1, &state) == (size_t)(-2)) + { + input[3] = '\0'; + if (mbrtoc16 (&wc, input + 4, 4, &state) != 1) + result |= 8; + } + found_some_locale = 1; + } + return (found_some_locale ? result : 77); +}]])], + [gl_cv_func_mbrtoc16_retval=yes], + [if test $? 
!= 77; then + gl_cv_func_mbrtoc16_retval=no + fi + ], + [:]) + ]) +]) + +dnl Test whether mbrtoc16 returns the correct value on empty input. + +AC_DEFUN([gl_MBRTOC16_EMPTY_INPUT], +[ + AC_REQUIRE([AC_PROG_CC]) + AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles + AC_CACHE_CHECK([whether mbrtoc16 works on empty input], + [gl_cv_func_mbrtoc16_empty_input], + [ + AC_RUN_IFELSE( + [AC_LANG_SOURCE([[ + #ifdef __HAIKU__ + #include <stdint.h> + #endif + #include <uchar.h> + static char16_t wc; + static mbstate_t mbs; + int + main (void) + { + return mbrtoc16 (&wc, "", 0, &mbs) != (size_t) -2; + }]])], + [gl_cv_func_mbrtoc16_empty_input=yes], + [gl_cv_func_mbrtoc16_empty_input=no], + [case "$host_os" in + # Guess no on glibc systems. + *-gnu* | gnu*) gl_cv_func_mbrtoc16_empty_input="guessing no" ;; + # Guess no on Android. + linux*-android*) gl_cv_func_mbrtoc16_empty_input="guessing no" ;; + *) gl_cv_func_mbrtoc16_empty_input="guessing yes" ;; + esac + ]) + ]) +]) + +dnl <https://pubs.opengroup.org/onlinepubs/9699919799/functions/mbrtowc.html> +dnl POSIX:2018 says regarding mbrtowc: "In the POSIX locale an [EILSEQ] error +dnl cannot occur since all byte values are valid characters." It is reasonable +dnl to expect mbrtoc16 to behave in the same way. + +AC_DEFUN([gl_MBRTOC16_C_LOCALE], +[ + AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles + AC_CACHE_CHECK([whether the C locale is free of encoding errors], + [gl_cv_func_mbrtoc16_C_locale_sans_EILSEQ], + [AC_RUN_IFELSE( + [AC_LANG_PROGRAM( + [[#include <limits.h> + #include <locale.h> + #ifdef __HAIKU__ + #include <stdint.h> + #endif + #include <uchar.h> + ]], [[ + int i; + char *locale = setlocale (LC_ALL, "C"); + if (! 
locale) + return 2; + for (i = CHAR_MIN; i <= CHAR_MAX; i++) + { + char c = i; + char16_t wc; + mbstate_t mbs = { 0, }; + size_t ss = mbrtoc16 (&wc, &c, 1, &mbs); + if (1 < ss) + return 3; + } + return 0; + ]])], + [gl_cv_func_mbrtoc16_C_locale_sans_EILSEQ=yes], + [gl_cv_func_mbrtoc16_C_locale_sans_EILSEQ=no], + [case "$host_os" in + # Guess yes on native Windows. + mingw*) gl_cv_func_mbrtoc16_C_locale_sans_EILSEQ="guessing yes" ;; + *) gl_cv_func_mbrtoc16_C_locale_sans_EILSEQ="$gl_cross_guess_normal" ;; + esac + ]) + ]) +]) + +dnl Test whether mbrtoc16 works not worse than mbrtowc. +dnl Result is HAVE_WORKING_MBRTOC16. + +AC_DEFUN([gl_MBRTOC16_SANITYCHECK], +[ + AC_REQUIRE([AC_PROG_CC]) + AC_REQUIRE([gl_TYPE_CHAR16_T]) + AC_REQUIRE([gl_CHECK_FUNC_MBRTOC16]) + AC_REQUIRE([gt_LOCALE_FR]) + AC_REQUIRE([gt_LOCALE_ZH_CN]) + AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles + if test $GNULIBHEADERS_OVERRIDE_CHAR16_T = 1 || test $gl_cv_func_mbrtoc16 = no; then + HAVE_WORKING_MBRTOC16=0 + else + AC_CACHE_CHECK([whether mbrtoc16 works as well as mbrtowc], + [gl_cv_func_mbrtoc16_sanitycheck], + [ + dnl Initial guess, used when cross-compiling or when no suitable locale + dnl is present. +changequote(,)dnl + case "$host_os" in + # Guess no on FreeBSD, Solaris, native Windows. + freebsd* | midnightbsd* | solaris* | mingw*) + gl_cv_func_mbrtoc16_sanitycheck="guessing no" + ;; + # Guess yes otherwise. + *) + gl_cv_func_mbrtoc16_sanitycheck="guessing yes" + ;; + esac +changequote([,])dnl + if test $LOCALE_FR != none || test $LOCALE_ZH_CN != none; then + AC_RUN_IFELSE( + [AC_LANG_SOURCE([[ +#include <locale.h> +#include <stdlib.h> +#include <string.h> +#include <wchar.h> +#ifdef __HAIKU__ + #include <stdint.h> +#endif +#include <uchar.h> +int main () +{ + int result = 0; + /* This fails on MSVC: + mbrtoc16 returns (size_t)-1. + mbrtowc returns 1 (correct). 
*/ + if (strcmp ("$LOCALE_FR", "none") != 0 + && setlocale (LC_ALL, "$LOCALE_FR") != NULL) + { + mbstate_t state; + wchar_t wc = (wchar_t) 0xBADF; + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtowc (&wc, "\374", 1, &state) == 1) + { + char16_t c16 = (wchar_t) 0xBADF; + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtoc16 (&c16, "\374", 1, &state) != 1) + result |= 1; + } + } + /* This fails on FreeBSD 13.2 and Solaris 11.4: + mbrtoc16 returns (size_t)-2 or (size_t)-1. + mbrtowc returns 4 (correct). */ + if (strcmp ("$LOCALE_ZH_CN", "none") != 0 + && setlocale (LC_ALL, "$LOCALE_ZH_CN") != NULL) + { + mbstate_t state; + wchar_t wc = (wchar_t) 0xBADF; + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtowc (&wc, "\224\071\375\067", 4, &state) == 4) + { + char16_t c16 = (wchar_t) 0xBADF; + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtoc16 (&c16, "\224\071\375\067", 4, &state) != 4) + result |= 2; + } + } + return result; +}]])], + [gl_cv_func_mbrtoc16_sanitycheck=yes], + [gl_cv_func_mbrtoc16_sanitycheck=no], + [:]) + fi + ]) + case "$gl_cv_func_mbrtoc16_sanitycheck" in + *yes) + HAVE_WORKING_MBRTOC16=1 + AC_DEFINE([HAVE_WORKING_MBRTOC16], [1], + [Define if the mbrtoc16 function basically works.]) + ;; + *) HAVE_WORKING_MBRTOC16=0 ;; + esac + fi + AC_SUBST([HAVE_WORKING_MBRTOC16]) +]) + +# Prerequisites of lib/mbrtoc16.c. +AC_DEFUN([gl_PREREQ_MBRTOC16], [ + : +]) diff --git a/m4/uchar_h.m4 b/m4/uchar_h.m4 index 0a48eb20c4..1b784d2aef 100644 --- a/m4/uchar_h.m4 +++ b/m4/uchar_h.m4 @@ -1,4 +1,4 @@ -# uchar_h.m4 serial 26 +# uchar_h.m4 serial 27 dnl Copyright (C) 2019-2023 Free Software Foundation, Inc. 
dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, @@ -99,7 +99,7 @@ AC_DEFUN_ONCE([gl_UCHAR_H] #include <stdint.h> #endif #include <uchar.h> - ]], [c32rtomb mbrtoc32]) + ]], [c32rtomb mbrtoc16 mbrtoc32]) ]) AC_DEFUN_ONCE([gl_TYPE_CHAR8_T], @@ -223,6 +223,7 @@ AC_DEFUN([gl_UCHAR_H_REQUIRE_DEFAULTS] gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_C32STOMBS]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_C32SWIDTH]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_C32TOB]) + gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MBRTOC16]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MBRTOC32]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MBSNRTOC32S]) gl_MODULE_INDICATOR_INIT_VARIABLE([GNULIB_MBSRTOC32S]) @@ -236,7 +237,9 @@ AC_DEFUN([gl_UCHAR_H_DEFAULTS] [ dnl Assume proper GNU behavior unless another module says otherwise. HAVE_C32RTOMB=1; AC_SUBST([HAVE_C32RTOMB]) + HAVE_MBRTOC16=1; AC_SUBST([HAVE_MBRTOC16]) HAVE_MBRTOC32=1; AC_SUBST([HAVE_MBRTOC32]) REPLACE_C32RTOMB=0; AC_SUBST([REPLACE_C32RTOMB]) + REPLACE_MBRTOC16=0; AC_SUBST([REPLACE_MBRTOC16]) REPLACE_MBRTOC32=0; AC_SUBST([REPLACE_MBRTOC32]) ]) diff --git a/modules/mbrtoc16 b/modules/mbrtoc16 new file mode 100644 index 0000000000..960432cdbe --- /dev/null +++ b/modules/mbrtoc16 @@ -0,0 +1,46 @@ +Description: +mbrtoc16() function: convert multibyte character and return next 16-bit wide character. 
+ +Files: +lib/mbrtoc16.c +m4/mbrtoc16.m4 +m4/locale-fr.m4 +m4/codeset.m4 +m4/mbrtowc.m4 +m4/mbstate_t.m4 + +Depends-on: +uchar +uchar-c23 [test $HAVE_MBRTOC16 = 0 || test $REPLACE_MBRTOC16 = 1] +mbrtoc32 [test $HAVE_MBRTOC16 = 0 || test $REPLACE_MBRTOC16 = 1] +mbsinit [test $HAVE_MBRTOC16 = 0 || test $REPLACE_MBRTOC16 = 1] +c99 [test $HAVE_MBRTOC16 = 0 || test $REPLACE_MBRTOC16 = 1] +assert-h [test $HAVE_MBRTOC16 = 0 || test $REPLACE_MBRTOC16 = 1] + +configure.ac: +gl_FUNC_MBRTOC16 +gl_CONDITIONAL([GL_COND_OBJ_MBRTOC16], + [test $HAVE_MBRTOC16 = 0 || test $REPLACE_MBRTOC16 = 1]) +AM_COND_IF([GL_COND_OBJ_MBRTOC16], [ + gl_PREREQ_MBRTOC16 +]) +gl_UCHAR_MODULE_INDICATOR([mbrtoc16]) + +Makefile.am: +if GL_COND_OBJ_MBRTOC16 +lib_SOURCES += mbrtoc16.c +endif + +Include: +<uchar.h> + +Link: +$(LTLIBUNISTRING) when linking with libtool, $(LIBUNISTRING) otherwise +$(MBRTOWC_LIB) +$(LTLIBC32CONV) when linking with libtool, $(LIBC32CONV) otherwise + +License: +LGPLv2+ + +Maintainer: +Bruno Haible diff --git a/modules/uchar b/modules/uchar index 948bcd7993..f5539abbb3 100644 --- a/modules/uchar +++ b/modules/uchar @@ -60,13 +60,16 @@ uchar.h: uchar.in.h $(top_builddir)/config.status $(CXXDEFS_H) -e 's/@''GNULIB_C32STOMBS''@/$(GNULIB_C32STOMBS)/g' \ -e 's/@''GNULIB_C32SWIDTH''@/$(GNULIB_C32SWIDTH)/g' \ -e 's/@''GNULIB_C32TOB''@/$(GNULIB_C32TOB)/g' \ + -e 's/@''GNULIB_MBRTOC16''@/$(GNULIB_MBRTOC16)/g' \ -e 's/@''GNULIB_MBRTOC32''@/$(GNULIB_MBRTOC32)/g' \ -e 's/@''GNULIB_MBSNRTOC32S''@/$(GNULIB_MBSNRTOC32S)/g' \ -e 's/@''GNULIB_MBSRTOC32S''@/$(GNULIB_MBSRTOC32S)/g' \ -e 's/@''GNULIB_MBSTOC32S''@/$(GNULIB_MBSTOC32S)/g' \ -e 's|@''HAVE_C32RTOMB''@|$(HAVE_C32RTOMB)|g' \ + -e 's|@''HAVE_MBRTOC16''@|$(HAVE_MBRTOC16)|g' \ -e 's|@''HAVE_MBRTOC32''@|$(HAVE_MBRTOC32)|g' \ -e 's|@''REPLACE_C32RTOMB''@|$(REPLACE_C32RTOMB)|g' \ + -e 's|@''REPLACE_MBRTOC16''@|$(REPLACE_MBRTOC16)|g' \ -e 's|@''REPLACE_MBRTOC32''@|$(REPLACE_MBRTOC32)|g' \ -e '/definitions of _GL_FUNCDECL_RPL/r $(CXXDEFS_H)' \ 
$(srcdir)/uchar.in.h > $@-t -- 2.34.1
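
As a side note to the patch above: the surrogate arithmetic used in lib/mbrtoc16.c can be checked in isolation. The following is a minimal sketch, not part of the patch. The function names high_surrogate, low_surrogate and join_surrogates are hypothetical; only the two decomposition formulas are taken from the patch, and the inverse function is added here for the round trip.

```c
#include <assert.h>
#include <uchar.h>

/* Hypothetical helpers, for illustration only.  They mirror the
   decomposition that lib/mbrtoc16.c applies to characters outside
   the Unicode BMP (0x10000 <= c32 < 0x110000).  */

char16_t
high_surrogate (char32_t c32)
{
  assert (c32 >= 0x10000 && c32 < 0x110000);
  /* The upper 10 bits of the 20-bit offset, in the range 0xD800..0xDBFF.  */
  return (char16_t) (0xd800 + ((c32 - 0x10000) >> 10));
}

char16_t
low_surrogate (char32_t c32)
{
  assert (c32 >= 0x10000 && c32 < 0x110000);
  /* The lower 10 bits of the 20-bit offset, in the range 0xDC00..0xDFFF.  */
  return (char16_t) (0xdc00 + ((c32 - 0x10000) & 0x3ff));
}

/* Inverse operation: recombine a surrogate pair into one code point.  */
char32_t
join_surrogates (char16_t high, char16_t low)
{
  assert (high >= 0xd800 && high < 0xdc00);
  assert (low >= 0xdc00 && low < 0xe000);
  return 0x10000 + (((char32_t) (high - 0xd800) << 10)
                    | (char32_t) (low - 0xdc00));
}
```

For U+1F60B, the character used in the configure tests, the offset is 0xF60B, so the high surrogate is 0xD83D and the low surrogate is 0xDE0B.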
From 5fc18b14848b7d7a5f561e8424dc4fadaa0e8d9d Mon Sep 17 00:00:00 2001 From: Bruno Haible <br...@clisp.org> Date: Wed, 28 Jun 2023 18:36:02 +0200 Subject: [PATCH 2/2] mbrtoc16: Add tests. * tests/test-mbrtoc16.c: New file, based on tests/test-mbrtoc32.c. * tests/test-mbrtoc16-1.sh: New file, based on tests/test-mbrtoc32-1.sh. * tests/test-mbrtoc16-2.sh: New file, based on tests/test-mbrtoc32-2.sh. * tests/test-mbrtoc16-3.sh: New file, based on tests/test-mbrtoc32-3.sh. * tests/test-mbrtoc16-4.sh: New file, based on tests/test-mbrtoc32-4.sh. * tests/test-mbrtoc16-5.sh: New file, based on tests/test-mbrtoc32-5.sh. * tests/test-mbrtoc16-w32.c: New file, based on tests/test-mbrtoc32-w32.c. * tests/test-mbrtoc16-w32-1.sh: New file, based on tests/test-mbrtoc32-w32-1.sh. * tests/test-mbrtoc16-w32-2.sh: New file, based on tests/test-mbrtoc32-w32-2.sh. * tests/test-mbrtoc16-w32-3.sh: New file, based on tests/test-mbrtoc32-w32-3.sh. * tests/test-mbrtoc16-w32-4.sh: New file, based on tests/test-mbrtoc32-w32-4.sh. * tests/test-mbrtoc16-w32-5.sh: New file, based on tests/test-mbrtoc32-w32-5.sh. * tests/test-mbrtoc16-w32-6.sh: New file, based on tests/test-mbrtoc32-w32-6.sh. * tests/test-mbrtoc16-w32-7.sh: New file, based on tests/test-mbrtoc32-w32-7.sh. * modules/mbrtoc16-tests: New file, based on modules/mbrtoc32-tests. 
---
 ChangeLog                    |  25 ++
 modules/mbrtoc16-tests       |  49 +++
 tests/test-mbrtoc16-1.sh     |  15 +
 tests/test-mbrtoc16-2.sh     |  15 +
 tests/test-mbrtoc16-3.sh     |  15 +
 tests/test-mbrtoc16-4.sh     |  15 +
 tests/test-mbrtoc16-5.sh     |   9 +
 tests/test-mbrtoc16-w32-1.sh |   4 +
 tests/test-mbrtoc16-w32-2.sh |   4 +
 tests/test-mbrtoc16-w32-3.sh |   4 +
 tests/test-mbrtoc16-w32-4.sh |   4 +
 tests/test-mbrtoc16-w32-5.sh |   4 +
 tests/test-mbrtoc16-w32-6.sh |   4 +
 tests/test-mbrtoc16-w32-7.sh |   4 +
 tests/test-mbrtoc16-w32.c    | 774 +++++++++++++++++++++++++++++++++++
 tests/test-mbrtoc16.c        | 445 ++++++++++++++++++++
 16 files changed, 1390 insertions(+)
 create mode 100644 modules/mbrtoc16-tests
 create mode 100755 tests/test-mbrtoc16-1.sh
 create mode 100755 tests/test-mbrtoc16-2.sh
 create mode 100755 tests/test-mbrtoc16-3.sh
 create mode 100755 tests/test-mbrtoc16-4.sh
 create mode 100755 tests/test-mbrtoc16-5.sh
 create mode 100755 tests/test-mbrtoc16-w32-1.sh
 create mode 100755 tests/test-mbrtoc16-w32-2.sh
 create mode 100755 tests/test-mbrtoc16-w32-3.sh
 create mode 100755 tests/test-mbrtoc16-w32-4.sh
 create mode 100755 tests/test-mbrtoc16-w32-5.sh
 create mode 100755 tests/test-mbrtoc16-w32-6.sh
 create mode 100755 tests/test-mbrtoc16-w32-7.sh
 create mode 100644 tests/test-mbrtoc16-w32.c
 create mode 100644 tests/test-mbrtoc16.c

diff --git a/ChangeLog b/ChangeLog
index 75ebb6d401..319be18c40 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,30 @@
 2023-06-28  Bruno Haible  <br...@clisp.org>
 
+	mbrtoc16: Add tests.
+	* tests/test-mbrtoc16.c: New file, based on tests/test-mbrtoc32.c.
+	* tests/test-mbrtoc16-1.sh: New file, based on tests/test-mbrtoc32-1.sh.
+	* tests/test-mbrtoc16-2.sh: New file, based on tests/test-mbrtoc32-2.sh.
+	* tests/test-mbrtoc16-3.sh: New file, based on tests/test-mbrtoc32-3.sh.
+	* tests/test-mbrtoc16-4.sh: New file, based on tests/test-mbrtoc32-4.sh.
+	* tests/test-mbrtoc16-5.sh: New file, based on tests/test-mbrtoc32-5.sh.
+ * tests/test-mbrtoc16-w32.c: New file, based on + tests/test-mbrtoc32-w32.c. + * tests/test-mbrtoc16-w32-1.sh: New file, based on + tests/test-mbrtoc32-w32-1.sh. + * tests/test-mbrtoc16-w32-2.sh: New file, based on + tests/test-mbrtoc32-w32-2.sh. + * tests/test-mbrtoc16-w32-3.sh: New file, based on + tests/test-mbrtoc32-w32-3.sh. + * tests/test-mbrtoc16-w32-4.sh: New file, based on + tests/test-mbrtoc32-w32-4.sh. + * tests/test-mbrtoc16-w32-5.sh: New file, based on + tests/test-mbrtoc32-w32-5.sh. + * tests/test-mbrtoc16-w32-6.sh: New file, based on + tests/test-mbrtoc32-w32-6.sh. + * tests/test-mbrtoc16-w32-7.sh: New file, based on + tests/test-mbrtoc32-w32-7.sh. + * modules/mbrtoc16-tests: New file, based on modules/mbrtoc32-tests. + mbrtoc16: New module. * lib/uchar.in.h (mbrtoc16): New declaration. * lib/mbrtoc16.c: New file. diff --git a/modules/mbrtoc16-tests b/modules/mbrtoc16-tests new file mode 100644 index 0000000000..2ac7e0f0ee --- /dev/null +++ b/modules/mbrtoc16-tests @@ -0,0 +1,49 @@ +Files: +tests/test-mbrtoc16-1.sh +tests/test-mbrtoc16-2.sh +tests/test-mbrtoc16-3.sh +tests/test-mbrtoc16-4.sh +tests/test-mbrtoc16-5.sh +tests/test-mbrtoc16.c +tests/test-mbrtoc16-w32-1.sh +tests/test-mbrtoc16-w32-2.sh +tests/test-mbrtoc16-w32-3.sh +tests/test-mbrtoc16-w32-4.sh +tests/test-mbrtoc16-w32-5.sh +tests/test-mbrtoc16-w32-6.sh +tests/test-mbrtoc16-w32-7.sh +tests/test-mbrtoc16-w32.c +tests/signature.h +tests/macros.h +m4/locale-fr.m4 +m4/locale-ja.m4 +m4/locale-zh.m4 +m4/codeset.m4 + +Depends-on: +mbsinit +btoc32 +c32tob +setlocale +localcharset + +configure.ac: +gt_LOCALE_FR +gt_LOCALE_FR_UTF8 +gt_LOCALE_JA +gt_LOCALE_ZH_CN + +Makefile.am: +TESTS += \ + test-mbrtoc16-1.sh test-mbrtoc16-2.sh test-mbrtoc16-3.sh test-mbrtoc16-4.sh \ + test-mbrtoc16-5.sh \ + test-mbrtoc16-w32-1.sh test-mbrtoc16-w32-2.sh test-mbrtoc16-w32-3.sh \ + test-mbrtoc16-w32-4.sh test-mbrtoc16-w32-5.sh test-mbrtoc16-w32-6.sh \ + test-mbrtoc16-w32-7.sh +TESTS_ENVIRONMENT += \ + 
LOCALE_FR='@LOCALE_FR@' \ + LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \ + LOCALE_JA='@LOCALE_JA@' \ + LOCALE_ZH_CN='@LOCALE_ZH_CN@' +check_PROGRAMS += test-mbrtoc16 test-mbrtoc16-w32 +test_mbrtoc16_LDADD = $(LDADD) $(LIBUNISTRING) $(SETLOCALE_LIB) $(MBRTOWC_LIB) $(LIBC32CONV) diff --git a/tests/test-mbrtoc16-1.sh b/tests/test-mbrtoc16-1.sh new file mode 100755 index 0000000000..811a57093c --- /dev/null +++ b/tests/test-mbrtoc16-1.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test in an ISO-8859-1 or ISO-8859-15 locale. +: "${LOCALE_FR=fr_FR}" +if test $LOCALE_FR = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no traditional french locale is installed" + else + echo "Skipping test: no traditional french locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_FR \ +${CHECKER} ./test-mbrtoc16${EXEEXT} 1 diff --git a/tests/test-mbrtoc16-2.sh b/tests/test-mbrtoc16-2.sh new file mode 100755 index 0000000000..aa2e9d84ba --- /dev/null +++ b/tests/test-mbrtoc16-2.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test whether a specific UTF-8 locale is installed. +: "${LOCALE_FR_UTF8=fr_FR.UTF-8}" +if test $LOCALE_FR_UTF8 = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no french Unicode locale is installed" + else + echo "Skipping test: no french Unicode locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_FR_UTF8 \ +${CHECKER} ./test-mbrtoc16${EXEEXT} 2 diff --git a/tests/test-mbrtoc16-3.sh b/tests/test-mbrtoc16-3.sh new file mode 100755 index 0000000000..b58bb4aab2 --- /dev/null +++ b/tests/test-mbrtoc16-3.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test whether a specific EUC-JP locale is installed. 
+: "${LOCALE_JA=ja_JP}" +if test $LOCALE_JA = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no traditional japanese locale is installed" + else + echo "Skipping test: no traditional japanese locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_JA \ +${CHECKER} ./test-mbrtoc16${EXEEXT} 3 diff --git a/tests/test-mbrtoc16-4.sh b/tests/test-mbrtoc16-4.sh new file mode 100755 index 0000000000..8fa66efe21 --- /dev/null +++ b/tests/test-mbrtoc16-4.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +# Test whether a specific GB18030 locale is installed. +: "${LOCALE_ZH_CN=zh_CN.GB18030}" +if test $LOCALE_ZH_CN = none; then + if test -f /usr/bin/localedef; then + echo "Skipping test: no transitional chinese locale is installed" + else + echo "Skipping test: no transitional chinese locale is supported" + fi + exit 77 +fi + +LC_ALL=$LOCALE_ZH_CN \ +${CHECKER} ./test-mbrtoc16${EXEEXT} 4 diff --git a/tests/test-mbrtoc16-5.sh b/tests/test-mbrtoc16-5.sh new file mode 100755 index 0000000000..71affee23d --- /dev/null +++ b/tests/test-mbrtoc16-5.sh @@ -0,0 +1,9 @@ +#!/bin/sh + +# Test whether the POSIX locale has encoding errors. +LC_ALL=C \ +${CHECKER} ./test-mbrtoc16${EXEEXT} 5 || exit 1 +LC_ALL=POSIX \ +${CHECKER} ./test-mbrtoc16${EXEEXT} 5 || exit 1 + +exit 0 diff --git a/tests/test-mbrtoc16-w32-1.sh b/tests/test-mbrtoc16-w32-1.sh new file mode 100755 index 0000000000..db19c943ac --- /dev/null +++ b/tests/test-mbrtoc16-w32-1.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP1252 locale. +${CHECKER} ./test-mbrtoc16-w32${EXEEXT} French_France 1252 diff --git a/tests/test-mbrtoc16-w32-2.sh b/tests/test-mbrtoc16-w32-2.sh new file mode 100755 index 0000000000..05d4cef17f --- /dev/null +++ b/tests/test-mbrtoc16-w32-2.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP1256 locale. 
+${CHECKER} ./test-mbrtoc16-w32${EXEEXT} "Arabic_Saudi Arabia" 1256 diff --git a/tests/test-mbrtoc16-w32-3.sh b/tests/test-mbrtoc16-w32-3.sh new file mode 100755 index 0000000000..f00830011c --- /dev/null +++ b/tests/test-mbrtoc16-w32-3.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP932 locale. +${CHECKER} ./test-mbrtoc16-w32${EXEEXT} Japanese_Japan 932 diff --git a/tests/test-mbrtoc16-w32-4.sh b/tests/test-mbrtoc16-w32-4.sh new file mode 100755 index 0000000000..f4cdff6838 --- /dev/null +++ b/tests/test-mbrtoc16-w32-4.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP950 locale. +${CHECKER} ./test-mbrtoc16-w32${EXEEXT} Chinese_Taiwan 950 diff --git a/tests/test-mbrtoc16-w32-5.sh b/tests/test-mbrtoc16-w32-5.sh new file mode 100755 index 0000000000..f7685a4533 --- /dev/null +++ b/tests/test-mbrtoc16-w32-5.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a CP936 locale. +${CHECKER} ./test-mbrtoc16-w32${EXEEXT} Chinese_China 936 diff --git a/tests/test-mbrtoc16-w32-6.sh b/tests/test-mbrtoc16-w32-6.sh new file mode 100755 index 0000000000..a695af94f9 --- /dev/null +++ b/tests/test-mbrtoc16-w32-6.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test a GB18030 locale. +${CHECKER} ./test-mbrtoc16-w32${EXEEXT} Chinese_China 54936 diff --git a/tests/test-mbrtoc16-w32-7.sh b/tests/test-mbrtoc16-w32-7.sh new file mode 100755 index 0000000000..44ddd7e303 --- /dev/null +++ b/tests/test-mbrtoc16-w32-7.sh @@ -0,0 +1,4 @@ +#!/bin/sh + +# Test some UTF-8 locales. +${CHECKER} ./test-mbrtoc16-w32${EXEEXT} French_France Japanese_Japan Chinese_Taiwan Chinese_China 65001 diff --git a/tests/test-mbrtoc16-w32.c b/tests/test-mbrtoc16-w32.c new file mode 100644 index 0000000000..c7aef287d5 --- /dev/null +++ b/tests/test-mbrtoc16-w32.c @@ -0,0 +1,774 @@ +/* Test of conversion of multibyte character to 16-bit wide characters. + Copyright (C) 2008-2023 Free Software Foundation, Inc. 
+ + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <https://www.gnu.org/licenses/>. */ + +#include <config.h> + +#include <uchar.h> + +#include <errno.h> +#include <locale.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <wchar.h> + +#include "localcharset.h" +#include "macros.h" + +#if defined _WIN32 && !defined __CYGWIN__ + +static int +test_one_locale (const char *name, int codepage) +{ + mbstate_t state; + char16_t wc; + size_t ret; + +# if 1 + /* Portable code to set the locale. */ + { + char name_with_codepage[1024]; + + sprintf (name_with_codepage, "%s.%d", name, codepage); + + /* Set the locale. */ + if (setlocale (LC_ALL, name_with_codepage) == NULL) + return 77; + } +# else + /* Hacky way to set a locale.codepage combination that setlocale() refuses + to set. */ + { + /* Codepage of the current locale, set with setlocale(). + Not necessarily the same as GetACP(). */ + extern __declspec(dllimport) unsigned int __lc_codepage; + + /* Set the locale. */ + if (setlocale (LC_ALL, name) == NULL) + return 77; + + /* Clobber the codepage and MB_CUR_MAX, both set by setlocale(). */ + __lc_codepage = codepage; + switch (codepage) + { + case 1252: + case 1256: + MB_CUR_MAX = 1; + break; + case 932: + case 950: + case 936: + MB_CUR_MAX = 2; + break; + case 54936: + case 65001: + MB_CUR_MAX = 4; + break; + } + + /* Test whether the codepage is really available. 
*/ + memset (&state, '\0', sizeof (mbstate_t)); + if (mbrtoc16 (&wc, " ", 1, &state) == (size_t)(-1)) + return 77; + } +# endif + + /* Test zero-length input. */ + { + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "x", 0, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (mbsinit (&state)); + } + + /* Test NUL byte input. */ + { + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "", 1, &state); + ASSERT (ret == 0); + ASSERT (wc == 0); + ASSERT (mbsinit (&state)); + ret = mbrtoc16 (NULL, "", 1, &state); + ASSERT (ret == 0); + ASSERT (mbsinit (&state)); + } + + /* Test single-byte input. */ + { + int c; + char buf[1]; + + memset (&state, '\0', sizeof (mbstate_t)); + for (c = 0; c < 0x100; c++) + switch (c) + { + case '\t': case '\v': case '\f': + case ' ': case '!': case '"': case '#': case '%': + case '&': case '\'': case '(': case ')': case '*': + case '+': case ',': case '-': case '.': case '/': + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + case ':': case ';': case '<': case '=': case '>': + case '?': + case 'A': case 'B': case 'C': case 'D': case 'E': + case 'F': case 'G': case 'H': case 'I': case 'J': + case 'K': case 'L': case 'M': case 'N': case 'O': + case 'P': case 'Q': case 'R': case 'S': case 'T': + case 'U': case 'V': case 'W': case 'X': case 'Y': + case 'Z': + case '[': case '\\': case ']': case '^': case '_': + case 'a': case 'b': case 'c': case 'd': case 'e': + case 'f': case 'g': case 'h': case 'i': case 'j': + case 'k': case 'l': case 'm': case 'n': case 'o': + case 'p': case 'q': case 'r': case 's': case 't': + case 'u': case 'v': case 'w': case 'x': case 'y': + case 'z': case '{': case '|': case '}': case '~': + /* c is in the ISO C "basic character set". 
*/ + buf[0] = c; + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, buf, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == c); + ASSERT (mbsinit (&state)); + ret = mbrtoc16 (NULL, buf, 1, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + break; + } + } + + /* Test special calling convention, passing a NULL pointer. */ + { + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, NULL, 5, &state); + ASSERT (ret == 0); + ASSERT (wc == (char16_t) 0xBADF); + ASSERT (mbsinit (&state)); + } + + switch (codepage) + { + case 1252: + /* Locale encoding is CP1252, an extension of ISO-8859-1. */ + { + char input[] = "B\374\337er"; /* "Büßer" */ + memset (&state, '\0', sizeof (mbstate_t)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == 'B'); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 1, 1, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == (unsigned char) '\374'); + ASSERT (wc == 0x00FC); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + + /* Test support of NULL first argument. */ + ret = mbrtoc16 (NULL, input + 2, 3, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 2, 3, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == (unsigned char) '\337'); + ASSERT (wc == 0x00DF); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 3, 2, &state); + ASSERT (ret == 1); + ASSERT (wc == 'e'); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 4, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == 'r'); + ASSERT (mbsinit (&state)); + } + return 0; + + case 1256: + /* Locale encoding is CP1256, not the same as ISO-8859-6. 
*/ + { + char input[] = "x\302\341\346y"; /* "xآلوy" */ + memset (&state, '\0', sizeof (mbstate_t)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == 'x'); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 1, 1, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == (unsigned char) '\302'); + ASSERT (wc == 0x0622); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + + /* Test support of NULL first argument. */ + ret = mbrtoc16 (NULL, input + 2, 3, &state); + ASSERT (ret == 1); + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 2, 3, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == (unsigned char) '\341'); + ASSERT (wc == 0x0644); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 3, 2, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == (unsigned char) '\346'); + ASSERT (wc == 0x0648); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 4, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == 'y'); + ASSERT (mbsinit (&state)); + } + return 0; + + case 932: + /* Locale encoding is CP932, similar to Shift_JIS. 
*/ + { + char input[] = "<\223\372\226\173\214\352>"; /* "<日本語>" */ + memset (&state, '\0', sizeof (mbstate_t)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == '<'); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 1, 2, &state); + ASSERT (ret == 2); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x65E5); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + input[2] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 3, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (wc == (char16_t) 0xBADF); + #if 0 + ASSERT (!mbsinit (&state)); + #endif + input[3] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 4, 4, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x672C); + ASSERT (mbsinit (&state)); + input[4] = '\0'; + + /* Test support of NULL first argument. */ + ret = mbrtoc16 (NULL, input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x8A9E); + ASSERT (mbsinit (&state)); + input[5] = '\0'; + input[6] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 7, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == '>'); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\377", 1, &state); /* 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == (size_t)-2); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\225\377", 2, &state); /* 0x95 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || (ret == 2 && wc == 0x30FB)); + } + return 0; + + case 950: + /* Locale encoding is CP950, similar to Big5. 
*/ + { + char input[] = "<\244\351\245\273\273\171>"; /* "<日本語>" */ + memset (&state, '\0', sizeof (mbstate_t)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == '<'); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 1, 2, &state); + ASSERT (ret == 2); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x65E5); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + input[2] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 3, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (wc == (char16_t) 0xBADF); + #if 0 + ASSERT (!mbsinit (&state)); + #endif + input[3] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 4, 4, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x672C); + ASSERT (mbsinit (&state)); + input[4] = '\0'; + + /* Test support of NULL first argument. */ + ret = mbrtoc16 (NULL, input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x8A9E); + ASSERT (mbsinit (&state)); + input[5] = '\0'; + input[6] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 7, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == '>'); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\377", 1, &state); /* 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == (size_t)-2); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\225\377", 2, &state); /* 0x95 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || (ret == 2 && wc == '?')); + } + return 0; + + case 936: + /* Locale encoding is CP936 = GBK, an extension of GB2312. 
*/ + { + char input[] = "<\310\325\261\276\325\132>"; /* "<日本語>" */ + memset (&state, '\0', sizeof (mbstate_t)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == '<'); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 1, 2, &state); + ASSERT (ret == 2); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x65E5); + ASSERT (mbsinit (&state)); + input[1] = '\0'; + input[2] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 3, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (wc == (char16_t) 0xBADF); + #if 0 + ASSERT (!mbsinit (&state)); + #endif + input[3] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 4, 4, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x672C); + ASSERT (mbsinit (&state)); + input[4] = '\0'; + + /* Test support of NULL first argument. */ + ret = mbrtoc16 (NULL, input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 5, 3, &state); + ASSERT (ret == 2); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x8A9E); + ASSERT (mbsinit (&state)); + input[5] = '\0'; + input[6] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 7, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == '>'); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\377", 1, &state); /* 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == (size_t)-2); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\225\377", 2, &state); /* 0x95 0xFF */ + ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || (ret == 2 && wc == '?')); + } + return 0; + + case 54936: + /* Locale encoding is CP54936 = GB18030. 
*/ + if (strcmp (locale_charset (), "GB18030") != 0) + return 77; + { + char input[] = "s\250\271\201\060\211\070\224\071\375\067!"; /* "süß😋!" */ + memset (&state, '\0', sizeof (mbstate_t)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == 's'); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 1, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (wc == (char16_t) 0xBADF); + #if 0 + ASSERT (!mbsinit (&state)); + #endif + input[1] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 2, 9, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x00FC); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + /* Test support of NULL first argument. */ + ret = mbrtoc16 (NULL, input + 3, 8, &state); + ASSERT (ret == 4); + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 3, 8, &state); + ASSERT (ret == 4); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x00DF); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + input[4] = '\0'; + input[5] = '\0'; + input[6] = '\0'; + + /* Test support of NULL first argument. 
*/ + ret = mbrtoc16 (NULL, input + 7, 4, &state); + ASSERT (ret == 4); + ret = mbrtoc16 (NULL, input + 11, 0, &state); + ASSERT (ret == (size_t)(-3)); + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 7, 4, &state); + ASSERT (ret == 4); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0xD83D); /* expect UTF-16 encoding */ + input[7] = '\0'; + input[8] = '\0'; + input[9] = '\0'; + input[10] = '\0'; + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 11, 0, &state); + ASSERT (ret == (size_t)(-3)); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0xDE0B); /* expect UTF-16 encoding */ + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 11, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == '!'); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\377", 1, &state); /* 0xFF */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\225\377", 2, &state); /* 0x95 0xFF */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\201\045", 2, &state); /* 0x81 0x25 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\201\060\377", 3, &state); /* 0x81 0x30 0xFF */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\201\060\377\064", 4, &state); /* 0x81 0x30 0xFF 0x34 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\201\060\211\072", 4, &state); /* 0x81 0x30 0x89 0x3A */ + ASSERT (ret == 
(size_t)-1); + ASSERT (errno == EILSEQ); + } + return 0; + + case 65001: + /* Locale encoding is CP65001 = UTF-8. */ + if (strcmp (locale_charset (), "UTF-8") != 0) + return 77; + { + char input[] = "s\303\274\303\237\360\237\230\213!"; /* "süß😋!" */ + memset (&state, '\0', sizeof (mbstate_t)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == 's'); + ASSERT (mbsinit (&state)); + input[0] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 1, 1, &state); + ASSERT (ret == (size_t)(-2)); + ASSERT (wc == (char16_t) 0xBADF); + #if 0 + ASSERT (!mbsinit (&state)); + #endif + input[1] = '\0'; + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 2, 7, &state); + ASSERT (ret == 1); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x00FC); + ASSERT (mbsinit (&state)); + input[2] = '\0'; + + /* Test support of NULL first argument. */ + ret = mbrtoc16 (NULL, input + 3, 6, &state); + ASSERT (ret == 2); + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 3, 6, &state); + ASSERT (ret == 2); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0x00DF); + ASSERT (mbsinit (&state)); + input[3] = '\0'; + input[4] = '\0'; + + /* Test support of NULL first argument. 
*/ + ret = mbrtoc16 (NULL, input + 5, 4, &state); + ASSERT (ret == 4); + ret = mbrtoc16 (NULL, input + 9, 0, &state); + ASSERT (ret == (size_t)(-3)); + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 5, 4, &state); + ASSERT (ret == 4); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0xD83D); /* expect UTF-16 encoding */ + input[5] = '\0'; + input[6] = '\0'; + input[7] = '\0'; + input[8] = '\0'; + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 9, 0, &state); + ASSERT (ret == (size_t)(-3)); + ASSERT (c32tob (wc) == EOF); + ASSERT (wc == 0xDE0B); /* expect UTF-16 encoding */ + ASSERT (mbsinit (&state)); + + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, input + 9, 1, &state); + ASSERT (ret == 1); + ASSERT (wc == '!'); + ASSERT (mbsinit (&state)); + + /* Test some invalid input. */ + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\377", 1, &state); /* 0xFF */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\303\300", 2, &state); /* 0xC3 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\343\300", 2, &state); /* 0xE3 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\343\300\200", 3, &state); /* 0xE3 0xC0 0x80 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\343\200\300", 3, &state); /* 0xE3 0x80 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\363\300", 2, &state); /* 0xF3 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == 
EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\363\300\200\200", 4, &state); /* 0xF3 0xC0 0x80 0x80 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\363\200\300", 3, &state); /* 0xF3 0x80 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\363\200\300\200", 4, &state); /* 0xF3 0x80 0xC0 0x80 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + + memset (&state, '\0', sizeof (mbstate_t)); + wc = (char16_t) 0xBADF; + ret = mbrtoc16 (&wc, "\363\200\200\300", 4, &state); /* 0xF3 0x80 0x80 0xC0 */ + ASSERT (ret == (size_t)-1); + ASSERT (errno == EILSEQ); + } + return 0; + + default: + return 1; + } +} + +int +main (int argc, char *argv[]) +{ + int codepage = atoi (argv[argc - 1]); + int result; + int i; + + result = 77; + for (i = 1; i < argc - 1; i++) + { + int ret = test_one_locale (argv[i], codepage); + + if (ret != 77) + result = ret; + } + + if (result == 77) + { + fprintf (stderr, "Skipping test: found no locale with codepage %d\n", + codepage); + } + return result; +} + +#else + +int +main (int argc, char *argv[]) +{ + fputs ("Skipping test: not a native Windows system\n", stderr); + return 77; +} + +#endif diff --git a/tests/test-mbrtoc16.c b/tests/test-mbrtoc16.c new file mode 100644 index 0000000000..03732dddb4 --- /dev/null +++ b/tests/test-mbrtoc16.c @@ -0,0 +1,445 @@ +/* Test of conversion of multibyte character to 16-bit wide characters. + Copyright (C) 2008-2023 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. 
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2008.  */
+
+#include <config.h>
+
+#include <uchar.h>
+
+#include "signature.h"
+SIGNATURE_CHECK (mbrtoc16, size_t,
+                 (char16_t *, const char *, size_t, mbstate_t *));
+
+#include <locale.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <wchar.h>
+
+#include "macros.h"
+
+int
+main (int argc, char *argv[])
+{
+  mbstate_t state;
+  char16_t wc;
+  size_t ret;
+
+  /* configure should already have checked that the locale is supported.  */
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  /* Test zero-length input.  */
+  {
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (char16_t) 0xBADF;
+    ret = mbrtoc16 (&wc, "x", 0, &state);
+    ASSERT (ret == (size_t)(-2));
+    ASSERT (mbsinit (&state));
+  }
+
+  /* Test NUL byte input.  */
+  {
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (char16_t) 0xBADF;
+    ret = mbrtoc16 (&wc, "", 1, &state);
+    ASSERT (ret == 0);
+    ASSERT (wc == 0);
+    ASSERT (mbsinit (&state));
+    ret = mbrtoc16 (NULL, "", 1, &state);
+    ASSERT (ret == 0);
+    ASSERT (mbsinit (&state));
+  }
+
+  /* Test single-byte input.  */
+  {
+    int c;
+    char buf[1];
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    for (c = 0; c < 0x100; c++)
+      switch (c)
+        {
+        case '\t': case '\v': case '\f':
+        case ' ': case '!': case '"': case '#': case '%':
+        case '&': case '\'': case '(': case ')': case '*':
+        case '+': case ',': case '-': case '.': case '/':
+        case '0': case '1': case '2': case '3': case '4':
+        case '5': case '6': case '7': case '8': case '9':
+        case ':': case ';': case '<': case '=': case '>':
+        case '?':
+        case 'A': case 'B': case 'C': case 'D': case 'E':
+        case 'F': case 'G': case 'H': case 'I': case 'J':
+        case 'K': case 'L': case 'M': case 'N': case 'O':
+        case 'P': case 'Q': case 'R': case 'S': case 'T':
+        case 'U': case 'V': case 'W': case 'X': case 'Y':
+        case 'Z':
+        case '[': case '\\': case ']': case '^': case '_':
+        case 'a': case 'b': case 'c': case 'd': case 'e':
+        case 'f': case 'g': case 'h': case 'i': case 'j':
+        case 'k': case 'l': case 'm': case 'n': case 'o':
+        case 'p': case 'q': case 'r': case 's': case 't':
+        case 'u': case 'v': case 'w': case 'x': case 'y':
+        case 'z': case '{': case '|': case '}': case '~':
+          /* c is in the ISO C "basic character set".  */
+          ASSERT (c < 0x80);
+          /* c is an ASCII character.  */
+          buf[0] = c;
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, buf, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == c);
+          ASSERT (mbsinit (&state));
+
+          ret = mbrtoc16 (NULL, buf, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (mbsinit (&state));
+
+          break;
+        default:
+          break;
+        }
+  }
+
+  /* Test special calling convention, passing a NULL pointer.  */
+  {
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (char16_t) 0xBADF;
+    ret = mbrtoc16 (&wc, NULL, 5, &state);
+    ASSERT (ret == 0);
+    ASSERT (wc == (char16_t) 0xBADF);
+    ASSERT (mbsinit (&state));
+  }
+
+#ifdef __ANDROID__
+  /* On Android ≥ 5.0, the default locale is the "C.UTF-8" locale, not the
+     "C" locale.  Furthermore, when you attempt to set the "C" or "POSIX"
+     locale via setlocale(), what you get is a "C" locale with UTF-8 encoding,
+     that is, effectively the "C.UTF-8" locale.  */
+  if (argc > 1 && strcmp (argv[1], "5") == 0 && MB_CUR_MAX > 1)
+    argv[1] = "2";
+#endif
+
+  if (argc > 1)
+    switch (argv[1][0])
+      {
+      case '1':
+        /* Locale encoding is ISO-8859-1 or ISO-8859-15.  */
+        {
+          char input[] = "B\374\337er"; /* "Büßer" */
+          memset (&state, '\0', sizeof (mbstate_t));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 'B');
+          ASSERT (mbsinit (&state));
+          input[0] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 1, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == (unsigned char) '\374');
+          ASSERT (wc == 0x00FC); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+          input[1] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc16 (NULL, input + 2, 3, &state);
+          ASSERT (ret == 1);
+          ASSERT (mbsinit (&state));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 2, 3, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == (unsigned char) '\337');
+          ASSERT (wc == 0x00DF); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+          input[2] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 3, 2, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 'e');
+          ASSERT (mbsinit (&state));
+          input[3] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 4, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 'r');
+          ASSERT (mbsinit (&state));
+        }
+        return 0;
+
+      case '2':
+        /* Locale encoding is UTF-8.  */
+        {
+          char input[] = "s\303\274\303\237\360\237\230\213!"; /* "süß😋!" */
+          memset (&state, '\0', sizeof (mbstate_t));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 's');
+          ASSERT (mbsinit (&state));
+          input[0] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 1, 1, &state);
+          ASSERT (ret == (size_t)(-2));
+          ASSERT (wc == (char16_t) 0xBADF);
+          #if 0
+          ASSERT (!mbsinit (&state));
+          #endif
+          input[1] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 2, 7, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x00FC); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+          input[2] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc16 (NULL, input + 3, 6, &state);
+          ASSERT (ret == 2);
+          ASSERT (mbsinit (&state));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 3, 6, &state);
+          ASSERT (ret == 2);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x00DF); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+          input[3] = '\0';
+          input[4] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc16 (NULL, input + 5, 4, &state);
+          ASSERT (ret == 4);
+          ret = mbrtoc16 (NULL, input + 9, 0, &state);
+          ASSERT (ret == (size_t)(-3));
+          ASSERT (mbsinit (&state));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 5, 4, &state);
+          ASSERT (ret == 4);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0xD83D); /* expect UTF-16 encoding */
+          input[5] = '\0';
+          input[6] = '\0';
+          input[7] = '\0';
+          input[8] = '\0';
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 9, 0, &state);
+          ASSERT (ret == (size_t)(-3));
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0xDE0B); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 9, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == '!');
+          ASSERT (mbsinit (&state));
+        }
+        return 0;
+
+      case '3':
+        /* Locale encoding is EUC-JP.  */
+        {
+          char input[] = "<\306\374\313\334\270\354>"; /* "<日本語>" */
+          memset (&state, '\0', sizeof (mbstate_t));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == '<');
+          ASSERT (mbsinit (&state));
+          input[0] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 1, 2, &state);
+          ASSERT (ret == 2);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x65E5); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+          input[1] = '\0';
+          input[2] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 3, 1, &state);
+          ASSERT (ret == (size_t)(-2));
+          ASSERT (wc == (char16_t) 0xBADF);
+          #if 0
+          ASSERT (!mbsinit (&state));
+          #endif
+          input[3] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 4, 4, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x672C); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+          input[4] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc16 (NULL, input + 5, 3, &state);
+          ASSERT (ret == 2);
+          ASSERT (mbsinit (&state));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 5, 3, &state);
+          ASSERT (ret == 2);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x8A9E); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+          input[5] = '\0';
+          input[6] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 7, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == '>');
+          ASSERT (mbsinit (&state));
+        }
+        return 0;
+
+      case '4':
+        /* Locale encoding is GB18030.  */
+        #if GL_CHAR32_T_IS_UNICODE && (defined __NetBSD__ || defined __sun)
+        fputs ("Skipping test: The GB18030 converter in this system's iconv is broken.\n", stderr);
+        return 77;
+        #endif
+        {
+          char input[] = "s\250\271\201\060\211\070\224\071\375\067!"; /* "süß😋!" */
+          memset (&state, '\0', sizeof (mbstate_t));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 's');
+          ASSERT (mbsinit (&state));
+          input[0] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 1, 1, &state);
+          ASSERT (ret == (size_t)(-2));
+          ASSERT (wc == (char16_t) 0xBADF);
+          #if 0
+          ASSERT (!mbsinit (&state));
+          #endif
+          input[1] = '\0';
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 2, 9, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x00FC); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+          input[2] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc16 (NULL, input + 3, 8, &state);
+          ASSERT (ret == 4);
+          ASSERT (mbsinit (&state));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 3, 8, &state);
+          ASSERT (ret == 4);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x00DF); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+          input[3] = '\0';
+          input[4] = '\0';
+          input[5] = '\0';
+          input[6] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc16 (NULL, input + 7, 4, &state);
+          ASSERT (ret == 4);
+          ret = mbrtoc16 (NULL, input + 11, 0, &state);
+          ASSERT (ret == (size_t)(-3));
+          ASSERT (mbsinit (&state));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 7, 4, &state);
+          ASSERT (ret == 4);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0xD83D); /* expect UTF-16 encoding */
+          input[7] = '\0';
+          input[8] = '\0';
+          input[9] = '\0';
+          input[10] = '\0';
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 11, 0, &state);
+          ASSERT (ret == (size_t)(-3));
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0xDE0B); /* expect UTF-16 encoding */
+          ASSERT (mbsinit (&state));
+
+          wc = (char16_t) 0xBADF;
+          ret = mbrtoc16 (&wc, input + 11, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == '!');
+          ASSERT (mbsinit (&state));
+        }
+        return 0;
+
+      case '5':
+        /* C or POSIX locale.  */
+        {
+          int c;
+          char buf[1];
+
+          memset (&state, '\0', sizeof (mbstate_t));
+          for (c = 0; c < 0x100; c++)
+            if (c != 0)
+              {
+                /* We are testing all nonnull bytes.  */
+                buf[0] = c;
+
+                wc = (char16_t) 0xBADF;
+                ret = mbrtoc16 (&wc, buf, 1, &state);
+                /* POSIX:2018 says regarding mbrtowc: "In the POSIX locale an
+                   [EILSEQ] error cannot occur since all byte values are valid
+                   characters."  It is reasonable to expect mbrtoc16 to behave
+                   in the same way.  */
+                ASSERT (ret == 1);
+                if (c < 0x80)
+                  /* c is an ASCII character.  */
+                  ASSERT (wc == c);
+                else
+                  /* On most platforms, the bytes 0x80..0xFF map to U+0080..U+00FF.
+                     But on musl libc, the bytes 0x80..0xFF map to U+DF80..U+DFFF.  */
+                  ASSERT (wc == (btoc32 (c) == 0xDF00 + c ? btoc32 (c) : c));
+                ASSERT (mbsinit (&state));
+
+                ret = mbrtoc16 (NULL, buf, 1, &state);
+                ASSERT (ret == 1);
+                ASSERT (mbsinit (&state));
+              }
+        }
+        return 0;
+      }
+
+  return 1;
+}
-- 
2.34.1
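
For readers who want to see the (size_t)(-3) convention outside the test harness, here is a minimal sketch of a conversion loop built on mbrtoc16. The helper name mbs_to_utf16 and its error convention are hypothetical and not part of this patch or of gnulib. It assumes the behaviour the tests above check for: a return value of (size_t)(-3) delivers the second UTF-16 unit of a surrogate pair without consuming any input bytes.

```c
#include <assert.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper (not part of the patch): convert the NUL-terminated
   multibyte string SRC, in the current locale's encoding, to UTF-16 units
   stored in DST[0..DSTLEN-1].  Returns the number of units stored, without
   a terminating NUL, or (size_t)(-1) on invalid input or if DST is too
   small.  */
static size_t
mbs_to_utf16 (char16_t *dst, size_t dstlen, const char *src)
{
  mbstate_t state;
  size_t count = 0;

  assert (dst != NULL && src != NULL);

  /* Pass the terminating NUL too, so that the loop ends with ret == 0.  */
  size_t n = strlen (src) + 1;

  memset (&state, '\0', sizeof (mbstate_t));
  for (;;)
    {
      char16_t c16;
      size_t ret = mbrtoc16 (&c16, src, n, &state);

      if (ret == (size_t)(-1) || ret == (size_t)(-2))
        /* Invalid or incomplete multibyte sequence.  */
        return (size_t)(-1);
      if (ret == 0)
        /* Reached the terminating NUL.  */
        return count;
      if (count == dstlen)
        /* Output buffer exhausted.  */
        return (size_t)(-1);
      dst[count++] = c16;
      if (ret != (size_t)(-3))
        {
          /* A positive return value is the number of bytes consumed.
             For (size_t)(-3), c16 is the low surrogate left over from
             the previous call and no input was consumed.  */
          src += ret;
          n -= ret;
        }
    }
}
```

In the "C" locale, mbs_to_utf16 (buf, 8, "abc") stores 'a', 'b', 'c' and returns 3. After setlocale has selected a UTF-8 locale, the 4-byte sequence for "😋" should yield the two units 0xD83D and 0xDE0B on platforms that follow the standard's convention, matching the expectations in case '2' of the test.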