mbrtoc32 is like mbrtowc, except that it produces a 32-bit wide character.
Its use therefore fixes the inability, on Windows and 32-bit AIX platforms
(where wchar_t is only 16 bits wide), to handle Unicode characters outside
the BMP.
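
As an illustration of how a caller might use the function, here is a
minimal sketch that counts the characters in a multibyte string. The
helper name count_characters is hypothetical and not part of the patch;
the sketch assumes a <uchar.h> that declares mbrtoc32 (C11, or the gnulib
replacement) and that the caller has already selected the locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in the LEN bytes at S,
   decoding them with mbrtoc32 in the current locale.
   Returns (size_t) -1 on an invalid or truncated sequence.  */
size_t
count_characters (const char *s, size_t len)
{
  mbstate_t state;
  size_t count = 0;

  assert (s != NULL);
  memset (&state, 0, sizeof state);   /* start in the initial state */

  while (len > 0)
    {
      char32_t c;
      size_t ret = mbrtoc32 (&c, s, len, &state);

      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete input */
      if (ret == (size_t) -3)
        {
          /* A further character from earlier input; no bytes consumed.  */
          count++;
          continue;
        }
      if (ret == 0)
        ret = 1;                      /* a null character occupies one byte */
      s += ret;
      len -= ret;
      count++;
    }
  return count;
}
```

With a UTF-8 locale selected through setlocale, a character outside the
BMP is expected to count as one character here, because each call stores
a full code point in the char32_t.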

The implementation is a bit tricky. For encodings other than UTF-8 and
GB18030, we know that only characters in the BMP occur; therefore the
(possibly overridden) mbrtowc function does what mbrtoc32 needs. In
this case, we must NOT make assumptions about the wide character encoding.
For UTF-8, on the other hand, we can assume that the wide character
encoding is the Unicode code point, so we can add ad-hoc code to
handle this case.
For GB18030, we would have a problem if we did not know the wide character
encoding. Fortunately, this case does not occur.
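
To make the UTF-8 case concrete, here is a simplified sketch of the idea,
not the code from the patch: decode the bytes directly to a code point,
then check whether the result fits the target character type. The names
decode_utf8 and fits_in_bits are hypothetical; the patch itself expresses
the second check through the FITS_IN_CHAR_TYPE macro. Unlike the real
code, this sketch does not distinguish incomplete from invalid input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: is B a UTF-8 continuation byte (0x80..0xBF)?  */
int
is_continuation (unsigned char b)
{
  return (b ^ 0x80) < 0x40;
}

/* Hypothetical helper: decode one UTF-8 sequence from the N bytes at S.
   Stores the code point in *OUT and returns the number of bytes used,
   or 0 if the input is invalid or too short.  */
size_t
decode_utf8 (const unsigned char *s, size_t n, uint32_t *out)
{
  assert (s != NULL && out != NULL);
  if (n == 0)
    return 0;

  unsigned char c = s[0];
  if (c < 0x80)
    {
      *out = c;
      return 1;
    }
  if (c >= 0xc2 && c < 0xe0)
    {
      if (n >= 2 && is_continuation (s[1]))
        {
          *out = ((uint32_t) (c & 0x1f) << 6) | (uint32_t) (s[1] ^ 0x80);
          return 2;
        }
      return 0;
    }
  if (c >= 0xe0 && c < 0xf0)
    {
      /* Reject overlong forms (E0 80..9F) and surrogates (ED A0..BF).  */
      if (n >= 3 && is_continuation (s[1]) && is_continuation (s[2])
          && (c >= 0xe1 || s[1] >= 0xa0) && (c != 0xed || s[1] < 0xa0))
        {
          *out = ((uint32_t) (c & 0x0f) << 12)
                 | ((uint32_t) (s[1] ^ 0x80) << 6)
                 | (uint32_t) (s[2] ^ 0x80);
          return 3;
        }
      return 0;
    }
  if (c >= 0xf0 && c <= 0xf4)
    {
      /* Reject overlong forms (F0 80..8F) and values above U+10FFFF.  */
      if (n >= 4 && is_continuation (s[1]) && is_continuation (s[2])
          && is_continuation (s[3])
          && (c >= 0xf1 || s[1] >= 0x90) && (c < 0xf4 || s[1] < 0x90))
        {
          *out = ((uint32_t) (c & 0x07) << 18)
                 | ((uint32_t) (s[1] ^ 0x80) << 12)
                 | ((uint32_t) (s[2] ^ 0x80) << 6)
                 | (uint32_t) (s[3] ^ 0x80);
          return 4;
        }
      return 0;
    }
  return 0;
}

/* Hypothetical helper: does code point CP fit in an unsigned character
   type that is BITS bits wide?  */
int
fits_in_bits (uint32_t cp, unsigned int bits)
{
  return bits >= 32 || cp < ((uint32_t) 1 << bits);
}
```

For the byte sequence F0 9F 98 80 (U+1F600), decode_utf8 yields 0x1F600,
which fits a 32-bit type but not a 16-bit one. That is the situation on
platforms with a 16-bit wchar_t.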

Tested on glibc, musl libc, macOS, FreeBSD, NetBSD, OpenBSD, AIX, HP-UX,
IRIX, Solaris, Cygwin, mingw, MSVC, Haiku, Minix.


2020-01-03  Bruno Haible  <br...@clisp.org>

        mbrtoc32: Add tests.
        * tests/test-mbrtoc32.c: New file, based on tests/test-mbrtowc.c.
        * tests/test-mbrtoc32-1.sh: New file, based on tests/test-mbrtowc1.sh.
        * tests/test-mbrtoc32-2.sh: New file, based on tests/test-mbrtowc2.sh.
        * tests/test-mbrtoc32-3.sh: New file, based on tests/test-mbrtowc3.sh.
        * tests/test-mbrtoc32-4.sh: New file, based on tests/test-mbrtowc4.sh.
        * tests/test-mbrtoc32-5.sh: New file, based on tests/test-mbrtowc5.sh.
        * tests/test-mbrtoc32-w32.c: New file, based on
        tests/test-mbrtowc-w32.c.
        * tests/test-mbrtoc32-w32-1.sh: New file, based on
        tests/test-mbrtowc-w32-1.sh.
        * tests/test-mbrtoc32-w32-2.sh: New file, based on
        tests/test-mbrtowc-w32-2.sh.
        * tests/test-mbrtoc32-w32-3.sh: New file, based on
        tests/test-mbrtowc-w32-3.sh.
        * tests/test-mbrtoc32-w32-4.sh: New file, based on
        tests/test-mbrtowc-w32-4.sh.
        * tests/test-mbrtoc32-w32-5.sh: New file, based on
        tests/test-mbrtowc-w32-5.sh.
        * tests/test-mbrtoc32-w32-6.sh: New file, based on
        tests/test-mbrtowc-w32-6.sh.
        * tests/test-mbrtoc32-w32-7.sh: New file, based on
        tests/test-mbrtowc-w32-7.sh.
        * modules/mbrtoc32-tests: New file, based on modules/mbrtowc-tests.

        mbrtoc32: New module.
        * lib/uchar.in.h (mbrtoc32): New declaration.
        * lib/mbrtoc32.c: New file, based on lib/mbrtowc.c.
        * m4/mbrtoc32.m4: New file, based on m4/mbrtowc.m4.
        * m4/uchar.m4 (gl_UCHAR_H): Test whether mbrtoc32 is declared.
        (gl_UCHAR_H_DEFAULTS): Initialize GNULIB_MBRTOC32, HAVE_MBRTOC32,
        REPLACE_MBRTOC32.
        * modules/uchar (Makefile.am): Substitute GNULIB_MBRTOC32,
        HAVE_MBRTOC32, REPLACE_MBRTOC32.
        * modules/mbrtoc32: New file, based on modules/mbrtowc.
        * tests/test-uchar-c++.cc (mbrtoc32): Verify the signature.
        * modules/uchar-c++-tests (Makefile.am): Link test-uchar-c++ with
        $(LIB_MBRTOWC).
        * doc/posix-functions/mbrtoc32.texi: Document the new module.
        * doc/posix-functions/mbrtowc.texi: Mention the new module.

2020-01-03  Bruno Haible  <br...@clisp.org>

        mbrtowc: Refactor to share code with mbrtoc32.
        * lib/mbrtowc-impl.h: New file, extracted from lib/mbrtowc.c.
        * lib/mbrtowc-impl-utf8.h: Likewise.
        * lib/mbrtowc.c (mbrtowc): Define macro FITS_IN_CHAR_TYPE. Include
        mbrtowc-impl.h.
        * modules/mbrtowc (Files): Add the new files.

2020-01-03  Bruno Haible  <br...@clisp.org>

        mbrtowc: Refactor locale charset dispatching.
        * lib/lc-charset-dispatch.h: New file, extracted from lib/mbrtowc.c.
        * lib/lc-charset-dispatch.c: New file, extracted from lib/mbrtowc.c.
        * lib/mbrtowc.c: Include lc-charset-dispatch.h. Don't include
        localcharset.h, streq.h.
        (enc_t): Remove type.
        (locale_enc): Remove function.
        (cached_locale_enc): Remove variable.
        (locale_enc_cached): Remove function.
        (mbrtowc): Invoke locale_encoding_classification.
        * m4/mbrtowc.m4 (gl_PREREQ_MBRTOWC): Update comment.
        * modules/mbrtowc (Files): Add lc-charset-dispatch.h,
        lc-charset-dispatch.c.
        (configure.ac): Arrange to compile lc-charset-dispatch.c.

From 3df90147719110350d9a674cc37e99cbd27a9c3e Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Fri, 3 Jan 2020 22:34:07 +0100
Subject: [PATCH 1/5] mbrtowc: Refactor locale charset dispatching.

* lib/lc-charset-dispatch.h: New file, extracted from lib/mbrtowc.c.
* lib/lc-charset-dispatch.c: New file, extracted from lib/mbrtowc.c.
* lib/mbrtowc.c: Include lc-charset-dispatch.h. Don't include
localcharset.h, streq.h.
(enc_t): Remove type.
(locale_enc): Remove function.
(cached_locale_enc): Remove variable.
(locale_enc_cached): Remove function.
(mbrtowc): Invoke locale_encoding_classification.
* m4/mbrtowc.m4 (gl_PREREQ_MBRTOWC): Update comment.
* modules/mbrtowc (Files): Add lc-charset-dispatch.h,
lc-charset-dispatch.c.
(configure.ac): Arrange to compile lc-charset-dispatch.c.
---
 ChangeLog                 | 17 ++++++++++
 lib/lc-charset-dispatch.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/lc-charset-dispatch.h | 40 +++++++++++++++++++++++
 lib/mbrtowc.c             | 53 ++----------------------------
 m4/mbrtowc.m4             |  2 +-
 modules/mbrtowc           |  3 ++
 6 files changed, 145 insertions(+), 52 deletions(-)
 create mode 100644 lib/lc-charset-dispatch.c
 create mode 100644 lib/lc-charset-dispatch.h

diff --git a/ChangeLog b/ChangeLog
index 6c0d925..930f715 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,20 @@
+2020-01-03  Bruno Haible  <br...@clisp.org>
+
+	mbrtowc: Refactor locale charset dispatching.
+	* lib/lc-charset-dispatch.h: New file, extracted from lib/mbrtowc.c.
+	* lib/lc-charset-dispatch.c: New file, extracted from lib/mbrtowc.c.
+	* lib/mbrtowc.c: Include lc-charset-dispatch.h. Don't include
+	localcharset.h, streq.h.
+	(enc_t): Remove type.
+	(locale_enc): Remove function.
+	(cached_locale_enc): Remove variable.
+	(locale_enc_cached): Remove function.
+	(mbrtowc): Invoke locale_encoding_classification.
+	* m4/mbrtowc.m4 (gl_PREREQ_MBRTOWC): Update comment.
+	* modules/mbrtowc (Files): Add lc-charset-dispatch.h,
+	lc-charset-dispatch.c.
+	(configure.ac): Arrange to compile lc-charset-dispatch.c.
+
 2020-01-03  Paul Eggert  <egg...@cs.ucla.edu>
 
 	doc: mention 32-bit time_t issue
diff --git a/lib/lc-charset-dispatch.c b/lib/lc-charset-dispatch.c
new file mode 100644
index 0000000..79057d4
--- /dev/null
+++ b/lib/lc-charset-dispatch.c
@@ -0,0 +1,82 @@
+/* Dispatching based on the current locale's character encoding.
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2018.  */
+
+#include <config.h>
+
+/* Specification.  */
+#include "lc-charset-dispatch.h"
+
+#if GNULIB_defined_mbstate_t
+
+# include "localcharset.h"
+# include "streq.h"
+
+# if GNULIB_WCHAR_SINGLE
+/* When we know that the locale does not change, provide a speedup by
+   caching the value of locale_encoding_classification.  */
+#  define locale_encoding_classification_cached locale_encoding_classification
+# else
+/* By default, don't make assumptions, hence no caching.  */
+#  define locale_encoding_classification_uncached locale_encoding_classification
+# endif
+
+# if GNULIB_WCHAR_SINGLE
+static inline
+# endif
+enc_t
+locale_encoding_classification_uncached (void)
+{
+  const char *encoding = locale_charset ();
+  if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0, 0))
+    return enc_utf8;
+  if (STREQ_OPT (encoding, "EUC-JP", 'E', 'U', 'C', '-', 'J', 'P', 0, 0, 0))
+    return enc_eucjp;
+  if (STREQ_OPT (encoding, "EUC-KR", 'E', 'U', 'C', '-', 'K', 'R', 0, 0, 0)
+      || STREQ_OPT (encoding, "GB2312", 'G', 'B', '2', '3', '1', '2', 0, 0, 0)
+      || STREQ_OPT (encoding, "BIG5", 'B', 'I', 'G', '5', 0, 0, 0, 0, 0))
+    return enc_94;
+  if (STREQ_OPT (encoding, "EUC-TW", 'E', 'U', 'C', '-', 'T', 'W', 0, 0, 0))
+    return enc_euctw;
+  if (STREQ_OPT (encoding, "GB18030", 'G', 'B', '1', '8', '0', '3', '0', 0, 0))
+    return enc_gb18030;
+  if (STREQ_OPT (encoding, "SJIS", 'S', 'J', 'I', 'S', 0, 0, 0, 0, 0))
+    return enc_sjis;
+  return enc_other;
+}
+
+# if GNULIB_WCHAR_SINGLE
+
+static int cached_locale_enc = -1;
+
+enc_t
+locale_encoding_classification_cached (void)
+{
+  if (cached_locale_enc < 0)
+    cached_locale_enc = locale_encoding_classification_uncached ();
+  return cached_locale_enc;
+}
+
+# endif
+
+#else
+
+/* This declaration is solely to ensure that after preprocessing
+   this file is never empty.  */
+typedef int dummy;
+
+#endif
diff --git a/lib/lc-charset-dispatch.h b/lib/lc-charset-dispatch.h
new file mode 100644
index 0000000..95c2316
--- /dev/null
+++ b/lib/lc-charset-dispatch.h
@@ -0,0 +1,40 @@
+/* Dispatching based on the current locale's character encoding.
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2018.  */
+
+#include <wchar.h>
+
+#if GNULIB_defined_mbstate_t
+
+/* A classification of special values of the encoding of the current locale.  */
+typedef enum
+  {
+    enc_other,      /* other */
+    enc_utf8,       /* UTF-8 */
+    enc_eucjp,      /* EUC-JP */
+    enc_94,         /* EUC-KR, GB2312, BIG5 */
+    enc_euctw,      /* EUC-TW */
+    enc_gb18030,    /* GB18030 */
+    enc_sjis        /* SJIS */
+  }
+  enc_t;
+
+/* Returns a classification of special values of the encoding of the current
+   locale.  */
+extern enc_t locale_encoding_classification (void);
+
+#endif
diff --git a/lib/mbrtowc.c b/lib/mbrtowc.c
index 066d949..fdef8f9 100644
--- a/lib/mbrtowc.c
+++ b/lib/mbrtowc.c
@@ -54,9 +54,8 @@
 
 # endif
 
-# include "localcharset.h"
-# include "streq.h"
 # include "verify.h"
+# include "lc-charset-dispatch.h"
 # include "mbtowc-lock.h"
 
 # ifndef FALLTHROUGH
@@ -67,54 +66,6 @@
 #  endif
 # endif
 
-/* Returns a classification of special values of the encoding of the current
-   locale.  */
-typedef enum {
-  enc_other,      /* other */
-  enc_utf8,       /* UTF-8 */
-  enc_eucjp,      /* EUC-JP */
-  enc_94,         /* EUC-KR, GB2312, BIG5 */
-  enc_euctw,      /* EUC-TW */
-  enc_gb18030,    /* GB18030 */
-  enc_sjis        /* SJIS */
-} enc_t;
-static inline enc_t
-locale_enc (void)
-{
-  const char *encoding = locale_charset ();
-  if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0, 0))
-    return enc_utf8;
-  if (STREQ_OPT (encoding, "EUC-JP", 'E', 'U', 'C', '-', 'J', 'P', 0, 0, 0))
-    return enc_eucjp;
-  if (STREQ_OPT (encoding, "EUC-KR", 'E', 'U', 'C', '-', 'K', 'R', 0, 0, 0)
-      || STREQ_OPT (encoding, "GB2312", 'G', 'B', '2', '3', '1', '2', 0, 0, 0)
-      || STREQ_OPT (encoding, "BIG5", 'B', 'I', 'G', '5', 0, 0, 0, 0, 0))
-    return enc_94;
-  if (STREQ_OPT (encoding, "EUC-TW", 'E', 'U', 'C', '-', 'T', 'W', 0, 0, 0))
-    return enc_euctw;
-  if (STREQ_OPT (encoding, "GB18030", 'G', 'B', '1', '8', '0', '3', '0', 0, 0))
-    return enc_gb18030;
-  if (STREQ_OPT (encoding, "SJIS", 'S', 'J', 'I', 'S', 0, 0, 0, 0, 0))
-    return enc_sjis;
-  return enc_other;
-}
-
-# if GNULIB_WCHAR_SINGLE
-/* When we know that the locale does not change, provide a speedup by
-   caching the value of locale_enc.  */
-static int cached_locale_enc = -1;
-static inline enc_t
-locale_enc_cached (void)
-{
-  if (cached_locale_enc < 0)
-    cached_locale_enc = locale_enc ();
-  return cached_locale_enc;
-}
-# else
-/* By default, don't make assumptions, hence no caching.  */
-#  define locale_enc_cached locale_enc
-# endif
-
 verify (sizeof (mbstate_t) >= 4);
 static char internal_state[4];
 
@@ -177,7 +128,7 @@ mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
 
     /* Here m > 0.  */
 
-    enc = locale_enc_cached ();
+    enc = locale_encoding_classification ();
 
     if (enc == enc_utf8) /* UTF-8 */
       {
diff --git a/m4/mbrtowc.m4 b/m4/mbrtowc.m4
index bd9225b..755f8c9 100644
--- a/m4/mbrtowc.m4
+++ b/m4/mbrtowc.m4
@@ -821,7 +821,7 @@ AC_DEFUN([gl_MBRTOWC_C_LOCALE],
     ])
 ])
 
-# Prerequisites of lib/mbrtowc.c.
+# Prerequisites of lib/mbrtowc.c and lib/lc-charset-dispatch.c.
 AC_DEFUN([gl_PREREQ_MBRTOWC], [
   AC_REQUIRE([AC_C_INLINE])
   :
diff --git a/modules/mbrtowc b/modules/mbrtowc
index db10256..22afc96 100644
--- a/modules/mbrtowc
+++ b/modules/mbrtowc
@@ -3,6 +3,8 @@ mbrtowc() function: convert multibyte character to wide character.
 
 Files:
 lib/mbrtowc.c
+lib/lc-charset-dispatch.h
+lib/lc-charset-dispatch.c
 lib/mbtowc-lock.h
 lib/mbtowc-lock.c
 lib/windows-initguard.h
@@ -29,6 +31,7 @@ configure.ac:
 gl_FUNC_MBRTOWC
 if test $HAVE_MBRTOWC = 0 || test $REPLACE_MBRTOWC = 1; then
   AC_LIBOBJ([mbrtowc])
+  AC_LIBOBJ([lc-charset-dispatch])
   AC_LIBOBJ([mbtowc-lock])
   gl_PREREQ_MBRTOWC
   gl_PREREQ_MBTOWC_LOCK
-- 
2.7.4

From 40f39973fc0399aa45775ca26564948f77678ab5 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Sat, 4 Jan 2020 02:31:39 +0100
Subject: [PATCH 3/5] mbrtowc: Refactor to share code with mbrtoc32.

* lib/mbrtowc-impl.h: New file, extracted from lib/mbrtowc.c.
* lib/mbrtowc-impl-utf8.h: Likewise.
* lib/mbrtowc.c (mbrtowc): Define macro FITS_IN_CHAR_TYPE. Include
mbrtowc-impl.h.
* modules/mbrtowc (Files): Add the new files.
---
 ChangeLog               |   9 ++
 lib/mbrtowc-impl-utf8.h | 138 ++++++++++++++++++
 lib/mbrtowc-impl.h      | 262 ++++++++++++++++++++++++++++++++++
 lib/mbrtowc.c           | 364 ++----------------------------------------------
 modules/mbrtowc         |   2 +
 5 files changed, 420 insertions(+), 355 deletions(-)
 create mode 100644 lib/mbrtowc-impl-utf8.h
 create mode 100644 lib/mbrtowc-impl.h

diff --git a/ChangeLog b/ChangeLog
index c5ae7b7..f8c6793 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2020-01-03  Bruno Haible  <br...@clisp.org>
+
+	mbrtowc: Refactor to share code with mbrtoc32.
+	* lib/mbrtowc-impl.h: New file, extracted from lib/mbrtowc.c.
+	* lib/mbrtowc-impl-utf8.h: Likewise.
+	* lib/mbrtowc.c (mbrtowc): Define macro FITS_IN_CHAR_TYPE. Include
+	mbrtowc-impl.h.
+	* modules/mbrtowc (Files): Add the new files.
+
 2020-01-03  Jim Meyering  <meyer...@fb.com>
 
 	doc: fix time.texi wording
diff --git a/lib/mbrtowc-impl-utf8.h b/lib/mbrtowc-impl-utf8.h
new file mode 100644
index 0000000..a826b1b
--- /dev/null
+++ b/lib/mbrtowc-impl-utf8.h
@@ -0,0 +1,138 @@
+/* Convert multibyte character to wide character.
+   Copyright (C) 1999-2002, 2005-2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2008.  */
+
+/* This file contains the part of the body of the mbrtowc and mbrtoc32 functions
+   that handles the special case of the UTF-8 encoding.  */
+
+        /* Cf. unistr/u8-mbtouc.c.  */
+        unsigned char c = (unsigned char) p[0];
+
+        if (c < 0x80)
+          {
+            if (pwc != NULL)
+              *pwc = c;
+            res = (c == 0 ? 0 : 1);
+            goto success;
+          }
+        if (c >= 0xc2)
+          {
+            if (c < 0xe0)
+              {
+                if (m == 1)
+                  goto incomplete;
+                else /* m >= 2 */
+                  {
+                    unsigned char c2 = (unsigned char) p[1];
+
+                    if ((c2 ^ 0x80) < 0x40)
+                      {
+                        if (pwc != NULL)
+                          *pwc = ((unsigned int) (c & 0x1f) << 6)
+                                 | (unsigned int) (c2 ^ 0x80);
+                        res = 2;
+                        goto success;
+                      }
+                  }
+              }
+            else if (c < 0xf0)
+              {
+                if (m == 1)
+                  goto incomplete;
+                else
+                  {
+                    unsigned char c2 = (unsigned char) p[1];
+
+                    if ((c2 ^ 0x80) < 0x40
+                        && (c >= 0xe1 || c2 >= 0xa0)
+                        && (c != 0xed || c2 < 0xa0))
+                      {
+                        if (m == 2)
+                          goto incomplete;
+                        else /* m >= 3 */
+                          {
+                            unsigned char c3 = (unsigned char) p[2];
+
+                            if ((c3 ^ 0x80) < 0x40)
+                              {
+                                unsigned int wc =
+                                  (((unsigned int) (c & 0x0f) << 12)
+                                   | ((unsigned int) (c2 ^ 0x80) << 6)
+                                   | (unsigned int) (c3 ^ 0x80));
+
+                                if (FITS_IN_CHAR_TYPE (wc))
+                                  {
+                                    if (pwc != NULL)
+                                      *pwc = wc;
+                                    res = 3;
+                                    goto success;
+                                  }
+                              }
+                          }
+                      }
+                  }
+              }
+            else if (c <= 0xf4)
+              {
+                if (m == 1)
+                  goto incomplete;
+                else
+                  {
+                    unsigned char c2 = (unsigned char) p[1];
+
+                    if ((c2 ^ 0x80) < 0x40
+                        && (c >= 0xf1 || c2 >= 0x90)
+                        && (c < 0xf4 || (c == 0xf4 && c2 < 0x90)))
+                      {
+                        if (m == 2)
+                          goto incomplete;
+                        else
+                          {
+                            unsigned char c3 = (unsigned char) p[2];
+
+                            if ((c3 ^ 0x80) < 0x40)
+                              {
+                                if (m == 3)
+                                  goto incomplete;
+                                else /* m >= 4 */
+                                  {
+                                    unsigned char c4 = (unsigned char) p[3];
+
+                                    if ((c4 ^ 0x80) < 0x40)
+                                      {
+                                        unsigned int wc =
+                                          (((unsigned int) (c & 0x07) << 18)
+                                           | ((unsigned int) (c2 ^ 0x80) << 12)
+                                           | ((unsigned int) (c3 ^ 0x80) << 6)
+                                           | (unsigned int) (c4 ^ 0x80));
+
+                                        if (FITS_IN_CHAR_TYPE (wc))
+                                          {
+                                            if (pwc != NULL)
+                                              *pwc = wc;
+                                            res = 4;
+                                            goto success;
+                                          }
+                                      }
+                                  }
+                              }
+                          }
+                      }
+                  }
+              }
+          }
+        goto invalid;
diff --git a/lib/mbrtowc-impl.h b/lib/mbrtowc-impl.h
new file mode 100644
index 0000000..c970439
--- /dev/null
+++ b/lib/mbrtowc-impl.h
@@ -0,0 +1,262 @@
+/* Convert multibyte character to wide character.
+   Copyright (C) 1999-2002, 2005-2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2008.  */
+
+/* This file contains the body of the mbrtowc and mbrtoc32 functions,
+   when GNULIB_defined_mbstate_t is defined.  */
+
+  char *pstate = (char *)ps;
+
+  if (s == NULL)
+    {
+      pwc = NULL;
+      s = "";
+      n = 1;
+    }
+
+  if (n == 0)
+    return (size_t)(-2);
+
+  /* Here n > 0.  */
+
+  if (pstate == NULL)
+    pstate = internal_state;
+
+  {
+    size_t nstate = pstate[0];
+    char buf[4];
+    const char *p;
+    size_t m;
+    enc_t enc;
+    int res;
+
+    switch (nstate)
+      {
+      case 0:
+        p = s;
+        m = n;
+        break;
+      case 3:
+        buf[2] = pstate[3];
+        FALLTHROUGH;
+      case 2:
+        buf[1] = pstate[2];
+        FALLTHROUGH;
+      case 1:
+        buf[0] = pstate[1];
+        p = buf;
+        m = nstate;
+        buf[m++] = s[0];
+        if (n >= 2 && m < 4)
+          {
+            buf[m++] = s[1];
+            if (n >= 3 && m < 4)
+              buf[m++] = s[2];
+          }
+        break;
+      default:
+        errno = EINVAL;
+        return (size_t)(-1);
+      }
+
+    /* Here m > 0.  */
+
+    enc = locale_encoding_classification ();
+
+    if (enc == enc_utf8) /* UTF-8 */
+      {
+        /* Achieve
+             - multi-thread safety and
+             - the ability to produce wide character values > WCHAR_MAX
+           by not calling mbtowc() at all.  */
+#include "mbrtowc-impl-utf8.h"
+      }
+    else
+      {
+        /* The hidden internal state of mbtowc would make this function not
+           multi-thread safe.  Achieve multi-thread safety through a lock.  */
+        wchar_t wc;
+        res = mbtowc_with_lock (&wc, p, m);
+
+        if (res >= 0)
+          {
+            if ((wc == 0) != (res == 0))
+              abort ();
+            if (pwc != NULL)
+              *pwc = wc;
+            goto success;
+          }
+
+        /* mbtowc does not distinguish between invalid and incomplete multibyte
+           sequences.  But mbrtowc needs to make this distinction.
+           There are two possible approaches:
+             - Use iconv() and its return value.
+             - Use built-in knowledge about the possible encodings.
+           Given the low quality of implementation of iconv() on the systems
+           that lack mbrtowc(), we use the second approach.
+           The possible encodings are:
+             - 8-bit encodings,
+             - EUC-JP, EUC-KR, GB2312, EUC-TW, BIG5, GB18030, SJIS,
+             - UTF-8 (already handled above).
+           Use specialized code for each.  */
+        if (m >= 4 || m >= MB_CUR_MAX)
+          goto invalid;
+        /* Here MB_CUR_MAX > 1 and 0 < m < 4.  */
+        switch (enc)
+          {
+          /* As a reference for this code, you can use the GNU libiconv
+             implementation.  Look for uses of the RET_TOOFEW macro.  */
+
+          case enc_eucjp: /* EUC-JP */
+            {
+              if (m == 1)
+                {
+                  unsigned char c = (unsigned char) p[0];
+
+                  if ((c >= 0xa1 && c < 0xff) || c == 0x8e || c == 0x8f)
+                    goto incomplete;
+                }
+              if (m == 2)
+                {
+                  unsigned char c = (unsigned char) p[0];
+
+                  if (c == 0x8f)
+                    {
+                      unsigned char c2 = (unsigned char) p[1];
+
+                      if (c2 >= 0xa1 && c2 < 0xff)
+                        goto incomplete;
+                    }
+                }
+              goto invalid;
+            }
+
+          case enc_94: /* EUC-KR, GB2312, BIG5 */
+            {
+              if (m == 1)
+                {
+                  unsigned char c = (unsigned char) p[0];
+
+                  if (c >= 0xa1 && c < 0xff)
+                    goto incomplete;
+                }
+              goto invalid;
+            }
+
+          case enc_euctw: /* EUC-TW */
+            {
+              if (m == 1)
+                {
+                  unsigned char c = (unsigned char) p[0];
+
+                  if ((c >= 0xa1 && c < 0xff) || c == 0x8e)
+                    goto incomplete;
+                }
+              else /* m == 2 || m == 3 */
+                {
+                  unsigned char c = (unsigned char) p[0];
+
+                  if (c == 0x8e)
+                    goto incomplete;
+                }
+              goto invalid;
+            }
+
+          case enc_gb18030: /* GB18030 */
+            {
+              if (m == 1)
+                {
+                  unsigned char c = (unsigned char) p[0];
+
+                  if ((c >= 0x90 && c <= 0xe3) || (c >= 0xf8 && c <= 0xfe))
+                    goto incomplete;
+                }
+              else /* m == 2 || m == 3 */
+                {
+                  unsigned char c = (unsigned char) p[0];
+
+                  if (c >= 0x90 && c <= 0xe3)
+                    {
+                      unsigned char c2 = (unsigned char) p[1];
+
+                      if (c2 >= 0x30 && c2 <= 0x39)
+                        {
+                          if (m == 2)
+                            goto incomplete;
+                          else /* m == 3 */
+                            {
+                              unsigned char c3 = (unsigned char) p[2];
+
+                              if (c3 >= 0x81 && c3 <= 0xfe)
+                                goto incomplete;
+                            }
+                        }
+                    }
+                }
+              goto invalid;
+            }
+
+          case enc_sjis: /* SJIS */
+            {
+              if (m == 1)
+                {
+                  unsigned char c = (unsigned char) p[0];
+
+                  if ((c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xea)
+                      || (c >= 0xf0 && c <= 0xf9))
+                    goto incomplete;
+                }
+              goto invalid;
+            }
+
+          default:
+            /* An unknown multibyte encoding.  */
+            goto incomplete;
+          }
+      }
+
+   success:
+    /* res >= 0 is the corrected return value of
+       mbtowc_with_lock (&wc, p, m).  */
+    if (nstate >= (res > 0 ? res : 1))
+      abort ();
+    res -= nstate;
+    pstate[0] = 0;
+    return res;
+
+   incomplete:
+    {
+      size_t k = nstate;
+      /* Here 0 <= k < m < 4.  */
+      pstate[++k] = s[0];
+      if (k < m)
+        {
+          pstate[++k] = s[1];
+          if (k < m)
+            pstate[++k] = s[2];
+        }
+      if (k != m)
+        abort ();
+    }
+    pstate[0] = m;
+    return (size_t)(-2);
+
+   invalid:
+    errno = EILSEQ;
+    /* The conversion state is undefined, says POSIX.  */
+    return (size_t)(-1);
+  }
diff --git a/lib/mbrtowc.c b/lib/mbrtowc.c
index fdef8f9..6cb5267 100644
--- a/lib/mbrtowc.c
+++ b/lib/mbrtowc.c
@@ -20,13 +20,9 @@
 /* Specification.  */
 #include <wchar.h>
 
-#if MBRTOWC_IN_C_LOCALE_MAYBE_EILSEQ
-# include "hard-locale.h"
-# include <locale.h>
-#endif
-
 #if GNULIB_defined_mbstate_t
-/* Implement mbrtowc() on top of mbtowc().  */
+/* Implement mbrtowc() on top of mbtowc() for the non-UTF-8 locales
+   and directly for the UTF-8 locales.  */
 
 # include <errno.h>
 # include <stdint.h>
@@ -72,360 +68,18 @@ static char internal_state[4];
 size_t
 mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
 {
-  char *pstate = (char *)ps;
-
-  if (s == NULL)
-    {
-      pwc = NULL;
-      s = "";
-      n = 1;
-    }
-
-  if (n == 0)
-    return (size_t)(-2);
-
-  /* Here n > 0.  */
-
-  if (pstate == NULL)
-    pstate = internal_state;
-
-  {
-    size_t nstate = pstate[0];
-    char buf[4];
-    const char *p;
-    size_t m;
-    enc_t enc;
-    int res;
-
-    switch (nstate)
-      {
-      case 0:
-        p = s;
-        m = n;
-        break;
-      case 3:
-        buf[2] = pstate[3];
-        FALLTHROUGH;
-      case 2:
-        buf[1] = pstate[2];
-        FALLTHROUGH;
-      case 1:
-        buf[0] = pstate[1];
-        p = buf;
-        m = nstate;
-        buf[m++] = s[0];
-        if (n >= 2 && m < 4)
-          {
-            buf[m++] = s[1];
-            if (n >= 3 && m < 4)
-              buf[m++] = s[2];
-          }
-        break;
-      default:
-        errno = EINVAL;
-        return (size_t)(-1);
-      }
-
-    /* Here m > 0.  */
-
-    enc = locale_encoding_classification ();
-
-    if (enc == enc_utf8) /* UTF-8 */
-      {
-        /* Achieve multi-thread safety by not calling mbtowc() at all.  */
-        /* Cf. unistr/u8-mbtouc.c.  */
-        unsigned char c = (unsigned char) p[0];
-
-        if (c < 0x80)
-          {
-            if (pwc != NULL)
-              *pwc = c;
-            res = (c == 0 ? 0 : 1);
-            goto success;
-          }
-        if (c >= 0xc2)
-          {
-            if (c < 0xe0)
-              {
-                if (m == 1)
-                  goto incomplete;
-                else /* m >= 2 */
-                  {
-                    unsigned char c2 = (unsigned char) p[1];
-
-                    if ((c2 ^ 0x80) < 0x40)
-                      {
-                        if (pwc != NULL)
-                          *pwc = ((unsigned int) (c & 0x1f) << 6)
-                                 | (unsigned int) (c2 ^ 0x80);
-                        res = 2;
-                        goto success;
-                      }
-                  }
-              }
-            else if (c < 0xf0)
-              {
-                if (m == 1)
-                  goto incomplete;
-                else
-                  {
-                    unsigned char c2 = (unsigned char) p[1];
-
-                    if ((c2 ^ 0x80) < 0x40
-                        && (c >= 0xe1 || c2 >= 0xa0)
-                        && (c != 0xed || c2 < 0xa0))
-                      {
-                        if (m == 2)
-                          goto incomplete;
-                        else /* m >= 3 */
-                          {
-                            unsigned char c3 = (unsigned char) p[2];
-
-                            if ((c3 ^ 0x80) < 0x40)
-                              {
-                                unsigned int wc
-                                  = (((unsigned int) (c & 0x0f) << 12)
-                                     | ((unsigned int) (c2 ^ 0x80) << 6)
-                                     | (unsigned int) (c3 ^ 0x80));
-                                if (wc <= WCHAR_MAX)
-                                  {
-                                    if (pwc != NULL)
-                                      *pwc = wc;
-                                    res = 3;
-                                    goto success;
-                                  }
-                              }
-                          }
-                      }
-                  }
-              }
-            else if (c <= 0xf4)
-              {
-                if (m == 1)
-                  goto incomplete;
-                else
-                  {
-                    unsigned char c2 = (unsigned char) p[1];
-
-                    if ((c2 ^ 0x80) < 0x40
-                        && (c >= 0xf1 || c2 >= 0x90)
-                        && (c < 0xf4 || (c == 0xf4 && c2 < 0x90)))
-                      {
-                        if (m == 2)
-                          goto incomplete;
-                        else
-                          {
-                            unsigned char c3 = (unsigned char) p[2];
-
-                            if ((c3 ^ 0x80) < 0x40)
-                              {
-                                if (m == 3)
-                                  goto incomplete;
-                                else /* m >= 4 */
-                                  {
-                                    unsigned char c4 = (unsigned char) p[3];
-
-                                    if ((c4 ^ 0x80) < 0x40)
-                                      {
-                                        unsigned int wc
-                                          = (((unsigned int) (c & 0x07) << 18)
-                                             | ((unsigned int) (c2 ^ 0x80)
-                                                << 12)
-                                             | ((unsigned int) (c3 ^ 0x80) << 6)
-                                             | (unsigned int) (c4 ^ 0x80));
-                                        if (wc <= WCHAR_MAX)
-                                          {
-                                            if (pwc != NULL)
-                                              *pwc = wc;
-                                            res = 4;
-                                            goto success;
-                                          }
-                                      }
-                                  }
-                              }
-                          }
-                      }
-                  }
-              }
-          }
-        goto invalid;
-      }
-    else
-      {
-        /* The hidden internal state of mbtowc would make this function not
-           multi-thread safe.  Achieve multi-thread safety through a lock.  */
-        res = mbtowc_with_lock (pwc, p, m);
-
-        if (res >= 0)
-          {
-            if (pwc != NULL && ((*pwc == 0) != (res == 0)))
-              abort ();
-            goto success;
-          }
-
-        /* mbtowc does not distinguish between invalid and incomplete multibyte
-           sequences.  But mbrtowc needs to make this distinction.
-           There are two possible approaches:
-             - Use iconv() and its return value.
-             - Use built-in knowledge about the possible encodings.
-           Given the low quality of implementation of iconv() on the systems
-           that lack mbrtowc(), we use the second approach.
-           The possible encodings are:
-             - 8-bit encodings,
-             - EUC-JP, EUC-KR, GB2312, EUC-TW, BIG5, GB18030, SJIS,
-             - UTF-8 (already handled above).
-           Use specialized code for each.  */
-        if (m >= 4 || m >= MB_CUR_MAX)
-          goto invalid;
-        /* Here MB_CUR_MAX > 1 and 0 < m < 4.  */
-        switch (enc)
-          {
-          /* As a reference for this code, you can use the GNU libiconv
-             implementation.  Look for uses of the RET_TOOFEW macro.  */
-
-          case enc_eucjp: /* EUC-JP */
-            {
-              if (m == 1)
-                {
-                  unsigned char c = (unsigned char) p[0];
-
-                  if ((c >= 0xa1 && c < 0xff) || c == 0x8e || c == 0x8f)
-                    goto incomplete;
-                }
-              if (m == 2)
-                {
-                  unsigned char c = (unsigned char) p[0];
-
-                  if (c == 0x8f)
-                    {
-                      unsigned char c2 = (unsigned char) p[1];
-
-                      if (c2 >= 0xa1 && c2 < 0xff)
-                        goto incomplete;
-                    }
-                }
-              goto invalid;
-            }
-
-          case enc_94: /* EUC-KR, GB2312, BIG5 */
-            {
-              if (m == 1)
-                {
-                  unsigned char c = (unsigned char) p[0];
-
-                  if (c >= 0xa1 && c < 0xff)
-                    goto incomplete;
-                }
-              goto invalid;
-            }
-
-          case enc_euctw: /* EUC-TW */
-            {
-              if (m == 1)
-                {
-                  unsigned char c = (unsigned char) p[0];
-
-                  if ((c >= 0xa1 && c < 0xff) || c == 0x8e)
-                    goto incomplete;
-                }
-              else /* m == 2 || m == 3 */
-                {
-                  unsigned char c = (unsigned char) p[0];
-
-                  if (c == 0x8e)
-                    goto incomplete;
-                }
-              goto invalid;
-            }
-
-          case enc_gb18030: /* GB18030 */
-            {
-              if (m == 1)
-                {
-                  unsigned char c = (unsigned char) p[0];
-
-                  if ((c >= 0x90 && c <= 0xe3) || (c >= 0xf8 && c <= 0xfe))
-                    goto incomplete;
-                }
-              else /* m == 2 || m == 3 */
-                {
-                  unsigned char c = (unsigned char) p[0];
-
-                  if (c >= 0x90 && c <= 0xe3)
-                    {
-                      unsigned char c2 = (unsigned char) p[1];
-
-                      if (c2 >= 0x30 && c2 <= 0x39)
-                        {
-                          if (m == 2)
-                            goto incomplete;
-                          else /* m == 3 */
-                            {
-                              unsigned char c3 = (unsigned char) p[2];
-
-                              if (c3 >= 0x81 && c3 <= 0xfe)
-                                goto incomplete;
-                            }
-                        }
-                    }
-                }
-              goto invalid;
-            }
-
-          case enc_sjis: /* SJIS */
-            {
-              if (m == 1)
-                {
-                  unsigned char c = (unsigned char) p[0];
-
-                  if ((c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xea)
-                      || (c >= 0xf0 && c <= 0xf9))
-                    goto incomplete;
-                }
-              goto invalid;
-            }
-
-          default:
-            /* An unknown multibyte encoding.  */
-            goto incomplete;
-          }
-      }
-
-   success:
-    /* res >= 0 is the corrected return value of mbtowc (pwc, p, m).  */
-    if (nstate >= (res > 0 ? res : 1))
-      abort ();
-    res -= nstate;
-    pstate[0] = 0;
-    return res;
-
-   incomplete:
-    {
-      size_t k = nstate;
-      /* Here 0 <= k < m < 4.  */
-      pstate[++k] = s[0];
-      if (k < m)
-        {
-          pstate[++k] = s[1];
-          if (k < m)
-            pstate[++k] = s[2];
-        }
-      if (k != m)
-        abort ();
-    }
-    pstate[0] = m;
-    return (size_t)(-2);
-
-   invalid:
-    errno = EILSEQ;
-    /* The conversion state is undefined, says POSIX.  */
-    return (size_t)(-1);
-  }
+# define FITS_IN_CHAR_TYPE(wc)  ((wc) <= WCHAR_MAX)
+# include "mbrtowc-impl.h"
 }
 
 #else
 /* Override the system's mbrtowc() function.  */
 
+# if MBRTOWC_IN_C_LOCALE_MAYBE_EILSEQ
+#  include "hard-locale.h"
+#  include <locale.h>
+# endif
+
 # undef mbrtowc
 
 size_t
diff --git a/modules/mbrtowc b/modules/mbrtowc
index 22afc96..ee2e649 100644
--- a/modules/mbrtowc
+++ b/modules/mbrtowc
@@ -3,6 +3,8 @@ mbrtowc() function: convert multibyte character to wide character.
 
 Files:
 lib/mbrtowc.c
+lib/mbrtowc-impl.h
+lib/mbrtowc-impl-utf8.h
 lib/lc-charset-dispatch.h
 lib/lc-charset-dispatch.c
 lib/mbtowc-lock.h
-- 
2.7.4

From 20ff668e934fa7e3bb3ce27027cd56ebe8716188 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Sat, 4 Jan 2020 02:32:52 +0100
Subject: [PATCH 4/5] mbrtoc32: New module.

* lib/uchar.in.h (mbrtoc32): New declaration.
* lib/mbrtoc32.c: New file, based on lib/mbrtowc.c.
* m4/mbrtoc32.m4: New file, based on m4/mbrtowc.m4.
* m4/uchar.m4 (gl_UCHAR_H): Test whether mbrtoc32 is declared.
(gl_UCHAR_H_DEFAULTS): Initialize GNULIB_MBRTOC32, HAVE_MBRTOC32,
REPLACE_MBRTOC32.
* modules/uchar (Makefile.am): Substitute GNULIB_MBRTOC32,
HAVE_MBRTOC32, REPLACE_MBRTOC32.
* modules/mbrtoc32: New file, based on modules/mbrtowc.
* tests/test-uchar-c++.cc (mbrtoc32): Verify the signature.
* modules/uchar-c++-tests (Makefile.am): Link test-uchar-c++ with
$(LIB_MBRTOWC).
* doc/posix-functions/mbrtoc32.texi: Document the new module.
* doc/posix-functions/mbrtowc.texi: Mention the new module.
---
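Reading aid, not part of the commit: the UTF-8 branch that this series
shares through lib/mbrtowc-impl-utf8.h classifies its input three ways
(complete, incomplete, invalid). The standalone sketch below shows that
classification under stated assumptions. The names utf8_decode_sketch,
U8_INVALID and U8_INCOMPLETE are hypothetical and do not exist in Gnulib.
The byte-range checks follow the ones visible in the code removed from
lib/mbrtowc.c, but the control flow is simplified. Unlike mbrtoc32, the
sketch keeps no conversion state and reports a NUL byte as length 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, chosen to resemble the (size_t) -1 and
   (size_t) -2 return values of mbrtoc32.  */
#define U8_INVALID    ((size_t) -1)
#define U8_INCOMPLETE ((size_t) -2)

/* Is B a UTF-8 continuation byte (0x80..0xBF)?  */
static int
is_cont (unsigned char b)
{
  return (b ^ 0x80) < 0x40;
}

/* Decode one character from the M bytes at P (M > 0).
   Return the number of bytes consumed and store the code point in *PC,
   or return U8_INCOMPLETE or U8_INVALID without touching *PC.  */
static size_t
utf8_decode_sketch (uint_least32_t *pc, const unsigned char *p, size_t m)
{
  unsigned char c;
  size_t len, i;
  uint_least32_t wc;

  assert (pc != NULL && p != NULL && m > 0);
  c = p[0];
  if (c < 0x80)
    {
      *pc = c;                  /* ASCII, including NUL */
      return 1;
    }
  if (c < 0xc2 || c > 0xf4)
    return U8_INVALID;          /* stray continuation, overlong, or too large */

  len = (c < 0xe0 ? 2 : c < 0xf0 ? 3 : 4);
  wc = c & (len == 2 ? 0x1f : len == 3 ? 0x0f : 0x07);

  /* The second byte has extra range restrictions for some lead bytes.  */
  if (m >= 2)
    {
      unsigned char c2 = p[1];
      if (!is_cont (c2)
          || (c == 0xe0 && c2 < 0xa0)    /* overlong 3-byte form */
          || (c == 0xed && c2 >= 0xa0)   /* UTF-16 surrogate range */
          || (c == 0xf0 && c2 < 0x90)    /* overlong 4-byte form */
          || (c == 0xf4 && c2 >= 0x90))  /* beyond U+10FFFF */
        return U8_INVALID;
    }

  for (i = 1; i < len; i++)
    {
      if (i >= m)
        return U8_INCOMPLETE;   /* valid so far, but more bytes are needed */
      if (!is_cont (p[i]))
        return U8_INVALID;
      wc = (wc << 6) | (uint_least32_t) (p[i] ^ 0x80);
    }
  *pc = wc;
  return len;
}
```

A caller would pass the bytes available so far and, on U8_INCOMPLETE,
retry once more input has arrived. mbrtoc32 instead buffers the pending
bytes in the mbstate_t argument.
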
 ChangeLog                         |  18 +++
 doc/posix-functions/mbrtoc32.texi |  16 ++-
 doc/posix-functions/mbrtowc.texi  |   7 +-
 lib/mbrtoc32.c                    | 227 ++++++++++++++++++++++++++++++++++++++
 lib/uchar.in.h                    |  29 +++++
 m4/mbrtoc32.m4                    | 117 ++++++++++++++++++++
 m4/uchar.m4                       |  12 +-
 modules/mbrtoc32                  |  51 +++++++++
 modules/uchar                     |   3 +
 modules/uchar-c++-tests           |   1 +
 tests/test-uchar-c++.cc           |   5 +
 11 files changed, 479 insertions(+), 7 deletions(-)
 create mode 100644 lib/mbrtoc32.c
 create mode 100644 m4/mbrtoc32.m4
 create mode 100644 modules/mbrtoc32

diff --git a/ChangeLog b/ChangeLog
index f8c6793..c4d0968 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,23 @@
 2020-01-03  Bruno Haible  <br...@clisp.org>
 
+	mbrtoc32: New module.
+	* lib/uchar.in.h (mbrtoc32): New declaration.
+	* lib/mbrtoc32.c: New file, based on lib/mbrtowc.c.
+	* m4/mbrtoc32.m4: New file, based on m4/mbrtowc.m4.
+	* m4/uchar.m4 (gl_UCHAR_H): Test whether mbrtoc32 is declared.
+	(gl_UCHAR_H_DEFAULTS): Initialize GNULIB_MBRTOC32, HAVE_MBRTOC32,
+	REPLACE_MBRTOC32.
+	* modules/uchar (Makefile.am): Substitute GNULIB_MBRTOC32,
+	HAVE_MBRTOC32, REPLACE_MBRTOC32.
+	* modules/mbrtoc32: New file, based on modules/mbrtowc.
+	* tests/test-uchar-c++.cc (mbrtoc32): Verify the signature.
+	* modules/uchar-c++-tests (Makefile.am): Link test-uchar-c++ with
+	$(LIB_MBRTOWC).
+	* doc/posix-functions/mbrtoc32.texi: Document the new module.
+	* doc/posix-functions/mbrtowc.texi: Mention the new module.
+
+2020-01-03  Bruno Haible  <br...@clisp.org>
+
 	mbrtowc: Refactor to share code with mbrtoc32.
 	* lib/mbrtowc-impl.h: New file, extracted from lib/mbrtowc.c.
 	* lib/mbrtowc-impl-utf8.h: Likewise.
diff --git a/doc/posix-functions/mbrtoc32.texi b/doc/posix-functions/mbrtoc32.texi
index 92241c9..1aa15a3 100644
--- a/doc/posix-functions/mbrtoc32.texi
+++ b/doc/posix-functions/mbrtoc32.texi
@@ -2,15 +2,23 @@
 @section @code{mbrtoc32}
 @findex mbrtoc32
 
-Gnulib module: ---
+Gnulib module: mbrtoc32
 
 Portability problems fixed by Gnulib:
 @itemize
+@item
+This function is missing on most non-glibc platforms:
+glibc 2.15, Mac OS X 10.5, FreeBSD 6.4, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.3, Cygwin, mingw, MSVC 9, Android 4.4.
+@item
+In the C or POSIX locales, this function can return @code{(size_t) -1}
+and set @code{errno} to @code{EILSEQ}:
+glibc 2.23.
+@item
+This function returns 0 instead of @code{(size_t) -2} when the input
+is empty:
+glibc 2.19.
 @end itemize
 
 Portability problems not fixed by Gnulib:
 @itemize
-@item
-This function is missing on most non-glibc platforms:
-glibc 2.15, Mac OS X 10.5, FreeBSD 6.4, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 7.1, HP-UX 11.31, IRIX 6.5, Solaris 11.3, Cygwin, mingw, MSVC 9, Android 4.4.
 @end itemize
diff --git a/doc/posix-functions/mbrtowc.texi b/doc/posix-functions/mbrtowc.texi
index 3b7aed0..897e4da 100644
--- a/doc/posix-functions/mbrtowc.texi
+++ b/doc/posix-functions/mbrtowc.texi
@@ -44,6 +44,9 @@ Solaris 9.
 Portability problems not fixed by Gnulib:
 @itemize
 @item
-On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and therefore cannot
-accommodate all Unicode characters.
+On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and
+therefore cannot accommodate all Unicode characters.
+However, the ISO C11 function @code{mbrtoc32}, provided by Gnulib module
+@code{mbrtoc32}, operates on 32-bit wide characters and therefore does not have
+this limitation.
 @end itemize
diff --git a/lib/mbrtoc32.c b/lib/mbrtoc32.c
new file mode 100644
index 0000000..f2cf71e
--- /dev/null
+++ b/lib/mbrtoc32.c
@@ -0,0 +1,227 @@
+/* Convert multibyte character to 32-bit wide character.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2020.  */
+
+#include <config.h>
+
+/* Specification.  */
+#include <uchar.h>
+
+#include <errno.h>
+#include <stdlib.h>
+
+# ifndef FALLTHROUGH
+#  if __GNUC__ < 7
+#   define FALLTHROUGH ((void) 0)
+#  else
+#   define FALLTHROUGH __attribute__ ((__fallthrough__))
+#  endif
+# endif
+
+#if GNULIB_defined_mbstate_t /* AIX, IRIX */
+/* Implement mbrtoc32() on top of mbtowc() for the non-UTF-8 locales
+   and directly for the UTF-8 locales.  */
+
+# if defined _WIN32 && !defined __CYGWIN__
+
+#  define WIN32_LEAN_AND_MEAN  /* avoid including junk */
+#  include <windows.h>
+
+# elif HAVE_PTHREAD_API
+
+#  include <pthread.h>
+#  if HAVE_THREADS_H && HAVE_WEAK_SYMBOLS
+#   include <threads.h>
+#   pragma weak thrd_exit
+#   define c11_threads_in_use() (thrd_exit != NULL)
+#  else
+#   define c11_threads_in_use() 0
+#  endif
+
+# elif HAVE_THREADS_H
+
+#  include <threads.h>
+
+# endif
+
+# include "verify.h"
+# include "lc-charset-dispatch.h"
+# include "mbtowc-lock.h"
+
+verify (sizeof (mbstate_t) >= 4);
+static char internal_state[4];
+
+size_t
+mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
+{
+# define FITS_IN_CHAR_TYPE(wc)  1
+# include "mbrtowc-impl.h"
+}
+
+#else /* glibc, macOS, FreeBSD, NetBSD, OpenBSD, HP-UX, Solaris, Cygwin, mingw, MSVC, Minix, Android */
+
+/* Implement mbrtoc32() based on mbrtowc().  */
+
+# include <wchar.h>
+
+# include "localcharset.h"
+# include "streq.h"
+
+static mbstate_t internal_state;
+
+size_t
+mbrtoc32 (char32_t *pwc, const char *s, size_t n, mbstate_t *ps)
+{
+  /* It's simpler to handle the case s == NULL upfront, than to worry about
+     this case later, before every test of pwc and n.  */
+  if (s == NULL)
+    {
+      pwc = NULL;
+      s = "";
+      n = 1;
+    }
+
+# if MBRTOC32_EMPTY_INPUT_BUG || _GL_LARGE_CHAR32_T
+  if (n == 0)
+    return (size_t) -2;
+# endif
+
+  if (ps == NULL)
+    ps = &internal_state;
+
+# if _GL_LARGE_CHAR32_T
+
+  /* Special-case all encodings that may produce wide character values
+     > WCHAR_MAX.  */
+  const char *encoding = locale_charset ();
+  if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0, 0))
+    {
+      /* Special-case the UTF-8 encoding.  Assume that the wide-character
+         encoding in a UTF-8 locale is UCS-2 or UTF-16.  */
+      /* Here n > 0.  */
+      char *pstate = (char *)ps;
+      size_t nstate = pstate[0];
+      char buf[4];
+      const char *p;
+      size_t m;
+      int res;
+
+      switch (nstate)
+        {
+        case 0:
+          p = s;
+          m = n;
+          break;
+        case 3:
+          buf[2] = pstate[3];
+          FALLTHROUGH;
+        case 2:
+          buf[1] = pstate[2];
+          FALLTHROUGH;
+        case 1:
+          buf[0] = pstate[1];
+          p = buf;
+          m = nstate;
+          buf[m++] = s[0];
+          if (n >= 2 && m < 4)
+            {
+              buf[m++] = s[1];
+              if (n >= 3 && m < 4)
+                buf[m++] = s[2];
+            }
+          break;
+        default:
+          errno = EINVAL;
+          return (size_t)(-1);
+        }
+
+      /* Here m > 0.  */
+
+      {
+#  define FITS_IN_CHAR_TYPE(wc)  1
+#  include "mbrtowc-impl-utf8.h"
+      }
+
+     success:
+      if (nstate >= (res > 0 ? res : 1))
+        abort ();
+      res -= nstate;
+      /* Set *ps to the initial state.  */
+#  if defined _WIN32 && !defined __CYGWIN__
+      /* Native Windows.  */
+      /* MSVC defines 'mbstate_t' as an 8-byte struct; the first 4 bytes matter.
+         On mingw, 'mbstate_t' is sometimes defined as 'int', sometimes defined
+         as an 8-byte struct, of which the first 4 bytes matter.  */
+      *(unsigned int *)pstate = 0;
+#  elif defined __CYGWIN__
+      /* Cygwin defines 'mbstate_t' as an 8-byte struct; the first 4 bytes
+         matter.  */
+      ps->__count = 0;
+#  else
+      pstate[0] = 0;
+#  endif
+      return res;
+
+     incomplete:
+      {
+        size_t k = nstate;
+        /* Here 0 <= k < m < 4.  */
+        pstate[++k] = s[0];
+        if (k < m)
+          {
+            pstate[++k] = s[1];
+            if (k < m)
+              pstate[++k] = s[2];
+          }
+        if (k != m)
+          abort ();
+      }
+      pstate[0] = m;
+      return (size_t)(-2);
+
+     invalid:
+      errno = EILSEQ;
+      /* The conversion state is undefined, says POSIX.  */
+      return (size_t)(-1);
+    }
+  else
+    {
+      wchar_t wc;
+      size_t ret = mbrtowc (&wc, s, n, ps);
+      if (ret < (size_t) -2 && pwc != NULL)
+        *pwc = wc;
+      return ret;
+    }
+
+# else
+
+  /* char32_t and wchar_t are equivalent.
+     Two implementations are possible:
+       - We can call the original mbrtoc32 (if it exists) and handle
+         MBRTOC32_IN_C_LOCALE_MAYBE_EILSEQ.
+       - We can call mbrtowc.
+     The latter is simpler.  */
+  wchar_t wc;
+  size_t ret = mbrtowc (&wc, s, n, ps);
+  if (ret < (size_t) -2 && pwc != NULL)
+    *pwc = wc;
+  return ret;
+
+# endif
+}
+
+#endif
diff --git a/lib/uchar.in.h b/lib/uchar.in.h
index 9cba39b..6f533a1 100644
--- a/lib/uchar.in.h
+++ b/lib/uchar.in.h
@@ -70,4 +70,33 @@ _GL_CXXALIASWARN (c32tob);
 #endif
 
 
+/* Converts a multibyte character to a 32-bit wide character.  */
+#if @GNULIB_MBRTOC32@
+# if @REPLACE_MBRTOC32@
+#  if !(defined __cplusplus && defined GNULIB_NAMESPACE)
+#   undef mbrtoc32
+#   define mbrtoc32 rpl_mbrtoc32
+#  endif
+_GL_FUNCDECL_RPL (mbrtoc32, size_t,
+                  (char32_t *pc, const char *s, size_t n, mbstate_t *ps));
+_GL_CXXALIAS_RPL (mbrtoc32, size_t,
+                  (char32_t *pc, const char *s, size_t n, mbstate_t *ps));
+# else
+#  if !@HAVE_MBRTOC32@
+_GL_FUNCDECL_SYS (mbrtoc32, size_t,
+                  (char32_t *pc, const char *s, size_t n, mbstate_t *ps));
+#  endif
+_GL_CXXALIAS_SYS (mbrtoc32, size_t,
+                  (char32_t *pc, const char *s, size_t n, mbstate_t *ps));
+# endif
+_GL_CXXALIASWARN (mbrtoc32);
+#elif defined GNULIB_POSIXCHECK
+# undef mbrtoc32
+# if HAVE_RAW_DECL_MBRTOC32
+_GL_WARN_ON_USE (mbrtoc32, "mbrtoc32 is not portable - "
+                 "use gnulib module mbrtoc32 for portability");
+# endif
+#endif
+
+
 #endif /* _@GUARD_PREFIX@_UCHAR_H */
diff --git a/m4/mbrtoc32.m4 b/m4/mbrtoc32.m4
new file mode 100644
index 0000000..5039fc7
--- /dev/null
+++ b/m4/mbrtoc32.m4
@@ -0,0 +1,117 @@
+# mbrtoc32.m4 serial 1
+dnl Copyright (C) 2014-2020 Free Software Foundation, Inc.
+dnl This file is free software; the Free Software Foundation
+dnl gives unlimited permission to copy and/or distribute it,
+dnl with or without modifications, as long as this notice is preserved.
+
+AC_DEFUN([gl_FUNC_MBRTOC32],
+[
+  AC_REQUIRE([gl_UCHAR_H_DEFAULTS])
+
+  AC_REQUIRE([AC_TYPE_MBSTATE_T])
+  gl_MBSTATE_T_BROKEN
+
+  AC_CHECK_FUNCS_ONCE([mbrtoc32])
+  if test $ac_cv_func_mbrtoc32 = no; then
+    HAVE_MBRTOC32=0
+  else
+    if test $REPLACE_MBSTATE_T = 1; then
+      REPLACE_MBRTOC32=1
+    else
+      gl_MBRTOC32_EMPTY_INPUT
+      gl_MBRTOC32_C_LOCALE
+      case "$gl_cv_func_mbrtoc32_empty_input" in
+        *yes) ;;
+        *) AC_DEFINE([MBRTOC32_EMPTY_INPUT_BUG], [1],
+             [Define if the mbrtoc32 function does not return (size_t) -2 for empty input.])
+           REPLACE_MBRTOC32=1
+           ;;
+      esac
+      case "$gl_cv_func_mbrtoc32_C_locale_sans_EILSEQ" in
+        *yes) ;;
+        *) AC_DEFINE([MBRTOC32_IN_C_LOCALE_MAYBE_EILSEQ], [1],
+             [Define if the mbrtoc32 function may signal encoding errors in the C locale.])
+           REPLACE_MBRTOC32=1
+           ;;
+      esac
+    fi
+  fi
+])
+
+AC_DEFUN([gl_MBRTOC32_EMPTY_INPUT],
+[
+  AC_REQUIRE([AC_PROG_CC])
+  AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles
+  AC_CACHE_CHECK([whether mbrtoc32 works on empty input],
+    [gl_cv_func_mbrtoc32_empty_input],
+    [
+      dnl Initial guess, used when cross-compiling or when no suitable locale
+      dnl is present.
+changequote(,)dnl
+      case "$host_os" in
+                       # Guess no on glibc systems.
+        *-gnu* | gnu*) gl_cv_func_mbrtoc32_empty_input="guessing no" ;;
+        *)             gl_cv_func_mbrtoc32_empty_input="guessing yes" ;;
+      esac
+changequote([,])dnl
+      AC_RUN_IFELSE(
+        [AC_LANG_SOURCE([[
+           #include <uchar.h>
+           static char32_t wc;
+           static mbstate_t mbs;
+           int
+           main (void)
+           {
+             return mbrtoc32 (&wc, "", 0, &mbs) != (size_t) -2;
+           }]])],
+        [gl_cv_func_mbrtoc32_empty_input=yes],
+        [gl_cv_func_mbrtoc32_empty_input=no],
+        [:])
+    ])
+])
+
+AC_DEFUN([gl_MBRTOC32_C_LOCALE],
+[
+  AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles
+  AC_CACHE_CHECK([whether the C locale is free of encoding errors],
+    [gl_cv_func_mbrtoc32_C_locale_sans_EILSEQ],
+    [
+     dnl Initial guess, used when cross-compiling or when no suitable locale
+     dnl is present.
+     gl_cv_func_mbrtoc32_C_locale_sans_EILSEQ="$gl_cross_guess_normal"
+
+     AC_RUN_IFELSE(
+       [AC_LANG_PROGRAM(
+          [[#include <limits.h>
+            #include <locale.h>
+            #include <uchar.h>
+          ]], [[
+            int i;
+            char *locale = setlocale (LC_ALL, "C");
+            if (! locale)
+              return 2;
+            for (i = CHAR_MIN; i <= CHAR_MAX; i++)
+              {
+                char c = i;
+                char32_t wc;
+                mbstate_t mbs = { 0, };
+                size_t ss = mbrtoc32 (&wc, &c, 1, &mbs);
+                if (1 < ss)
+                  return 3;
+              }
+            return 0;
+          ]])],
+      [gl_cv_func_mbrtoc32_C_locale_sans_EILSEQ=yes],
+      [gl_cv_func_mbrtoc32_C_locale_sans_EILSEQ=no],
+      [case "$host_os" in
+                 # Guess yes on native Windows.
+         mingw*) gl_cv_func_mbrtoc32_C_locale_sans_EILSEQ="guessing yes" ;;
+       esac
+      ])
+    ])
+])
+
+# Prerequisites of lib/mbrtoc32.c and lib/lc-charset-dispatch.c.
+AC_DEFUN([gl_PREREQ_MBRTOC32], [
+  :
+])
diff --git a/m4/uchar.m4 b/m4/uchar.m4
index c5a3594..4d5f046 100644
--- a/m4/uchar.m4
+++ b/m4/uchar.m4
@@ -1,4 +1,4 @@
-# uchar.m4 serial 2
+# uchar.m4 serial 3
 dnl Copyright (C) 2019-2020 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -18,6 +18,12 @@ AC_DEFUN_ONCE([gl_UCHAR_H],
     HAVE_UCHAR_H=0
   fi
   AC_SUBST([HAVE_UCHAR_H])
+
+  dnl Check for declarations of anything we want to poison if the
+  dnl corresponding gnulib module is not in use, and which is not
+  dnl guaranteed by C11.
+  gl_WARN_ON_USE_PREPARE([[#include <uchar.h>
+    ]], [mbrtoc32])
 ])
 
 AC_DEFUN([gl_UCHAR_MODULE_INDICATOR],
@@ -32,4 +38,8 @@ AC_DEFUN([gl_UCHAR_MODULE_INDICATOR],
 AC_DEFUN([gl_UCHAR_H_DEFAULTS],
 [
   GNULIB_C32TOB=0;           AC_SUBST([GNULIB_C32TOB])
+  GNULIB_MBRTOC32=0;         AC_SUBST([GNULIB_MBRTOC32])
+  dnl Assume proper GNU behavior unless another module says otherwise.
+  HAVE_MBRTOC32=1;           AC_SUBST([HAVE_MBRTOC32])
+  REPLACE_MBRTOC32=0;        AC_SUBST([REPLACE_MBRTOC32])
 ])
diff --git a/modules/mbrtoc32 b/modules/mbrtoc32
new file mode 100644
index 0000000..011b7a9
--- /dev/null
+++ b/modules/mbrtoc32
@@ -0,0 +1,51 @@
+Description:
+mbrtoc32() function: convert multibyte character to 32-bit wide character.
+
+Files:
+lib/mbrtoc32.c
+lib/mbrtowc-impl.h
+lib/mbrtowc-impl-utf8.h
+lib/lc-charset-dispatch.h
+lib/lc-charset-dispatch.c
+lib/mbtowc-lock.h
+lib/mbtowc-lock.c
+lib/windows-initguard.h
+m4/mbrtoc32.m4
+m4/mbrtowc.m4
+m4/mbstate_t.m4
+m4/threadlib.m4
+m4/visibility.m4
+
+Depends-on:
+uchar
+hard-locale     [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
+mbrtowc         [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
+mbsinit         [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
+localcharset    [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
+streq           [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
+verify          [test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1]
+
+configure.ac:
+gl_FUNC_MBRTOC32
+if test $HAVE_MBRTOC32 = 0 || test $REPLACE_MBRTOC32 = 1; then
+  AC_LIBOBJ([mbrtoc32])
+  AC_LIBOBJ([lc-charset-dispatch])
+  AC_LIBOBJ([mbtowc-lock])
+  gl_PREREQ_MBRTOC32
+  gl_PREREQ_MBTOWC_LOCK
+fi
+gl_UCHAR_MODULE_INDICATOR([mbrtoc32])
+
+Makefile.am:
+
+Include:
+<uchar.h>
+
+Link:
+$(LIB_MBRTOWC)
+
+License:
+LGPLv2+
+
+Maintainer:
+Bruno Haible
diff --git a/modules/uchar b/modules/uchar
index 67a8866..165fae6 100644
--- a/modules/uchar
+++ b/modules/uchar
@@ -27,6 +27,9 @@ uchar.h: uchar.in.h $(top_builddir)/config.status $(CXXDEFS_H)
 	      -e 's|@''PRAGMA_COLUMNS''@|@PRAGMA_COLUMNS@|g' \
 	      -e 's|@''NEXT_UCHAR_H''@|$(NEXT_UCHAR_H)|g' \
 	      -e 's/@''GNULIB_C32TOB''@/$(GNULIB_C32TOB)/g' \
+	      -e 's/@''GNULIB_MBRTOC32''@/$(GNULIB_MBRTOC32)/g' \
+	      -e 's|@''HAVE_MBRTOC32''@|$(HAVE_MBRTOC32)|g' \
+	      -e 's|@''REPLACE_MBRTOC32''@|$(REPLACE_MBRTOC32)|g' \
 	      -e '/definitions of _GL_FUNCDECL_RPL/r $(CXXDEFS_H)' \
 	      < $(srcdir)/uchar.in.h; \
 	} > $@-t && \
diff --git a/modules/uchar-c++-tests b/modules/uchar-c++-tests
index 69058e3..4f179f0 100644
--- a/modules/uchar-c++-tests
+++ b/modules/uchar-c++-tests
@@ -16,4 +16,5 @@ if ANSICXX
 TESTS += test-uchar-c++
 check_PROGRAMS += test-uchar-c++
 test_uchar_c___SOURCES = test-uchar-c++.cc test-uchar-c++2.cc
+test_uchar_c___LDADD = $(LDADD) $(LIB_MBRTOWC)
 endif
diff --git a/tests/test-uchar-c++.cc b/tests/test-uchar-c++.cc
index 9a11a13..392b104 100644
--- a/tests/test-uchar-c++.cc
+++ b/tests/test-uchar-c++.cc
@@ -28,6 +28,11 @@
 SIGNATURE_CHECK (GNULIB_NAMESPACE::c32tob, int, (wint_t));
 #endif
 
+#if GNULIB_TEST_MBRTOC32
+SIGNATURE_CHECK (GNULIB_NAMESPACE::mbrtoc32, size_t,
+                 (char32_t *, const char *, size_t, mbstate_t *));
+#endif
+
 
 int
 main ()
-- 
2.7.4

From dae399e532230b153356ab5e5f8016936377cb1e Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Sat, 4 Jan 2020 02:35:26 +0100
Subject: [PATCH 5/5] mbrtoc32: Add tests.

* tests/test-mbrtoc32.c: New file, based on tests/test-mbrtowc.c.
* tests/test-mbrtoc32-1.sh: New file, based on tests/test-mbrtowc1.sh.
* tests/test-mbrtoc32-2.sh: New file, based on tests/test-mbrtowc2.sh.
* tests/test-mbrtoc32-3.sh: New file, based on tests/test-mbrtowc3.sh.
* tests/test-mbrtoc32-4.sh: New file, based on tests/test-mbrtowc4.sh.
* tests/test-mbrtoc32-5.sh: New file, based on tests/test-mbrtowc5.sh.
* tests/test-mbrtoc32-w32.c: New file, based on tests/test-mbrtowc-w32.c.
* tests/test-mbrtoc32-w32-1.sh: New file, based on
tests/test-mbrtowc-w32-1.sh.
* tests/test-mbrtoc32-w32-2.sh: New file, based on
tests/test-mbrtowc-w32-2.sh.
* tests/test-mbrtoc32-w32-3.sh: New file, based on
tests/test-mbrtowc-w32-3.sh.
* tests/test-mbrtoc32-w32-4.sh: New file, based on
tests/test-mbrtowc-w32-4.sh.
* tests/test-mbrtoc32-w32-5.sh: New file, based on
tests/test-mbrtowc-w32-5.sh.
* tests/test-mbrtoc32-w32-6.sh: New file, based on
tests/test-mbrtowc-w32-6.sh.
* tests/test-mbrtoc32-w32-7.sh: New file, based on
tests/test-mbrtowc-w32-7.sh.
* modules/mbrtoc32-tests: New file, based on modules/mbrtowc-tests.
---
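For orientation, not part of the commit: a caller-side view of the
function under test can be sketched as below. This is not taken from
tests/test-mbrtoc32.c. The helper name decode_all_sketch is hypothetical,
and the sketch assumes a <uchar.h> that declares mbrtoc32 (either the
system's or the Gnulib replacement). It walks a string with mbrtoc32,
stops at the terminating NUL (return value 0), and reports invalid or
truncated input.

```c
#include <assert.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: convert the NUL-terminated multibyte string S,
   interpreted in the current locale's encoding, into at most MAX code
   points stored in OUT.  Return the number of code points stored, or
   (size_t) -1 if S contains an invalid or truncated sequence.  */
static size_t
decode_all_sketch (char32_t *out, size_t max, const char *s)
{
  mbstate_t state;
  size_t remaining;
  size_t count = 0;

  assert (out != NULL && s != NULL);
  remaining = strlen (s) + 1;           /* include the terminating NUL */
  memset (&state, '\0', sizeof state);  /* initial conversion state */

  while (count < max)
    {
      char32_t wc;
      size_t ret = mbrtoc32 (&wc, s, remaining, &state);

      if (ret == 0)
        break;                          /* reached the terminating NUL */
      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;             /* invalid, or incomplete at the end */

      out[count++] = wc;
      if (ret != (size_t) -3)           /* -3: no input bytes were consumed */
        {
          s += ret;
          remaining -= ret;
        }
    }
  return count;
}
```

In the "C" locale that is in effect at program startup, an ASCII string
such as "abc" should yield three code points equal to the character
values. Exercising multibyte input requires a prior setlocale call with a
suitable locale, which is what the LOCALE_FR_UTF8, LOCALE_JA and
LOCALE_ZH_CN settings in the test module provide.
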
 ChangeLog                    |  24 ++
 modules/mbrtoc32-tests       |  48 +++
 tests/test-mbrtoc32-1.sh     |  15 +
 tests/test-mbrtoc32-2.sh     |  15 +
 tests/test-mbrtoc32-3.sh     |  15 +
 tests/test-mbrtoc32-4.sh     |  15 +
 tests/test-mbrtoc32-5.sh     |   6 +
 tests/test-mbrtoc32-w32-1.sh |   4 +
 tests/test-mbrtoc32-w32-2.sh |   4 +
 tests/test-mbrtoc32-w32-3.sh |   4 +
 tests/test-mbrtoc32-w32-4.sh |   4 +
 tests/test-mbrtoc32-w32-5.sh |   4 +
 tests/test-mbrtoc32-w32-6.sh |   4 +
 tests/test-mbrtoc32-w32-7.sh |   4 +
 tests/test-mbrtoc32-w32.c    | 752 +++++++++++++++++++++++++++++++++++++++++++
 tests/test-mbrtoc32.c        | 376 ++++++++++++++++++++++
 16 files changed, 1294 insertions(+)
 create mode 100644 modules/mbrtoc32-tests
 create mode 100755 tests/test-mbrtoc32-1.sh
 create mode 100755 tests/test-mbrtoc32-2.sh
 create mode 100755 tests/test-mbrtoc32-3.sh
 create mode 100755 tests/test-mbrtoc32-4.sh
 create mode 100755 tests/test-mbrtoc32-5.sh
 create mode 100755 tests/test-mbrtoc32-w32-1.sh
 create mode 100755 tests/test-mbrtoc32-w32-2.sh
 create mode 100755 tests/test-mbrtoc32-w32-3.sh
 create mode 100755 tests/test-mbrtoc32-w32-4.sh
 create mode 100755 tests/test-mbrtoc32-w32-5.sh
 create mode 100755 tests/test-mbrtoc32-w32-6.sh
 create mode 100755 tests/test-mbrtoc32-w32-7.sh
 create mode 100644 tests/test-mbrtoc32-w32.c
 create mode 100644 tests/test-mbrtoc32.c

diff --git a/ChangeLog b/ChangeLog
index c4d0968..63f5799 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,29 @@
 2020-01-03  Bruno Haible  <br...@clisp.org>
 
+	mbrtoc32: Add tests.
+	* tests/test-mbrtoc32.c: New file, based on tests/test-mbrtowc.c.
+	* tests/test-mbrtoc32-1.sh: New file, based on tests/test-mbrtowc1.sh.
+	* tests/test-mbrtoc32-2.sh: New file, based on tests/test-mbrtowc2.sh.
+	* tests/test-mbrtoc32-3.sh: New file, based on tests/test-mbrtowc3.sh.
+	* tests/test-mbrtoc32-4.sh: New file, based on tests/test-mbrtowc4.sh.
+	* tests/test-mbrtoc32-5.sh: New file, based on tests/test-mbrtowc5.sh.
+	* tests/test-mbrtoc32-w32.c: New file, based on tests/test-mbrtowc-w32.c.
+	* tests/test-mbrtoc32-w32-1.sh: New file, based on
+	tests/test-mbrtowc-w32-1.sh.
+	* tests/test-mbrtoc32-w32-2.sh: New file, based on
+	tests/test-mbrtowc-w32-2.sh.
+	* tests/test-mbrtoc32-w32-3.sh: New file, based on
+	tests/test-mbrtowc-w32-3.sh.
+	* tests/test-mbrtoc32-w32-4.sh: New file, based on
+	tests/test-mbrtowc-w32-4.sh.
+	* tests/test-mbrtoc32-w32-5.sh: New file, based on
+	tests/test-mbrtowc-w32-5.sh.
+	* tests/test-mbrtoc32-w32-6.sh: New file, based on
+	tests/test-mbrtowc-w32-6.sh.
+	* tests/test-mbrtoc32-w32-7.sh: New file, based on
+	tests/test-mbrtowc-w32-7.sh.
+	* modules/mbrtoc32-tests: New file, based on modules/mbrtowc-tests.
+
 	mbrtoc32: New module.
 	* lib/uchar.in.h (mbrtoc32): New declaration.
 	* lib/mbrtoc32.c: New file, based on lib/mbrtowc.c.
diff --git a/modules/mbrtoc32-tests b/modules/mbrtoc32-tests
new file mode 100644
index 0000000..d4c03f2
--- /dev/null
+++ b/modules/mbrtoc32-tests
@@ -0,0 +1,48 @@
+Files:
+tests/test-mbrtoc32-1.sh
+tests/test-mbrtoc32-2.sh
+tests/test-mbrtoc32-3.sh
+tests/test-mbrtoc32-4.sh
+tests/test-mbrtoc32-5.sh
+tests/test-mbrtoc32.c
+tests/test-mbrtoc32-w32-1.sh
+tests/test-mbrtoc32-w32-2.sh
+tests/test-mbrtoc32-w32-3.sh
+tests/test-mbrtoc32-w32-4.sh
+tests/test-mbrtoc32-w32-5.sh
+tests/test-mbrtoc32-w32-6.sh
+tests/test-mbrtoc32-w32-7.sh
+tests/test-mbrtoc32-w32.c
+tests/signature.h
+tests/macros.h
+m4/locale-fr.m4
+m4/locale-ja.m4
+m4/locale-zh.m4
+m4/codeset.m4
+
+Depends-on:
+mbsinit
+c32tob
+setlocale
+localcharset
+
+configure.ac:
+gt_LOCALE_FR
+gt_LOCALE_FR_UTF8
+gt_LOCALE_JA
+gt_LOCALE_ZH_CN
+
+Makefile.am:
+TESTS += \
+  test-mbrtoc32-1.sh test-mbrtoc32-2.sh test-mbrtoc32-3.sh test-mbrtoc32-4.sh \
+  test-mbrtoc32-5.sh \
+  test-mbrtoc32-w32-1.sh test-mbrtoc32-w32-2.sh test-mbrtoc32-w32-3.sh \
+  test-mbrtoc32-w32-4.sh test-mbrtoc32-w32-5.sh test-mbrtoc32-w32-6.sh \
+  test-mbrtoc32-w32-7.sh
+TESTS_ENVIRONMENT += \
+  LOCALE_FR='@LOCALE_FR@' \
+  LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \
+  LOCALE_JA='@LOCALE_JA@' \
+  LOCALE_ZH_CN='@LOCALE_ZH_CN@'
+check_PROGRAMS += test-mbrtoc32 test-mbrtoc32-w32
+test_mbrtoc32_LDADD = $(LDADD) $(LIB_SETLOCALE) $(LIB_MBRTOWC)
diff --git a/tests/test-mbrtoc32-1.sh b/tests/test-mbrtoc32-1.sh
new file mode 100755
index 0000000..d22a290
--- /dev/null
+++ b/tests/test-mbrtoc32-1.sh
@@ -0,0 +1,15 @@
+#!/bin/sh
+
+# Test in an ISO-8859-1 or ISO-8859-15 locale.
+: ${LOCALE_FR=fr_FR}
+if test $LOCALE_FR = none; then
+  if test -f /usr/bin/localedef; then
+    echo "Skipping test: no traditional french locale is installed"
+  else
+    echo "Skipping test: no traditional french locale is supported"
+  fi
+  exit 77
+fi
+
+LC_ALL=$LOCALE_FR \
+${CHECKER} ./test-mbrtoc32${EXEEXT} 1
diff --git a/tests/test-mbrtoc32-2.sh b/tests/test-mbrtoc32-2.sh
new file mode 100755
index 0000000..3b429db
--- /dev/null
+++ b/tests/test-mbrtoc32-2.sh
@@ -0,0 +1,15 @@
+#!/bin/sh
+
+# Test in a UTF-8 locale; skip if no such locale is installed.
+: ${LOCALE_FR_UTF8=fr_FR.UTF-8}
+if test $LOCALE_FR_UTF8 = none; then
+  if test -f /usr/bin/localedef; then
+    echo "Skipping test: no french Unicode locale is installed"
+  else
+    echo "Skipping test: no french Unicode locale is supported"
+  fi
+  exit 77
+fi
+
+LC_ALL=$LOCALE_FR_UTF8 \
+${CHECKER} ./test-mbrtoc32${EXEEXT} 2
diff --git a/tests/test-mbrtoc32-3.sh b/tests/test-mbrtoc32-3.sh
new file mode 100755
index 0000000..2d79c56
--- /dev/null
+++ b/tests/test-mbrtoc32-3.sh
@@ -0,0 +1,15 @@
+#!/bin/sh
+
+# Test in an EUC-JP locale; skip if no such locale is installed.
+: ${LOCALE_JA=ja_JP}
+if test $LOCALE_JA = none; then
+  if test -f /usr/bin/localedef; then
+    echo "Skipping test: no traditional japanese locale is installed"
+  else
+    echo "Skipping test: no traditional japanese locale is supported"
+  fi
+  exit 77
+fi
+
+LC_ALL=$LOCALE_JA \
+${CHECKER} ./test-mbrtoc32${EXEEXT} 3
diff --git a/tests/test-mbrtoc32-4.sh b/tests/test-mbrtoc32-4.sh
new file mode 100755
index 0000000..05e617f
--- /dev/null
+++ b/tests/test-mbrtoc32-4.sh
@@ -0,0 +1,15 @@
+#!/bin/sh
+
+# Test in a GB18030 locale; skip if no such locale is installed.
+: ${LOCALE_ZH_CN=zh_CN.GB18030}
+if test $LOCALE_ZH_CN = none; then
+  if test -f /usr/bin/localedef; then
+    echo "Skipping test: no transitional chinese locale is installed"
+  else
+    echo "Skipping test: no transitional chinese locale is supported"
+  fi
+  exit 77
+fi
+
+LC_ALL=$LOCALE_ZH_CN \
+${CHECKER} ./test-mbrtoc32${EXEEXT} 4
diff --git a/tests/test-mbrtoc32-5.sh b/tests/test-mbrtoc32-5.sh
new file mode 100755
index 0000000..cd000fb
--- /dev/null
+++ b/tests/test-mbrtoc32-5.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+# Test whether the POSIX locale has encoding errors.
+LC_ALL=C \
+${CHECKER} ./test-mbrtoc32${EXEEXT} 5 || exit
+LC_ALL=POSIX \
+${CHECKER} ./test-mbrtoc32${EXEEXT} 5
diff --git a/tests/test-mbrtoc32-w32-1.sh b/tests/test-mbrtoc32-w32-1.sh
new file mode 100755
index 0000000..bf6b61c
--- /dev/null
+++ b/tests/test-mbrtoc32-w32-1.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP1252 locale.
+${CHECKER} ./test-mbrtoc32-w32${EXEEXT} French_France 1252
diff --git a/tests/test-mbrtoc32-w32-2.sh b/tests/test-mbrtoc32-w32-2.sh
new file mode 100755
index 0000000..dd96b17
--- /dev/null
+++ b/tests/test-mbrtoc32-w32-2.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP1256 locale.
+${CHECKER} ./test-mbrtoc32-w32${EXEEXT} "Arabic_Saudi Arabia" 1256
diff --git a/tests/test-mbrtoc32-w32-3.sh b/tests/test-mbrtoc32-w32-3.sh
new file mode 100755
index 0000000..21a826b
--- /dev/null
+++ b/tests/test-mbrtoc32-w32-3.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP932 locale.
+${CHECKER} ./test-mbrtoc32-w32${EXEEXT} Japanese_Japan 932
diff --git a/tests/test-mbrtoc32-w32-4.sh b/tests/test-mbrtoc32-w32-4.sh
new file mode 100755
index 0000000..4e261db
--- /dev/null
+++ b/tests/test-mbrtoc32-w32-4.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP950 locale.
+${CHECKER} ./test-mbrtoc32-w32${EXEEXT} Chinese_Taiwan 950
diff --git a/tests/test-mbrtoc32-w32-5.sh b/tests/test-mbrtoc32-w32-5.sh
new file mode 100755
index 0000000..200c248
--- /dev/null
+++ b/tests/test-mbrtoc32-w32-5.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a CP936 locale.
+${CHECKER} ./test-mbrtoc32-w32${EXEEXT} Chinese_China 936
diff --git a/tests/test-mbrtoc32-w32-6.sh b/tests/test-mbrtoc32-w32-6.sh
new file mode 100755
index 0000000..a763e9f
--- /dev/null
+++ b/tests/test-mbrtoc32-w32-6.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test a GB18030 locale.
+${CHECKER} ./test-mbrtoc32-w32${EXEEXT} Chinese_China 54936
diff --git a/tests/test-mbrtoc32-w32-7.sh b/tests/test-mbrtoc32-w32-7.sh
new file mode 100755
index 0000000..b2b889b
--- /dev/null
+++ b/tests/test-mbrtoc32-w32-7.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Test some UTF-8 locales.
+${CHECKER} ./test-mbrtoc32-w32${EXEEXT} French_France Japanese_Japan Chinese_Taiwan Chinese_China 65001
diff --git a/tests/test-mbrtoc32-w32.c b/tests/test-mbrtoc32-w32.c
new file mode 100644
index 0000000..91ca90e
--- /dev/null
+++ b/tests/test-mbrtoc32-w32.c
@@ -0,0 +1,752 @@
+/* Test of conversion of multibyte character to 32-bit wide character.
+   Copyright (C) 2008-2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+
+#include <uchar.h>
+
+#include <errno.h>
+#include <locale.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "localcharset.h"
+#include "macros.h"
+
+#if defined _WIN32 && !defined __CYGWIN__
+
+static int
+test_one_locale (const char *name, int codepage)
+{
+  mbstate_t state;
+  char32_t wc;
+  size_t ret;
+
+# if 1
+  /* Portable code to set the locale.  */
+  {
+    char name_with_codepage[1024];
+
+    sprintf (name_with_codepage, "%s.%d", name, codepage);
+
+    /* Set the locale.  */
+    if (setlocale (LC_ALL, name_with_codepage) == NULL)
+      return 77;
+  }
+# else
+  /* Hacky way to set a locale.codepage combination that setlocale() refuses
+     to set.  */
+  {
+    /* Codepage of the current locale, set with setlocale().
+       Not necessarily the same as GetACP().  */
+    extern __declspec(dllimport) unsigned int __lc_codepage;
+
+    /* Set the locale.  */
+    if (setlocale (LC_ALL, name) == NULL)
+      return 77;
+
+    /* Clobber the codepage and MB_CUR_MAX, both set by setlocale().  */
+    __lc_codepage = codepage;
+    switch (codepage)
+      {
+      case 1252:
+      case 1256:
+        MB_CUR_MAX = 1;
+        break;
+      case 932:
+      case 950:
+      case 936:
+        MB_CUR_MAX = 2;
+        break;
+      case 54936:
+      case 65001:
+        MB_CUR_MAX = 4;
+        break;
+      }
+
+    /* Test whether the codepage is really available.  */
+    memset (&state, '\0', sizeof (mbstate_t));
+    if (mbrtoc32 (&wc, " ", 1, &state) == (size_t)(-1))
+      return 77;
+  }
+# endif
+
+  /* Test zero-length input.  */
+  {
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (char32_t) 0xBADFACE;
+    ret = mbrtoc32 (&wc, "x", 0, &state);
+    /* gnulib's implementation returns (size_t)(-2).
+       The AIX 5.1 implementation returns (size_t)(-1).
+       glibc's implementation returns 0.  */
+    ASSERT (ret == (size_t)(-2) || ret == (size_t)(-1) || ret == 0);
+    ASSERT (mbsinit (&state));
+  }
+
+  /* Test NUL byte input.  */
+  {
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (char32_t) 0xBADFACE;
+    ret = mbrtoc32 (&wc, "", 1, &state);
+    ASSERT (ret == 0);
+    ASSERT (wc == 0);
+    ASSERT (mbsinit (&state));
+    ret = mbrtoc32 (NULL, "", 1, &state);
+    ASSERT (ret == 0);
+    ASSERT (mbsinit (&state));
+  }
+
+  /* Test single-byte input.  */
+  {
+    int c;
+    char buf[1];
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    for (c = 0; c < 0x100; c++)
+      switch (c)
+        {
+        case '\t': case '\v': case '\f':
+        case ' ': case '!': case '"': case '#': case '%':
+        case '&': case '\'': case '(': case ')': case '*':
+        case '+': case ',': case '-': case '.': case '/':
+        case '0': case '1': case '2': case '3': case '4':
+        case '5': case '6': case '7': case '8': case '9':
+        case ':': case ';': case '<': case '=': case '>':
+        case '?':
+        case 'A': case 'B': case 'C': case 'D': case 'E':
+        case 'F': case 'G': case 'H': case 'I': case 'J':
+        case 'K': case 'L': case 'M': case 'N': case 'O':
+        case 'P': case 'Q': case 'R': case 'S': case 'T':
+        case 'U': case 'V': case 'W': case 'X': case 'Y':
+        case 'Z':
+        case '[': case '\\': case ']': case '^': case '_':
+        case 'a': case 'b': case 'c': case 'd': case 'e':
+        case 'f': case 'g': case 'h': case 'i': case 'j':
+        case 'k': case 'l': case 'm': case 'n': case 'o':
+        case 'p': case 'q': case 'r': case 's': case 't':
+        case 'u': case 'v': case 'w': case 'x': case 'y':
+        case 'z': case '{': case '|': case '}': case '~':
+          /* c is in the ISO C "basic character set".  */
+          buf[0] = c;
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, buf, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == c);
+          ASSERT (mbsinit (&state));
+          ret = mbrtoc32 (NULL, buf, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (mbsinit (&state));
+          break;
+        }
+  }
+
+  /* Test special calling convention, passing a NULL pointer.  */
+  {
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (char32_t) 0xBADFACE;
+    ret = mbrtoc32 (&wc, NULL, 5, &state);
+    ASSERT (ret == 0);
+    ASSERT (wc == (char32_t) 0xBADFACE);
+    ASSERT (mbsinit (&state));
+  }
+
+  switch (codepage)
+    {
+    case 1252:
+      /* Locale encoding is CP1252, an extension of ISO-8859-1.  */
+      {
+        char input[] = "B\374\337er"; /* "Büßer" */
+        memset (&state, '\0', sizeof (mbstate_t));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == 'B');
+        ASSERT (mbsinit (&state));
+        input[0] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 1, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == (unsigned char) '\374');
+        ASSERT (wc == 0x00FC);
+        ASSERT (mbsinit (&state));
+        input[1] = '\0';
+
+        /* Test support of NULL first argument.  */
+        ret = mbrtoc32 (NULL, input + 2, 3, &state);
+        ASSERT (ret == 1);
+        ASSERT (mbsinit (&state));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 2, 3, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == (unsigned char) '\337');
+        ASSERT (wc == 0x00DF);
+        ASSERT (mbsinit (&state));
+        input[2] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 3, 2, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == 'e');
+        ASSERT (mbsinit (&state));
+        input[3] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 4, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == 'r');
+        ASSERT (mbsinit (&state));
+      }
+      return 0;
+
+    case 1256:
+      /* Locale encoding is CP1256, not the same as ISO-8859-6.  */
+      {
+        char input[] = "x\302\341\346y"; /* "xآلوy" */
+        memset (&state, '\0', sizeof (mbstate_t));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == 'x');
+        ASSERT (mbsinit (&state));
+        input[0] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 1, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == (unsigned char) '\302');
+        ASSERT (wc == 0x0622);
+        ASSERT (mbsinit (&state));
+        input[1] = '\0';
+
+        /* Test support of NULL first argument.  */
+        ret = mbrtoc32 (NULL, input + 2, 3, &state);
+        ASSERT (ret == 1);
+        ASSERT (mbsinit (&state));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 2, 3, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == (unsigned char) '\341');
+        ASSERT (wc == 0x0644);
+        ASSERT (mbsinit (&state));
+        input[2] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 3, 2, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == (unsigned char) '\346');
+        ASSERT (wc == 0x0648);
+        ASSERT (mbsinit (&state));
+        input[3] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 4, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == 'y');
+        ASSERT (mbsinit (&state));
+      }
+      return 0;
+
+    case 932:
+      /* Locale encoding is CP932, similar to Shift_JIS.  */
+      {
+        char input[] = "<\223\372\226\173\214\352>"; /* "<日本語>" */
+        memset (&state, '\0', sizeof (mbstate_t));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == '<');
+        ASSERT (mbsinit (&state));
+        input[0] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 1, 2, &state);
+        ASSERT (ret == 2);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x65E5);
+        ASSERT (mbsinit (&state));
+        input[1] = '\0';
+        input[2] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 3, 1, &state);
+        ASSERT (ret == (size_t)(-2));
+        ASSERT (wc == (char32_t) 0xBADFACE);
+        ASSERT (!mbsinit (&state));
+        input[3] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 4, 4, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x672C);
+        ASSERT (mbsinit (&state));
+        input[4] = '\0';
+
+        /* Test support of NULL first argument.  */
+        ret = mbrtoc32 (NULL, input + 5, 3, &state);
+        ASSERT (ret == 2);
+        ASSERT (mbsinit (&state));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 5, 3, &state);
+        ASSERT (ret == 2);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x8A9E);
+        ASSERT (mbsinit (&state));
+        input[5] = '\0';
+        input[6] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 7, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == '>');
+        ASSERT (mbsinit (&state));
+
+        /* Test some invalid input.  */
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\377", 1, &state); /* 0xFF */
+        ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == (size_t)-2);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\225\377", 2, &state); /* 0x95 0xFF */
+        ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || (ret == 2 && wc == 0x30FB));
+      }
+      return 0;
+
+    case 950:
+      /* Locale encoding is CP950, similar to Big5.  */
+      {
+        char input[] = "<\244\351\245\273\273\171>"; /* "<日本語>" */
+        memset (&state, '\0', sizeof (mbstate_t));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == '<');
+        ASSERT (mbsinit (&state));
+        input[0] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 1, 2, &state);
+        ASSERT (ret == 2);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x65E5);
+        ASSERT (mbsinit (&state));
+        input[1] = '\0';
+        input[2] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 3, 1, &state);
+        ASSERT (ret == (size_t)(-2));
+        ASSERT (wc == (char32_t) 0xBADFACE);
+        ASSERT (!mbsinit (&state));
+        input[3] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 4, 4, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x672C);
+        ASSERT (mbsinit (&state));
+        input[4] = '\0';
+
+        /* Test support of NULL first argument.  */
+        ret = mbrtoc32 (NULL, input + 5, 3, &state);
+        ASSERT (ret == 2);
+        ASSERT (mbsinit (&state));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 5, 3, &state);
+        ASSERT (ret == 2);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x8A9E);
+        ASSERT (mbsinit (&state));
+        input[5] = '\0';
+        input[6] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 7, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == '>');
+        ASSERT (mbsinit (&state));
+
+        /* Test some invalid input.  */
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\377", 1, &state); /* 0xFF */
+        ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == (size_t)-2);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\225\377", 2, &state); /* 0x95 0xFF */
+        ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || (ret == 2 && wc == '?'));
+      }
+      return 0;
+
+    case 936:
+      /* Locale encoding is CP936 = GBK, an extension of GB2312.  */
+      {
+        char input[] = "<\310\325\261\276\325\132>"; /* "<日本語>" */
+        memset (&state, '\0', sizeof (mbstate_t));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == '<');
+        ASSERT (mbsinit (&state));
+        input[0] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 1, 2, &state);
+        ASSERT (ret == 2);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x65E5);
+        ASSERT (mbsinit (&state));
+        input[1] = '\0';
+        input[2] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 3, 1, &state);
+        ASSERT (ret == (size_t)(-2));
+        ASSERT (wc == (char32_t) 0xBADFACE);
+        ASSERT (!mbsinit (&state));
+        input[3] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 4, 4, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x672C);
+        ASSERT (mbsinit (&state));
+        input[4] = '\0';
+
+        /* Test support of NULL first argument.  */
+        ret = mbrtoc32 (NULL, input + 5, 3, &state);
+        ASSERT (ret == 2);
+        ASSERT (mbsinit (&state));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 5, 3, &state);
+        ASSERT (ret == 2);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x8A9E);
+        ASSERT (mbsinit (&state));
+        input[5] = '\0';
+        input[6] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 7, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == '>');
+        ASSERT (mbsinit (&state));
+
+        /* Test some invalid input.  */
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\377", 1, &state); /* 0xFF */
+        ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || ret == (size_t)-2);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\225\377", 2, &state); /* 0x95 0xFF */
+        ASSERT ((ret == (size_t)-1 && errno == EILSEQ) || (ret == 2 && wc == '?'));
+      }
+      return 0;
+
+    case 54936:
+      /* Locale encoding is CP54936 = GB18030.  */
+      if (strcmp (locale_charset (), "GB18030") != 0)
+        return 77;
+      {
+        char input[] = "s\250\271\201\060\211\070\224\071\375\067!"; /* "süß😋!" */
+        memset (&state, '\0', sizeof (mbstate_t));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == 's');
+        ASSERT (mbsinit (&state));
+        input[0] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 1, 1, &state);
+        ASSERT (ret == (size_t)(-2));
+        ASSERT (wc == (char32_t) 0xBADFACE);
+        ASSERT (!mbsinit (&state));
+        input[1] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 2, 9, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x00FC);
+        ASSERT (mbsinit (&state));
+        input[2] = '\0';
+
+        /* Test support of NULL first argument.  */
+        ret = mbrtoc32 (NULL, input + 3, 8, &state);
+        ASSERT (ret == 4);
+        ASSERT (mbsinit (&state));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 3, 8, &state);
+        ASSERT (ret == 4);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x00DF);
+        ASSERT (mbsinit (&state));
+        input[3] = '\0';
+        input[4] = '\0';
+        input[5] = '\0';
+        input[6] = '\0';
+
+        /* Test support of NULL first argument.  */
+        ret = mbrtoc32 (NULL, input + 7, 4, &state);
+        ASSERT (ret == 4);
+        ASSERT (mbsinit (&state));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 7, 4, &state);
+        ASSERT (ret == 4);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x1F60B);
+        ASSERT (mbsinit (&state));
+        input[7] = '\0';
+        input[8] = '\0';
+        input[9] = '\0';
+        input[10] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 11, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == '!');
+        ASSERT (mbsinit (&state));
+
+        /* Test some invalid input.  */
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\377", 1, &state); /* 0xFF */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\225\377", 2, &state); /* 0x95 0xFF */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\201\045", 2, &state); /* 0x81 0x25 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\201\060\377", 3, &state); /* 0x81 0x30 0xFF */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\201\060\377\064", 4, &state); /* 0x81 0x30 0xFF 0x34 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\201\060\211\072", 4, &state); /* 0x81 0x30 0x89 0x3A */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+      }
+      return 0;
+
+    case 65001:
+      /* Locale encoding is CP65001 = UTF-8.  */
+      if (strcmp (locale_charset (), "UTF-8") != 0)
+        return 77;
+      {
+        char input[] = "s\303\274\303\237\360\237\230\213!"; /* "süß😋!" */
+        memset (&state, '\0', sizeof (mbstate_t));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == 's');
+        ASSERT (mbsinit (&state));
+        input[0] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 1, 1, &state);
+        ASSERT (ret == (size_t)(-2));
+        ASSERT (wc == (char32_t) 0xBADFACE);
+        ASSERT (!mbsinit (&state));
+        input[1] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 2, 7, &state);
+        ASSERT (ret == 1);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x00FC);
+        ASSERT (mbsinit (&state));
+        input[2] = '\0';
+
+        /* Test support of NULL first argument.  */
+        ret = mbrtoc32 (NULL, input + 3, 6, &state);
+        ASSERT (ret == 2);
+        ASSERT (mbsinit (&state));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 3, 6, &state);
+        ASSERT (ret == 2);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x00DF);
+        ASSERT (mbsinit (&state));
+        input[3] = '\0';
+        input[4] = '\0';
+
+        /* Test support of NULL first argument.  */
+        ret = mbrtoc32 (NULL, input + 5, 4, &state);
+        ASSERT (ret == 4);
+        ASSERT (mbsinit (&state));
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 5, 4, &state);
+        ASSERT (ret == 4);
+        ASSERT (c32tob (wc) == EOF);
+        ASSERT (wc == 0x1F60B);
+        ASSERT (mbsinit (&state));
+        input[5] = '\0';
+        input[6] = '\0';
+        input[7] = '\0';
+        input[8] = '\0';
+
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, input + 9, 1, &state);
+        ASSERT (ret == 1);
+        ASSERT (wc == '!');
+        ASSERT (mbsinit (&state));
+
+        /* Test some invalid input.  */
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\377", 1, &state); /* 0xFF */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\303\300", 2, &state); /* 0xC3 0xC0 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\343\300", 2, &state); /* 0xE3 0xC0 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\343\300\200", 3, &state); /* 0xE3 0xC0 0x80 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\343\200\300", 3, &state); /* 0xE3 0x80 0xC0 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\363\300", 2, &state); /* 0xF3 0xC0 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\363\300\200\200", 4, &state); /* 0xF3 0xC0 0x80 0x80 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\363\200\300", 3, &state); /* 0xF3 0x80 0xC0 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\363\200\300\200", 4, &state); /* 0xF3 0x80 0xC0 0x80 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+
+        memset (&state, '\0', sizeof (mbstate_t));
+        wc = (char32_t) 0xBADFACE;
+        ret = mbrtoc32 (&wc, "\363\200\200\300", 4, &state); /* 0xF3 0x80 0x80 0xC0 */
+        ASSERT (ret == (size_t)-1);
+        ASSERT (errno == EILSEQ);
+      }
+      return 0;
+
+    default:
+      return 1;
+    }
+}
+
+int
+main (int argc, char *argv[])
+{
+  int codepage = atoi (argv[argc - 1]);
+  int result;
+  int i;
+
+  result = 77;
+  for (i = 1; i < argc - 1; i++)
+    {
+      int ret = test_one_locale (argv[i], codepage);
+
+      if (ret != 77)
+        result = ret;
+    }
+
+  if (result == 77)
+    {
+      fprintf (stderr, "Skipping test: found no locale with codepage %d\n",
+               codepage);
+    }
+  return result;
+}
+
+#else
+
+int
+main (int argc, char *argv[])
+{
+  fputs ("Skipping test: not a native Windows system\n", stderr);
+  return 77;
+}
+
+#endif
diff --git a/tests/test-mbrtoc32.c b/tests/test-mbrtoc32.c
new file mode 100644
index 0000000..04501c7
--- /dev/null
+++ b/tests/test-mbrtoc32.c
@@ -0,0 +1,376 @@
+/* Test of conversion of multibyte character to 32-bit wide character.
+   Copyright (C) 2008-2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2008.  */
+
+#include <config.h>
+
+#include <uchar.h>
+
+#include "signature.h"
+SIGNATURE_CHECK (mbrtoc32, size_t, (char32_t *, char const *, size_t,
+                                    mbstate_t *));
+
+#include <locale.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "macros.h"
+
+int
+main (int argc, char *argv[])
+{
+  mbstate_t state;
+  char32_t wc;
+  size_t ret;
+
+  /* configure should already have checked that the locale is supported.  */
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  /* Test zero-length input.  */
+  {
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (char32_t) 0xBADFACE;
+    ret = mbrtoc32 (&wc, "x", 0, &state);
+    ASSERT (ret == (size_t)(-2));
+    ASSERT (mbsinit (&state));
+  }
+
+  /* Test NUL byte input.  */
+  {
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (char32_t) 0xBADFACE;
+    ret = mbrtoc32 (&wc, "", 1, &state);
+    ASSERT (ret == 0);
+    ASSERT (wc == 0);
+    ASSERT (mbsinit (&state));
+    ret = mbrtoc32 (NULL, "", 1, &state);
+    ASSERT (ret == 0);
+    ASSERT (mbsinit (&state));
+  }
+
+  /* Test single-byte input.  */
+  {
+    int c;
+    char buf[1];
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    for (c = 0; c < 0x100; c++)
+      switch (c)
+        {
+        default:
+          if (! (c && 1 < argc && argv[1][0] == '5'))
+            break;
+          FALLTHROUGH;
+        case '\t': case '\v': case '\f':
+        case ' ': case '!': case '"': case '#': case '%':
+        case '&': case '\'': case '(': case ')': case '*':
+        case '+': case ',': case '-': case '.': case '/':
+        case '0': case '1': case '2': case '3': case '4':
+        case '5': case '6': case '7': case '8': case '9':
+        case ':': case ';': case '<': case '=': case '>':
+        case '?':
+        case 'A': case 'B': case 'C': case 'D': case 'E':
+        case 'F': case 'G': case 'H': case 'I': case 'J':
+        case 'K': case 'L': case 'M': case 'N': case 'O':
+        case 'P': case 'Q': case 'R': case 'S': case 'T':
+        case 'U': case 'V': case 'W': case 'X': case 'Y':
+        case 'Z':
+        case '[': case '\\': case ']': case '^': case '_':
+        case 'a': case 'b': case 'c': case 'd': case 'e':
+        case 'f': case 'g': case 'h': case 'i': case 'j':
+        case 'k': case 'l': case 'm': case 'n': case 'o':
+        case 'p': case 'q': case 'r': case 's': case 't':
+        case 'u': case 'v': case 'w': case 'x': case 'y':
+        case 'z': case '{': case '|': case '}': case '~':
+          /* c is in the ISO C "basic character set", or argv[1] starts
+             with '5' so we are testing all nonnull bytes.  */
+          buf[0] = c;
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, buf, 1, &state);
+          ASSERT (ret == 1);
+          if (c < 0x80)
+            /* c is an ASCII character.  */
+            ASSERT (wc == c);
+          else
+            /* argv[1] starts with '5', that is, we are testing the C or POSIX
+               locale.
+               On most platforms, the bytes 0x80..0xFF map to U+0080..U+00FF.
+               But on musl libc, the bytes 0x80..0xFF map to U+DF80..U+DFFF.  */
+            ASSERT (wc == (btowc (c) == 0xDF00 + c ? btowc (c) : c));
+          ASSERT (mbsinit (&state));
+          ret = mbrtoc32 (NULL, buf, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (mbsinit (&state));
+          break;
+        }
+  }
+
+  /* Test special calling convention, passing a NULL pointer.  */
+  {
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (char32_t) 0xBADFACE;
+    ret = mbrtoc32 (&wc, NULL, 5, &state);
+    ASSERT (ret == 0);
+    ASSERT (wc == (char32_t) 0xBADFACE);
+    ASSERT (mbsinit (&state));
+  }
+
+  if (argc > 1)
+    switch (argv[1][0])
+      {
+      case '1':
+        /* Locale encoding is ISO-8859-1 or ISO-8859-15.  */
+        {
+          char input[] = "B\374\337er"; /* "Büßer" */
+          memset (&state, '\0', sizeof (mbstate_t));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 'B');
+          ASSERT (mbsinit (&state));
+          input[0] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 1, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == (unsigned char) '\374');
+          ASSERT (mbsinit (&state));
+          input[1] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc32 (NULL, input + 2, 3, &state);
+          ASSERT (ret == 1);
+          ASSERT (mbsinit (&state));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 2, 3, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == (unsigned char) '\337');
+          ASSERT (mbsinit (&state));
+          input[2] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 3, 2, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 'e');
+          ASSERT (mbsinit (&state));
+          input[3] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 4, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 'r');
+          ASSERT (mbsinit (&state));
+        }
+        return 0;
+
+      case '2':
+        /* Locale encoding is UTF-8.  */
+        {
+          char input[] = "s\303\274\303\237\360\237\230\213!"; /* "süß😋!" */
+          memset (&state, '\0', sizeof (mbstate_t));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 's');
+          ASSERT (mbsinit (&state));
+          input[0] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 1, 1, &state);
+          ASSERT (ret == (size_t)(-2));
+          ASSERT (wc == (char32_t) 0xBADFACE);
+          ASSERT (!mbsinit (&state));
+          input[1] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 2, 7, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x00FC); /* expect Unicode encoding */
+          ASSERT (mbsinit (&state));
+          input[2] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc32 (NULL, input + 3, 6, &state);
+          ASSERT (ret == 2);
+          ASSERT (mbsinit (&state));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 3, 6, &state);
+          ASSERT (ret == 2);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x00DF); /* expect Unicode encoding */
+          ASSERT (mbsinit (&state));
+          input[3] = '\0';
+          input[4] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc32 (NULL, input + 5, 4, &state);
+          ASSERT (ret == 4);
+          ASSERT (mbsinit (&state));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 5, 4, &state);
+          ASSERT (ret == 4);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (wc == 0x1F60B); /* expect Unicode encoding */
+          ASSERT (mbsinit (&state));
+          input[5] = '\0';
+          input[6] = '\0';
+          input[7] = '\0';
+          input[8] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 9, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == '!');
+          ASSERT (mbsinit (&state));
+        }
+        return 0;
+
+      case '3':
+        /* Locale encoding is EUC-JP.  */
+        {
+          char input[] = "<\306\374\313\334\270\354>"; /* "<日本語>" */
+          memset (&state, '\0', sizeof (mbstate_t));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == '<');
+          ASSERT (mbsinit (&state));
+          input[0] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 1, 2, &state);
+          ASSERT (ret == 2);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (mbsinit (&state));
+          input[1] = '\0';
+          input[2] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 3, 1, &state);
+          ASSERT (ret == (size_t)(-2));
+          ASSERT (wc == (char32_t) 0xBADFACE);
+          ASSERT (!mbsinit (&state));
+          input[3] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 4, 4, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (mbsinit (&state));
+          input[4] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc32 (NULL, input + 5, 3, &state);
+          ASSERT (ret == 2);
+          ASSERT (mbsinit (&state));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 5, 3, &state);
+          ASSERT (ret == 2);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (mbsinit (&state));
+          input[5] = '\0';
+          input[6] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 7, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == '>');
+          ASSERT (mbsinit (&state));
+        }
+        return 0;
+
+      case '4':
+        /* Locale encoding is GB18030.  */
+        {
+          char input[] = "s\250\271\201\060\211\070\224\071\375\067!"; /* "süß😋!" */
+          memset (&state, '\0', sizeof (mbstate_t));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == 's');
+          ASSERT (mbsinit (&state));
+          input[0] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 1, 1, &state);
+          ASSERT (ret == (size_t)(-2));
+          ASSERT (wc == (char32_t) 0xBADFACE);
+          ASSERT (!mbsinit (&state));
+          input[1] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 2, 9, &state);
+          ASSERT (ret == 1);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (mbsinit (&state));
+          input[2] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc32 (NULL, input + 3, 8, &state);
+          ASSERT (ret == 4);
+          ASSERT (mbsinit (&state));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 3, 8, &state);
+          ASSERT (ret == 4);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (mbsinit (&state));
+          input[3] = '\0';
+          input[4] = '\0';
+          input[5] = '\0';
+          input[6] = '\0';
+
+          /* Test support of NULL first argument.  */
+          ret = mbrtoc32 (NULL, input + 7, 4, &state);
+          ASSERT (ret == 4);
+          ASSERT (mbsinit (&state));
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 7, 4, &state);
+          ASSERT (ret == 4);
+          ASSERT (c32tob (wc) == EOF);
+          ASSERT (mbsinit (&state));
+          input[7] = '\0';
+          input[8] = '\0';
+          input[9] = '\0';
+          input[10] = '\0';
+
+          wc = (char32_t) 0xBADFACE;
+          ret = mbrtoc32 (&wc, input + 11, 1, &state);
+          ASSERT (ret == 1);
+          ASSERT (wc == '!');
+          ASSERT (mbsinit (&state));
+        }
+        return 0;
+
+      case '5':
+        /* C locale; tested above.  */
+        return 0;
+      }
+
+  return 1;
+}
-- 
2.7.4
