The stdio output functions have two bugs when it comes to output
to a Windows console.

Windows consoles come with two encodings: GetACP() and GetOEMCP(). For
Japanese, both have the same value (932). However, for English, German,
and French Windows installations, GetACP() = 1252 and GetOEMCP() = 850
(437 for US English).
For many years, output of non-ASCII characters to consoles was a PITA:
While the program had to produce output in GetACP() encoding when
writing to files, it had to produce output in GetOEMCP() encoding when
writing to a console. The majority of programs did not do this: they
produced output in GetACP() encoding always, and thus non-ASCII
characters got garbled in consoles.

After many many years, Microsoft finally added a workaround in the
C runtime library (msvcrt and ucrt). When a program writes a string
to a console, the runtime library tests whether the output goes
to a console, and if yes, it does a conversion from GetACP() encoding
to GetOEMCP() encoding on the fly, in two steps: from GetACP() to UTF-16
via MultiByteToWideChar, then to GetOEMCP() via WideCharToMultiByte.
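
As an illustration, the two-step conversion described above could look
roughly like the following. This is only a sketch, not the runtime
library's actual code: the function name acp_to_oemcp and the fixed
256-element buffer are assumptions made for this example, and on
platforms other than Windows the function merely reports failure.

```c
#include <stddef.h>
#ifdef _WIN32
# include <windows.h>
#endif

/* Hypothetical helper, not part of any runtime library.
   Converts SRCLEN bytes at SRC from the GetACP() encoding to the
   GetOEMCP() encoding, storing at most DSTSIZE bytes at DST.
   Returns the number of bytes stored, or -1 on failure.
   On platforms other than Windows it always returns -1.  */
int
acp_to_oemcp (const char *src, int srclen, char *dst, int dstsize)
{
#ifdef _WIN32
  wchar_t wbuf[256];
  int wlen, len;

  if (srclen <= 0 || srclen > 256)
    return -1;
  /* Step 1: GetACP() encoding -> UTF-16.  */
  wlen = MultiByteToWideChar (CP_ACP, 0, src, srclen, wbuf, 256);
  if (wlen == 0)
    return -1;
  /* Step 2: UTF-16 -> GetOEMCP() encoding.  */
  len = WideCharToMultiByte (CP_OEMCP, 0, wbuf, wlen, dst, dstsize,
                             NULL, NULL);
  return len > 0 ? len : -1;
#else
  (void) src; (void) srclen; (void) dst; (void) dstsize;
  return -1;
#endif
}
```

A caller would pass a complete string, for example
acp_to_oemcp (text, (int) strlen (text), buf, (int) sizeof buf).
Converting whole strings matters here: a double-byte character that is
split across two calls cannot be converted by step 1.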

This workaround works fine in ucrt. But in msvcrt this workaround
has two bugs. Both happen when
  - The output goes to a console. (No bug when the output goes to a file.)
and
  - The stream's mode is _O_TEXT. (Which is the default for stdout
    and stderr. No bug when the stream's mode is _O_BINARY.)
and
  - setlocale() is called before. (No bug if setlocale() is not called,
    that is, when the locale remains the "C" locale.)
and
  - The chosen locale has a double-byte encoding, such as CP932.
    (No bug for unibyte locale encodings, such as CP1252.)
and
  - The console's codepage matches the locale's encoding. For
    example, after 'chcp 932' was executed.
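
A program could test for some of these conditions at run time, along
the lines of the sketch below. The function name is made up for this
example. It checks only three of the five conditions (console output,
a locale other than "C", a multibyte locale encoding); it does not
check the stream's mode or the console's codepage. It also assumes that
the macro _UCRT is defined by the headers exactly when building against
ucrt, which should be verified for the toolchain in use.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
# include <io.h>
#endif

/* Hypothetical helper.  Returns 1 if output to FP might be affected by
   the msvcrt bugs, based on a subset of the conditions, else 0.  */
int
may_hit_msvcrt_console_bugs (FILE *fp)
{
#if defined _WIN32 && !defined _UCRT
  const char *loc = setlocale (LC_CTYPE, NULL);

  /* The output must go to a console, not to a file or pipe.  */
  if (!_isatty (_fileno (fp)))
    return 0;
  /* setlocale() must have selected a locale other than "C".  */
  if (loc == NULL || strcmp (loc, "C") == 0)
    return 0;
  /* The locale encoding must be a multibyte one, such as CP932.  */
  if (MB_CUR_MAX < 2)
    return 0;
  return 1;
#else
  /* Not Windows, or built against ucrt: not affected.  */
  (void) fp;
  return 0;
#endif
}
```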

Bug 1:

When the application outputs double-byte characters one byte at
a time, using the functions fputc() or putc(), the console shows JISX0201
(ASCII and Katakana) characters instead of CP932 (ASCII, Katakana,
Hiragana, Kanji) characters.

How to reproduce:
1. Use Windows 10 or 11. Switch it to Japanese as main language.
2. Use the attached program. In the dev environment:
  $ gcc -Wall foo.c
3. In a cmd.exe console:
  $ chcp 932
  $ .\a
Look at the output of the parts C and D.
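
An application that has to emit text character by character might
avoid splitting double-byte characters, for instance as sketched below.
The function names are invented for this example. The lead byte ranges
0x81..0x9F and 0xE0..0xFC are those of CP932. That passing each complete
character to fputs() sidesteps the bug is an assumption based on
fputs() producing correct output in parts A and B; it has not been
verified against msvcrt.

```c
#include <stdio.h>

/* Returns nonzero if B is the first byte of a double-byte character
   in CP932.  */
static int
cp932_is_lead_byte (unsigned char b)
{
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper.  Writes the CP932 encoded string S to FP one
   character (not one byte) at a time.
   Returns the number of characters written, or -1 on failure.  */
int
put_cp932_chars (const char *s, FILE *fp)
{
  int count = 0;

  while (*s != '\0')
    {
      char buf[3];
      /* A lead byte at the very end of the string is written alone.  */
      int len = (cp932_is_lead_byte ((unsigned char) s[0])
                 && s[1] != '\0') ? 2 : 1;

      buf[0] = s[0];
      buf[1] = (len == 2 ? s[1] : '\0');
      buf[2] = '\0';
      if (fputs (buf, fp) == EOF)
        return -1;
      s += len;
      count++;
    }
  return count;
}
```

For the 18-byte test string of the attached program, this function
makes 10 fputs() calls: 8 for the double-byte characters, 2 for the
colon and the newline.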

Bug 2:

When the application outputs a string that starts with a non-ASCII
character using the function fwrite(), the console shows no output,
and the stream's error indicator gets set.

How to reproduce:
1. Use Windows 10 or 11. Switch it to Japanese as main language.
2. Use the attached program. In the dev environment:
  $ gcc -Wall foo.c
3. In a cmd.exe console:
  $ chcp 932
  $ .\a
Look at the output of the parts E and F.
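
For NUL-terminated text, an application can use fputs() instead of
fwrite() and check the stream's error indicator afterwards, in the same
way as the attached program does. A minimal sketch, with a function
name chosen only for this example:

```c
#include <stdio.h>

/* Hypothetical helper.  Writes the NUL-terminated text S to FP using
   fputs() rather than fwrite(), then flushes the stream.
   Returns 0 on success, -1 if the write or the flush failed.  */
int
write_text (const char *s, FILE *fp)
{
  /* Start from a clean error indicator, so that an earlier failure
     is not mistaken for a failure of this write.  */
  clearerr (fp);
  if (fputs (s, fp) == EOF)
    return -1;
  if (fflush (fp) == EOF || ferror (fp))
    return -1;
  return 0;
}
```

This does not help with data that contains NUL bytes. Such data is
binary, and for binary output the stream's mode should be _O_BINARY,
in which case the bug does not occur.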

I don't plan to add workarounds for these bugs to Gnulib, because
* Normal applications don't write strings one byte at a time, for
  speed.
* Normal applications use fwrite() for binary I/O and fputs() or
  [v][f]printf or similar for text I/O.

If anyone wants these bugs fixed, they will have to build their
application against ucrt instead of msvcrt. The MSYS2 project
contains tools and libraries for mingw+ucrt. (Btw, building with
ucrt instead of msvcrt also has the benefit of supporting the
UTF-8 locales of Windows. [1][2])
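
Whether the runtime in use supports a UTF-8 locale can be probed at
run time, for example as in the sketch below. The function name is
invented for this example. The locale name ".UTF8" is the one given in
Microsoft's setlocale documentation [1].

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper.  Tries to switch to a UTF-8 locale.
   Returns 1 if the runtime accepted it, 0 otherwise.
   When setlocale() fails, the current locale remains unchanged.  */
int
try_utf8_locale (void)
{
  return setlocale (LC_ALL, ".UTF8") != NULL;
}
```

With a runtime that lacks this support, the call is expected to fail
and the function returns 0.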

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale
    "Starting in Windows 10 version 1803 (10.0.17134.0), the Universal C
     Runtime supports using a UTF-8 code page."
[2] https://lists.gnu.org/archive/html/bug-gnulib/2024-12/msg00159.html


2025-09-16  Bruno Haible  <[email protected]>

        Document msvcrt (native Windows) bugs regarding console output.
        * doc/posix-functions/fputc.texi: Document a bug found in msvcrt.
        * doc/posix-functions/putc.texi: Likewise.
        * doc/posix-functions/fwrite.texi: Document another bug found in msvcrt.

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <fcntl.h>
#include <io.h> /* declares _setmode, for the commented-out call below */

/* These rpl_* functions disable possible gcc optimizations.  */

int
rpl_printf (const char *format, ...)
{
  int retval;
  va_list args;

  va_start (args, format);
  retval = vfprintf (stdout, format, args);
  va_end (args);

  return retval;
}

int
rpl_fprintf (FILE *stream, const char *format, ...)
{
  int retval;
  va_list args;

  va_start (args, format);
  retval = vfprintf (stream, format, args);
  va_end (args);

  return retval;
}

int
rpl_vprintf (const char *format, va_list args)
{
  return vfprintf (stdout, format, args);
}

int
rpl_vfprintf (FILE *stream, const char *format, va_list args)
{
  return vfprintf (stream, format, args);
}

int
rpl_putchar (int c)
{
  return fputc (c, stdout);
}

int
rpl_fputc (int c, FILE *stream)
{
  return fputc (c, stream);
}

int
rpl_fputs (const char *string, FILE *stream)
{
  return fputs (string, stream);
}

int
rpl_puts (const char *string)
{
  return puts (string);
}

size_t
rpl_fwrite (const void *ptr, size_t s, size_t n, FILE *stream)
{
  return fwrite (ptr, s, n, stream);
}

int
main (int argc, char *argv[])
{
  // When the output is redirected to a file, all outputs are correct.

  // When the program is compiled with ucrt (as opposed to msvcrt),
  // all outputs are correct.

  // When the mode is set to _O_BINARY, all outputs are correct.
  //_setmode (1, _O_BINARY);

  const char *text;

  if (1)
    {
      // When setlocale is not called, all outputs are correct.
      setlocale (LC_ALL, "Japanese_Japan.932");

      text = "\203\111\203\166\203\126\203\207\203\223\202\306\210\370\220\224\072\n";
    }
  else
    // When a single-byte locale (e.g. CP1252) is used, all outputs are correct.
    {
      setlocale (LC_ALL, "German_Germany.1252");

      text = "B\366se B\374bchen tun Bu\337e\n";
    }

  // __USE_MINGW_ANSI_STDIO=0  __USE_MINGW_ANSI_STDIO=1

  puts ("A");
  clearerr (stdout);
  //     correct                   correct
  fputs (text, stdout);
  fflush (stdout);
  if (ferror (stdout)) puts ("ferror() -> true!");

  puts ("B");
  clearerr (stdout);
  //     correct                   correct
  rpl_fputs (text, stdout);
  fflush (stdout);
  if (ferror (stdout)) puts ("ferror() -> true!");

  puts ("C");
  clearerr (stdout);
  //     garbled                   garbled
  for (const char *s = text; *s; s++)
    fputc (*s, stdout);
  fflush (stdout);
  if (ferror (stdout)) puts ("ferror() -> true!");

  puts ("D");
  clearerr (stdout);
  //     garbled                   garbled
  for (const char *s = text; *s; s++)
    rpl_fputc (*s, stdout);
  fflush (stdout);
  if (ferror (stdout)) puts ("ferror() -> true!");

  puts ("E");
  clearerr (stdout);
  //     no output                 no output
  fwrite (text, 1, strlen (text), stdout);
  fflush (stdout);
  if (ferror (stdout)) puts ("ferror() -> true!");

  puts ("F");
  clearerr (stdout);
  //     no output                 no output
  rpl_fwrite (text, 1, strlen (text), stdout);
  fflush (stdout);
  if (ferror (stdout)) puts ("ferror() -> true!");

  puts ("G");
  clearerr (stdout);
  //     correct                   garbled
  fprintf (stdout, "%s%s%s", "", text, "");
  fflush (stdout);
  if (ferror (stdout)) puts ("ferror() -> true!");

  puts ("H");
  clearerr (stdout);
  //     correct                   garbled
  rpl_fprintf (stdout, "%s%s%s", "", text, "");
  fflush (stdout);
  if (ferror (stdout)) puts ("ferror() -> true!");

  // Whenever the result is "garbled" or "no output",
  // the stream's error indicator is set, i.e. ferror() returns true.

  exit (EXIT_SUCCESS);
}

From 901563ae363e4816b9b7ecdb154910e18b6052ca Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Tue, 16 Sep 2025 17:08:44 +0200
Subject: [PATCH] Document msvcrt (native Windows) bugs regarding console
 output.

* doc/posix-functions/fputc.texi: Document a bug found in msvcrt.
* doc/posix-functions/putc.texi: Likewise.
* doc/posix-functions/fwrite.texi: Document another bug found in msvcrt.
---
 ChangeLog                       | 7 +++++++
 doc/posix-functions/fputc.texi  | 6 ++++++
 doc/posix-functions/fwrite.texi | 6 ++++++
 doc/posix-functions/putc.texi   | 6 ++++++
 4 files changed, 25 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 73b7ff269c..d6a4df6cd0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2025-09-16  Bruno Haible  <[email protected]>
+
+	Document msvcrt (native Windows) bugs regarding console output.
+	* doc/posix-functions/fputc.texi: Document a bug found in msvcrt.
+	* doc/posix-functions/putc.texi: Likewise.
+	* doc/posix-functions/fwrite.texi: Document another bug found in msvcrt.
+
 2025-09-16  Bruno Haible  <[email protected]>
 
 	strerror_r: Ensure a trailing NUL when truncating.
diff --git a/doc/posix-functions/fputc.texi b/doc/posix-functions/fputc.texi
index de80da596c..892243ae87 100644
--- a/doc/posix-functions/fputc.texi
+++ b/doc/posix-functions/fputc.texi
@@ -32,6 +32,12 @@
 On Windows platforms (excluding Cygwin), this function does not set @code{errno}
 upon failure.
 @item
+This function fails and produces garbled output
+when invoked twice, for outputting a non-ASCII character in double-byte encoding,
+corresponding to the locale, on some platforms:
+mingw in combination with msvcrt,
+when the output goes to a Windows console.
+@item
 On some platforms, this function does not set @code{errno} or the
 stream error indicator on attempts to write to a read-only stream:
 Cygwin 1.7.9.
diff --git a/doc/posix-functions/fwrite.texi b/doc/posix-functions/fwrite.texi
index 5cd99e5940..3922a9f1db 100644
--- a/doc/posix-functions/fwrite.texi
+++ b/doc/posix-functions/fwrite.texi
@@ -32,6 +32,12 @@
 On Windows platforms (excluding Cygwin), this function does not set @code{errno}
 upon failure.
 @item
+This function fails and produces no output
+when the argument string starts with a non-ASCII character in double-byte encoding,
+corresponding to the locale, on some platforms:
+mingw in combination with msvcrt,
+when the output goes to a Windows console.
+@item
 On some platforms, this function does not set @code{errno} or the
 stream error indicator on attempts to write to a read-only stream:
 Cygwin 1.7.9.
diff --git a/doc/posix-functions/putc.texi b/doc/posix-functions/putc.texi
index aec5b7e7d8..e956601631 100644
--- a/doc/posix-functions/putc.texi
+++ b/doc/posix-functions/putc.texi
@@ -32,6 +32,12 @@
 On Windows platforms (excluding Cygwin), this function does not set @code{errno}
 upon failure.
 @item
+This function fails and produces garbled output
+when invoked twice, for outputting a non-ASCII character in double-byte encoding,
+corresponding to the locale, on some platforms:
+mingw in combination with msvcrt,
+when the output goes to a Windows console.
+@item
 On some platforms, this function does not set @code{errno} or the
 stream error indicator on attempts to write to a read-only stream:
 Cygwin 1.7.9.
-- 
2.50.1
