Doesn't help. I tried 65001 (UTF-8):

### SET CP TO UTF-8, 65001
$cygwin_charset_test.ksh
Old CP 65001

locale on entry
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=
### CP SET TO 65001
Active code page: 65001

locale changed to
LANG=en_US.CP1252
LC_CTYPE="en_US.CP1252"
LC_NUMERIC="en_US.CP1252"
LC_TIME="en_US.CP1252"
LC_COLLATE="en_US.CP1252"
LC_MONETARY="en_US.CP1252"
LC_MESSAGES="en_US.CP1252"
LC_ALL=en_US.CP1252

Running WIN32 pgm
Transcoding using Cygwin codepage: 1252
Input widechar string:
lpw[0] = Z - 5A
lpw[1] = - F0C7
wmain: Z?
Active code page: 65001

and 1252:

### SET CP TO 1252
$cygwin_charset_test.ksh
Old CP 65001

locale on entry
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

### CP SET TO 1252
Active code page: 1252

locale changed to
LANG=en_US.CP1252
LC_CTYPE="en_US.CP1252"
LC_NUMERIC="en_US.CP1252"
LC_TIME="en_US.CP1252"
LC_COLLATE="en_US.CP1252"
LC_MONETARY="en_US.CP1252"
LC_MESSAGES="en_US.CP1252"
LC_ALL=en_US.CP1252

Running WIN32 pgm
Transcoding using Cygwin codepage: 1252
Input widechar string:
lpw[0] = Z - 5A
lpw[1] = - F0C7
wmain: Z?
Active code page: 65001

Michael

From: "Brian Inglis" <brian.ing...@systematicsw.ab.ca>
To: cygwin@cygwin.com
Date: 08/03/2020 12:31 PM
Subject: Re: Trouble with character sets
Sent by: "Cygwin" <cygwin-boun...@cygwin.com>

On 2020-08-03 09:36, Michael Shay via Cygwin wrote:
> I'm having a problem with Cygwin 3.1.4, changing the character set on the
> fly. It seems to work with Cygwin applications, but not with Win32
> applications.
> I have a Korn shell script:
> #!/bin/ksh
> OLD_LANG="$LANG"
> OLD_LC_ALL="$LC_ALL"
> echo "locale on entry"
> locale
> echo ""
> export LANG="en_US.CP1252"
> export LC_ALL=en_US.CP1252
> echo "locale changed to"
> locale
> echo ""
> # Default is to run the Win32 program. Input any argument other than 'WIN32'
> # to run '/bin/echo'.
> case $# in
>   0 ) echo "Running WIN32 pgm"
>       ksh -c 'cygtest.exe ZÇ'
>       ;;
>   1 ) echo "Running Cygwin 'echo'"
>       ksh -c '/bin/echo ZÇ'
>       ;;
>   2 ) echo "Running WIN32 pgm"
>       ksh -c 'cygtest.exe ZÇ'
>       echo ""
>       echo "Running Cygwin 'echo'"
>       ksh -c '/bin/echo ZÇ'
>       ;;
>   * ) ;;
> esac
> LC_ALL="$OLD_LC_ALL"
> LANG="$OLD_LANG"
> and a Win32 application (attached file cygtest.cpp)
> I used gdb to see what was happening in child_info_spawn::worker(), when a
> Win32 program is started using:
>   rc = CreateProcessW (runpath,        /* image name w/ full path */
>                        cmd.wcs (wcmd), /* what was passed to exec */
>                        sa,             /* process security attrs */
>                        sa,             /* thread security attrs */
>                        TRUE,           /* inherit handles */
>                        c_flags,
>                        envblock,       /* environment */
>                        NULL,
>                        &si,
>                        &pi);
> Specifically, 'cmd.wcs(wcmd)' invokes:
>   wchar_t *wcs (wchar_t *wbuf, size_t n)
>   {
>     if (n == 1)
>       wbuf[0] = L'\0';
>     else
>       sys_mbstowcs (wbuf, n, buf);
>     return wbuf;
>   }
> and sys_mbstowcs():
>   size_t __reg3
>   sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
>   {
>     mbtowc_p f_mbtowc = __MBTOWC;
>     if (f_mbtowc == __ascii_mbtowc)
>       {
>         f_mbtowc = __utf8_mbtowc;   <<<<< this is ALWAYS done, no matter what charset is in use.
>       }
>     return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
>   }
> Since the CP1252 is an 8-bit single-byte character set with characters >=
> 0x80, the '0xc7' character is always translated as '0xc7 0xf0', with the
> '0xf0' byte indicating an invalid character in the string.
> This doesn't seem to happen when e.g. '/bin/echo' is run, although I
> haven't stepped into the code to see what's happening.
> I do not think this is a Cygwin bug, but since the User's Guide says the
> locale and charset can be changed on the fly, I don't know what's going
> awry.
> Any suggestions? If you need more information, I'm happy to provide it.

Try:

$ chcp.com
Active code page: 850
$ chcp.com 65001
Active code page: 65001
$ chcp.com
Active code page: 65001

--
Take care.
Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple