On February 25, 2019 1:44:07 PM GMT+02:00, Jeff Conrad <jeff_con...@msn.com> wrote:
> Monday, February 25, 2019 2:35 AM, Eli Zaretskii wrote:
>
> > > Running something like
> > >
> > >   groff -Tutf8 <file>
> > >
> > > rather than something like
> > >
> > >   groff -Tutf8 <file> | more
> > >
> > > or
> > >
> > >   groff -Tutf8 <file> > <outfile>
> > >
> > > Jeff
> >
> > Yes, I tried all of the above. The last method ends up with correct
> > UTF-8 sequences, all the others yield mojibake.
>
> Since method 2 works for me, I guess I'm having better luck than you; I
> suppose I should count my blessings :-). Especially since method 2 is
> the one I would most often use.
>
> > Groff, of course, writes the same bytes in all methods.
>
> As it does for me, confirmed by 'od -h'.
>
> The question, then, is why grotty is behaving differently than my simple
> C program, which, as nearly as I can tell, is doing the same thing when
> outputting characters. Win 10 vs. Win 7? Compiler? Or perhaps I
> missed something important elsewhere in the code for tty.cpp.
>
> Anyway, stuff like this should make it clear why someone running Windows
> would do something as silly as create a devcp1252.
>
> Jeff

You are on Windows 10, which probably explains everything. The only explanation I could come up with regarding your simple program is that VS linked it against static libraries, or maybe special versions of dynamic libraries, which implement fputs etc. in a way that works better with the Windows 10 console. By contrast, the Groff you find on ezwinports links dynamically to MSVCRT.DLL.

I stepped with a debugger through tty_printer::put_char and verified that it gets the same Unicode codepoints and produces the same UTF-8 sequences as your test program. So the explanation must be outside Groff.

In any case, the conclusion remains that UTF-8 console output on Windows is unreliable, except perhaps on Windows 10. Which isn't surprising, given that variable-length multibyte encodings are second-class citizens on Windows, as documented by MSDN.
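
For anyone who wants to repeat the comparison, here is a minimal sketch of the codepoint-to-UTF-8 step being discussed. It is neither Groff's code nor Jeff's test program; the names utf8_encode and put_utf8 are made up for illustration, and the only assumption is that the bytes go out through plain stdio, as they would with fputs.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not from tty.cpp): encode one Unicode code point
   as UTF-8.  Stores up to 4 bytes in buf and returns the byte count,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char buf[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx followed by 3 x 10xxxxxx */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: write one code point to a stdio stream.
   Returns 0 on success, -1 on an invalid code point or a short write. */
int put_utf8(FILE *fp, unsigned long cp)
{
    unsigned char buf[4];
    size_t n = utf8_encode(cp, buf);

    if (n == 0)
        return -1;
    return fwrite(buf, 1, n, fp) == n ? 0 : -1;
}
```

Calling put_utf8(stdout, 0x2019) from a small main() and running the result the three ways quoted above (directly, piped through more, redirected to a file) should show whether the bytes or the console are at fault: the redirected file can be checked with od, and any difference on screen then comes from the C runtime or the console, not from the encoding step.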