> printf(“Hello World\n”); is UB under -fexec-charset= EBCDIC. WTF WTF!!!

It's not undefined behavior.  It does, however, appear to trip various
bugs in GCC.

$ cat test.c
#include <stdio.h>
int main(void) { printf("hello world\n"); }

$ gcc-9 --version | head -n1
gcc-9 (Debian 9.3.0-18) 9.3.0
$ gcc-9 -fexec-charset=EBCDIC-US test.c
during GIMPLE pass: printf-return-value
test.c: In function ‘main’:
test.c:2: internal compiler error: converting to execution character
set: Invalid or incomplete multibyte or wide character
    2 | int main(void) { printf("hello world\n"); }

$ gcc-10 --version | head -n1
gcc (Debian 10.2.0-18) 10.2.0

$ gcc-10 -fexec-charset=EBCDIC-US -O2 test.c
during GIMPLE pass: strlen
test.c: In function ‘main’:
test.c:2: internal compiler error: converting to execution character
set: Invalid or incomplete multibyte or wide character
    2 | int main(void) { printf("hello world\n"); }

But if you manage to avoid all the bugs, it works the way it's supposed to:

$ gcc-10 -fexec-charset=EBCDIC-US -O0 test.c
$ ./a.out | iconv -f EBCDIC-US -t UTF-8
hello world
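
To make "the way it's supposed to" a bit more concrete: character
constants and string literals are translated to the execution character
set at compile time, so the numeric values your program sees change with
-fexec-charset.  Here is a minimal sketch of a check for that.  The
function names are made up for illustration, and I'm assuming the usual
EBCDIC code points for 'A', 'a' and '0'; it deliberately avoids printf
format directives, since those would be translated too.

```c
#include <stdbool.h>

/* Character constants are translated to the execution character set at
   compile time, so these comparisons are effectively fixed when the
   file is compiled.  The casts to unsigned char matter: the EBCDIC
   codes for letters and digits are above 0x7F, and plain char may be
   signed. */

/* Hypothetical helper: true if 'A', 'a' and '0' have their ASCII
   code points. */
bool exec_charset_looks_ascii(void)
{
    return (unsigned char)'A' == 0x41
        && (unsigned char)'a' == 0x61
        && (unsigned char)'0' == 0x30;
}

/* Hypothetical helper: true if 'A', 'a' and '0' have the code points
   commonly used by EBCDIC code pages (assumed values). */
bool exec_charset_looks_ebcdic(void)
{
    return (unsigned char)'A' == 0xC1
        && (unsigned char)'a' == 0x81
        && (unsigned char)'0' == 0xF0;
}
```

With the default execution character set I'd expect the first function
to return true and the second false; with -fexec-charset=EBCDIC-US the
results should swap, provided the compiler gets through the file
without tripping one of the bugs above.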

"Internal compiler error" means "there is a bug in the compiler".  It
is not the same as "undefined behavior," which means something more
like "there is a bug in your code that the compiler is not obliged to
diagnose."

If this is not the problem you encountered, please describe in
excruciating detail what your problem actually was.

zw

p.s. I agree with you that the C "locale" mechanism and the C
standard's concept of "execution character set" are poorly designed,
and one is usually better off writing code that avoids depending on
them.  But please understand that it's almost impossible to remove
_anything_ from the C standard, because the main thing C still has
going for it is backward compatibility all the way to the 1980s.  We
will not be dropping -fexec-charset as long as the execution character
set remains part of the C standard.
