On 10/23/2017 04:55 AM, Andreas Krebbel wrote:
On 10/19/2017 07:13 PM, Martin Sebor wrote:
On 10/19/2017 09:50 AM, Andreas Krebbel wrote:
The TPF operating system uses the GCC S/390 backend. They set an
EBCDIC exec charset for compilation using -fexec-charset. However,
certain libraries require ASCII strings instead. In order to be able
to put calls to that library into the normal code it is required to
switch the exec charset within a compilation unit.
This is an attempt to implement it by adding a new pragma which could
be used like in the following example:
int
foo ()
{
call_with_utf8("hello world");
#pragma GCC exec_charset("UTF16")
call_with_utf16("hello world");
#pragma GCC exec_charset(pop)
call_with_utf8("hello world");
}
Does this look reasonable?
I'm not an expert on this but at a high level it looks reasonable
to me. But based on some small amount of work I did in this area
I have a couple of questions.
There are a few places in the compiler that already do or that
should but don't yet handle different execution character sets.
The former include built-ins like __bultin_isdigit() and
__builtin_sprintf (in both builtins.c and gimple-ssa-sprintf.c)
The latter is the -Wformat checking done by the C and C++ front
ends. The missing support for the latter is the subject of bug
38308. According to bug 81686, LTO is apparently also missing
support for exec-charset.
These probably are the areas Richard and Jakub were referring to as well?!
These cases did not work
properly with the -fexec-charset cmdline option and this does not change with
the pragma. I'll try
to look at what has been proposed in the discussion. Perhaps I can get it
working somehow.
Right, the patch doesn't remove the known deficiencies. But
by providing another knob to control the execution charset, at
a fine grain level, it encourages users to make greater use of
the (incomplete) exec-charset support and increases the odds
that they will run afoul of them.
It seems to me that before exposing a new mechanism to control
the exec charset it would be prudent to a) plug at least the
biggest holes to make the feature more reliable (in my mind,
that's at least -Wformat), and b) make sure the pragma interacts
correctly with existing features that work correctly with the
-fexec-charset option. Where it doesn't and where it cannot
be made to work correctly (i.e., is undefined), I would expect
an effort to be made to detect and diagnose those undefined
interactions if possible, or if that's too difficult, at
a minimum document them.
I'm curious how the pragma might interact with these two areas,
and whether the lack of support for it in the latter is a concern
(and if not, why not). For the former, I'm also wondering about
the interaction of inlining and other interprocedural optimizations
with the pragma. Does it propagate through inlined calls as one
would expect?
The pragma does not apply to the callees of a function defined under the pragma
regardless of
whether it gets inlined or not. That matches the behavior of other pragmas.
If it would apply to
inlined callees the program semantics might change depending on optimization
decisions i.e. whether
a certain call got inlined or not.
Callees marked as always_inline might be discussed separately. I remember this
being a topic when
looking at function attributes.
My concern with this pragma/attribute and inlining has to do with
strings in one exec charset being propagated into functions that
operate on strings in another charset. E.g., like in the test
case below that's "miscompiled" with your patch -- the first test
for n == 7 is eliminated and the buffer overflow is not detected.
If this cannot be made to work then I think some effort should be
made to detect this mixing and matching and existing optimizations
that assume the same charset (like the sprintf one does) disabled.
static inline int
f (char *d, const char *fmt)
{
#pragma GCC exec_charset ("utf8")
int n = __builtin_sprintf (d, fmt, 12345);
#pragma GCC exec_charset (pop)
if (n == 7) // incorrectly optimized away
__builtin_abort ();
return n;
}
int main (void)
{
char d[5];
#pragma GCC exec_charset ("EBCDIC-US")
int n = f (d, "i=%i"); // buffer overflow not detected
#pragma GCC exec_charset (pop)
__builtin_printf ("%i (%lu): %s\n", n, __builtin_strlen (d), d);
if (n != 7) // aborts at runtime
__builtin_abort ();
}
Martin