On 10/23/2017 06:14 PM, Martin Sebor wrote:
...
> It seems to me that before exposing a new mechanism to control
> the exec charset it would be prudent to a) plug at least the
> biggest holes to make the feature more reliable (in my mind,
> that's at least -Wformat), and b) make sure the pragma interacts
> correctly with existing features that work correctly with the
> -fexec-charset option. Where it doesn't and where it cannot
> be made to work correctly (i.e., is undefined), I would expect
> an effort to be made to detect and diagnose those undefined
> interactions if possible, or if that's too difficult, at
> a minimum document them.
Agreed. Getting a better understanding of the deficiencies with that mechanism
and at least document
them is an important step.
...
> My concern with this pragma/attribute and inlining has to do with
> strings in one exec charset being propagated into functions that
> operate on strings in another charset. E.g., like in the test
> case below that's "miscompiled" with your patch -- the first test
> for n == 7 is eliminated and the buffer overflow is not detected.
> If this cannot be made to work then I think some effort should be
> made to detect this mixing and matching and existing optimizations
> that assume the same charset (like the sprintf one does) disabled.
>
> static inline int
> f (char *d, const char *fmt)
> {
> #pragma GCC exec_charset ("utf8")
> int n = __builtin_sprintf (d, fmt, 12345);
> #pragma GCC exec_charset (pop)
>
> if (n == 7) // incorrectly optimized away
> __builtin_abort ();
To my understanding the problem in your example isn't triggered through
inlining.
The check gets optimized away because sprintf is called with "i=%i" being
converted to EBCDIC first.
The sprintf implementation does not recognize the format letter and hence only
prints the converted
format string without inserting the integer. That's kinda expected I would say.
You call a function
which is expected to deal with a UTF-8 string with an EBCDIC string. GCC is
reasoning about what the
sprintf invocation would return and removes the condition since it would never
be triggered.
Compiling with -O0 makes the program behave the same way even without the
inlining and the check not
being removed.
It would also happen without using the pragma but compiling with
-fexec-charset=EBCDIC-US instead.
E.g. compiling this program with -fexec-charset=EBCDIC-US does abort while it
does not abort without
using -fexec-charset:
int main (void)
{
char d[5];
int n = __builtin_sprintf (d, "i=%i", 12345);
if (n != 7)
__builtin_abort ();
}
I'm not sure what kind of warning to expect for such code? Would it be helpful
if the compiler tells
that the exec_charset setting does not affect "fmt" for this code?!
#pragma GCC exec_charset ("EBCDIC-US")
int n = __builtin_sprintf (d, fmt, 12345);
#pragma GCC exec_charset (pop)
On the other hand, diagnostics like this might create false positives. One
reason of course is that
this might be intentional. But it might also be because in C it is difficult to
detect whether a
char* variable will eventually be interpreted as string or not.
While I agree with you that the feature is not that easy to use and more
diagnostics would be good
I'm not really sure what to do about it.
Bye,
-Andreas-