On 04/27/2017 03:05 PM, Martin Sebor wrote:
On 04/26/2017 04:34 PM, Jakub Jelinek wrote:
Also, can't there be a way to shortcut all this processing if the
charsets are the same? And is it a good idea if every pass that needs
to do
something with the exec charset chars caches its own results of the
langhook?
The biggest overhead is calling lang_hooks.to_target_charset
to initialize each character in the source set. That could
be avoided if there were a way to determine in the middle-end
whether the input and target charsets are the same, but I don't
see one. Alternatively, it could be optimized by converting
all the characters in one shot as a string, but without adding
a new target hook to do that I don't see how to do that either.
It might be a useful enhancement but given the scope it feels
like it should be done independently of this change. But if
you know of a solution that escaped me please let me know.
The overhead of the additional processing should be negligible
irrespective of whether the charsets are different or the same
(all it entails is indexing into a table).
So the initialization could be done once per translation unit rather
than once per function -- assuming the target character set doesn't
change within a translation unit.
That seems like it ought to be easy.
The table-lookup seems like a reasonable cost and I don't think we
should litter the code with conditionals to conditionally lookup based
on whether or not the charsets are the same.
I agree that sharing data would be a good thing but as it is,
the little that can be be shared already is (the target_percent
character with builtins.c). The rest of it (i.e., the mapping)
is only being used by gimple-ssa-sprintf.c.
If we ever get to a point where we need this level of mapping elsewhere,
we can always pull the code out of gimple-ssa-sprintf into a more
generic location. I don't think we need to over-engineering sharing
into this right now.
If/when -Wformat is ever enhanced to handle -fexec-charset, or
if another area needs to, then implementinng some more general
would be worthwhile.
Right.
Attached is an updated patch with just the overflow handling
suggested by Joseph.
Martin
gcc-80523.diff
PR tree-optimization/80523 - -Wformat-overflow doesn't consider -fexec-charset
gcc/ChangeLog:
PR tree-optimization/80523
* gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
(init_target_to_host_charmap, target_to_host, target_strtol10): New
functions.
(maybe_warn, format_directive, parse_directive): Use new functions.
(pass_sprintf_length::execute): Call init_target_to_host_charmap.
gcc/testsuite/ChangeLog:
PR tree-optimization/80523
* gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.
Index: gcc/gimple-ssa-sprintf.c
===================================================================
--- gcc/gimple-ssa-sprintf.c (revision 247263)
+++ gcc/gimple-ssa-sprintf.c (working copy)
+/* Return a string consisting of charavters in the host source character
s/charavters/characters/
+ set corresponding to the string TARGSTR consisting of characters in
+ the execution character set. */
+
+static const char*
+target_to_host (const char *targstr)
+{
+ /* The interesting subset of source and execution characters are
+ the same so no conversion is necessary. */
+ if (target_to_host_charmap['\0'] == 1)
+ return targstr;
+
+ /* Convert the initial substring of TARGSTR to the corresponding
+ characters in the host set, appending "..." if TARGSTR is too
+ long to fit. Using the static buffer assumes the function is
+ not called in between sequence points (which it isn't). */
+ static char hostr[32];
+ for (char *ph = hostr; ; ++targstr)
+ {
+ *ph++ = target_to_host (*targstr);
+ if (!*targstr)
+ break;
+
+ if (ph - hostr == sizeof hostr - 4)
+ {
+ *ph = '\0';
+ strcat (ph, "...");
+ break;
+ }
+ }
+
+ return hostr;
Ewww. I guess the alternative would be something like:
Expand the return value to include a indicator of whether or not the
original string was returned (common case) or if a new string was
returned and thus needs to be deallocated by the caller.
That's probably pretty unpleasant given we don't have a central place
where we call target_to_host, so the caller side would be much uglier.
Are there any downstream impacts when the string is too long to covert
other than not warning for things which were unconverted?
Jeff