On 04/27/2017 03:05 PM, Martin Sebor wrote:
On 04/26/2017 04:34 PM, Jakub Jelinek wrote:

Also, can't there be a way to shortcut all this processing if the
charsets are the same? And is it a good idea if every pass that needs to do
something with the exec charset chars caches its own results of the
langhook?

The biggest overhead is calling lang_hooks.to_target_charset
to initialize each character in the source set.  That could
be avoided if there were a way to determine in the middle-end
whether the input and target charsets are the same, but I don't
see one.  Alternatively, it could be optimized by converting
all the characters in one shot as a string, but without adding
a new target hook to do that I don't see how to do that either.
It might be a useful enhancement but given the scope it feels
like it should be done independently of this change.  But if
you know of a solution that escaped me please let me know.

The overhead of the additional processing should be negligible
irrespective of whether the charsets are different or the same
(all it entails is indexing into a table).
So the initialization could be done once per translation unit rather than once per function -- assuming the target character set doesn't change within a translation unit.

That seems like it ought to be easy.

The table-lookup seems like a reasonable cost and I don't think we should litter the code with conditionals to conditionally lookup based on whether or not the charsets are the same.


I agree that sharing data would be a good thing but as it is,
the little that can be be shared already is (the target_percent
character with builtins.c).  The rest of it (i.e., the mapping)
is only being used by gimple-ssa-sprintf.c.
If we ever get to a point where we need this level of mapping elsewhere, we can always pull the code out of gimple-ssa-sprintf into a more generic location. I don't think we need to over-engineering sharing into this right now.



If/when -Wformat is ever enhanced to handle -fexec-charset, or
if another area needs to, then implementinng some more general
would be worthwhile.
Right.


Attached is an updated patch with just the overflow handling
suggested by Joseph.

Martin


gcc-80523.diff


PR tree-optimization/80523 -  -Wformat-overflow doesn't consider -fexec-charset

gcc/ChangeLog:

        PR tree-optimization/80523
        * gimple-ssa-sprintf.c (target_to_host_charmap): New global variable.
        (init_target_to_host_charmap, target_to_host, target_strtol10): New
        functions.
        (maybe_warn, format_directive, parse_directive): Use new functions.
        (pass_sprintf_length::execute): Call init_target_to_host_charmap.       

gcc/testsuite/ChangeLog:

        PR tree-optimization/80523
        * gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: New test.

Index: gcc/gimple-ssa-sprintf.c
===================================================================
--- gcc/gimple-ssa-sprintf.c    (revision 247263)
+++ gcc/gimple-ssa-sprintf.c    (working copy)
+/* Return a string consisting of charavters in the host source character
s/charavters/characters/


+   set corresponding to the string TARGSTR consisting of characters in
+   the execution character set.  */
+
+static const char*
+target_to_host (const char *targstr)
+{
+  /* The interesting subset of source and execution characters are
+     the same so no conversion is necessary.  */
+  if (target_to_host_charmap['\0'] == 1)
+    return targstr;
+
+  /* Convert the initial substring of TARGSTR to the corresponding
+     characters in the host set, appending "..." if TARGSTR is too
+     long to fit.  Using the static buffer assumes the function is
+     not called in between sequence points (which it isn't).  */
+  static char hostr[32];
+  for (char *ph = hostr; ; ++targstr)
+    {
+      *ph++ = target_to_host (*targstr);
+      if (!*targstr)
+       break;
+
+      if (ph - hostr == sizeof hostr - 4)
+       {
+         *ph = '\0';
+         strcat (ph, "...");
+         break;
+       }
+    }
+
+  return hostr;
Ewww.  I guess the alternative would be something like:

Expand the return value to include a indicator of whether or not the original string was returned (common case) or if a new string was returned and thus needs to be deallocated by the caller.

That's probably pretty unpleasant given we don't have a central place where we call target_to_host, so the caller side would be much uglier.

Are there any downstream impacts when the string is too long to covert other than not warning for things which were unconverted?

Jeff


Reply via email to