On 6/18/19 12:59 PM, Jeff Law wrote:
On 5/22/19 10:42 AM, Martin Sebor wrote:
Attached is a revised implementation of the -Wformat-diag checker
incorporating the feedback I got on the first revision.

Martin

gcc-wformat-diag-checker.diff

gcc/c-family/ChangeLog:

        * c-format.c (function_format_info::format_type): Adjust type.
        (function_format_info::is_raw): New member.
        (decode_format_type): Adjust signature.  Handle "raw" diag attributes.
        (decode_format_attr): Adjust call to decode_format_type.
        Avoid a redundant call to convert_format_name_to_system_name.
        Avoid abbreviating the word "arguments" in a diagnostic.
        (format_warning_substr): New function.
        (avoid_dollar_number): Quote dollar sign in a diagnostic.
        (finish_dollar_format_checking): Same.
        (check_format_info): Same.
        (struct baltoks_t): New.
        (c_opers, c_keywords, cxx_keywords, badwords, contrs): New arrays.
        (maybe_diag_unbalanced_tokens, check_tokens, check_plain): New
        functions.
        (check_format_info_main): Call check_plain.  Use baltoks_t.  Call
        maybe_diag_unbalanced_tokens.
        (handle_format_attribute): Spell out the word "arguments" in
        a diagnostic.

gcc/testsuite/ChangeLog:
        * gcc.dg/format/gcc_diag-11.c: New test.
High level comment.  This is painful to read.  But that's probably an
artifact of trying to cobble together C code to parse strings and codify
the conventions.    ie, it's likely inherent for the problem you're
trying to solve.

It wasn't exactly a lot of fun to write either.  I suspect it
would have been even worse if I had used regular expressions.
It is more complicated than strictly necessary because it's
trying to balance usability of the warning with efficiency.
(Although I'm sure both could be improved.)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index a7f76c1c01d..713fce442c9 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -320,10 +352,8 @@ decode_format_attr (const_tree fntype, tree atname, tree 
args,
      {
        const char *p = IDENTIFIER_POINTER (format_type_id);
- p = convert_format_name_to_system_name (p);
+      info->format_type = decode_format_type (p, &info->is_raw);
- info->format_type = decode_format_type (p);
-
        if (!c_dialect_objc ()
           && info->format_type == gcc_objc_string_format_type)
        {
Did you mean to drop the call to convert_format_name_to_system_name?

Yes, it's redundant (it's the first thing decode_format_type does).

@@ -2789,6 +2850,904 @@ check_argument_type (const format_char_info *fci,
    return true;
  }
+
+/* Helper for initializing global token_t arrays below.  */
+#define NAME(name) { name, sizeof name - 1, NULL }
+
+/* C/C++ operators that are expected to be quoted within the format
+   string.  */
+
+static const token_t c_opers[] =
+  {
+   NAME ("!="), NAME ("%="),  NAME ("&&"),  NAME ("&="), NAME ("*="),
+   NAME ("++"), NAME ("+="),  NAME ("--"),  NAME ("-="), NAME ("->"),
+   NAME ("/="), NAME ("<<"),  NAME ("<<="), NAME ("<="), NAME ("=="),
+   NAME (">="), NAME (">>="), NAME (">>"),  NAME ("?:"),  NAME ("^="),
+   NAME ("|="), NAME ("||")
+  };
This clearly isn't a full list of operators.  Is there a particular
reason why we aren't diagnosing others.  I guess you're catching the
single character operators via the ISPUNCT checks?  That seems a little
loose (@ isn't an operator for example).  It  may be OK in practice though.

Yes, it only handles two-character operators and its only purpose
is to make diagnostics more readable and less repetitive (otherwise
we'd get one for each occurrence of the punctuator). I think @ is
an operator in Objective-C (I only know this because I fixed a few
instances of it there).

+
+  if (nchars)
+    {
+      /* This is the most common problem: go the extra mile to decribe
s/decribe/describe/



+
+static void
+maybe_diag_unbalanced_tokens (location_t format_string_loc,
+                             const char *orig_format_chars,
+                             tree format_string_cst,
+                             baltoks_t &baltoks)
Needs a function comment.


@@ -2828,10 +3789,26 @@ check_format_info_main (format_check_results *res,
init_dollar_format_checking (info->first_arg_num, first_fillin_param); + /* In GCC diagnostic functions check plain directives (substrings within
+     the format string that don't start with %) for quoting and punctuations
+     problems.  */
+  bool ck_plain = (!info->is_raw
+                  && (info->format_type == gcc_diag_format_type
+                      || info->format_type == gcc_tdiag_format_type
+                      || info->format_type == gcc_cdiag_format_type
+                      || info->format_type == gcc_cxxdiag_format_type));
+
    while (*format_chars != 0)
      {
-      if (*format_chars++ != '%')
+      if (ck_plain)
+       format_chars = check_plain (format_string_loc,
+                                   format_string_cst,
+                                   orig_format_chars, format_chars,
+                                   baltoks);
+
+      if (*format_chars == 0 || *format_chars++ != '%')
        continue;
+
        if (*format_chars == 0)
Isn't the second test of *format_chars == 0 dead now?

It's there in case ck_plain is false.

I'm going to throw this into my tester and see what, if anything, pops
out while you fixup the nits above.  Assuming there isn't anything
major, my inclination is to go forward.  We may over time improve the
rules around diagnostics which will require iteration here, but this is
clearly a step forward.

The warning is prevented from causing errors for now so with
the exception of bugs (ICEs or the suppression not working) it
shouldn't cause any noticeable fallout.  What I would expect
is -Wformat-diag instances in build logs pointing out all
the remaining violations that are yet to be fixed.  Especially
in back-end code for targets other than x86_64.

Martin

Reply via email to