On 6/18/19 1:21 PM, Martin Sebor wrote:
> On 6/18/19 12:59 PM, Jeff Law wrote:
>> On 5/22/19 10:42 AM, Martin Sebor wrote:
>>> Attached is a revised implementation of the -Wformat-diag checker
>>> incorporating the feedback I got on the first revision.
>>>
>>> Martin
>>>
>>> gcc-wformat-diag-checker.diff
>>>
>>> gcc/c-family/ChangeLog:
>>>
>>>     * c-format.c (function_format_info::format_type): Adjust type.
>>>     (function_format_info::is_raw): New member.
>>>     (decode_format_type): Adjust signature.  Handle "raw" diag
>>> attributes.
>>>     (decode_format_attr): Adjust call to decode_format_type.
>>>     Avoid a redundant call to convert_format_name_to_system_name.
>>>     Avoid abbreviating the word "arguments" in a diagnostic.
>>>     (format_warning_substr): New function.
>>>     (avoid_dollar_number): Quote dollar sign in a diagnostic.
>>>     (finish_dollar_format_checking): Same.
>>>     (check_format_info): Same.
>>>     (struct baltoks_t): New.
>>>     (c_opers, c_keywords, cxx_keywords, badwords, contrs): New arrays.
>>>     (maybe_diag_unbalanced_tokens, check_tokens, check_plain): New
>>>     functions.
>>>     (check_format_info_main): Call check_plain.  Use baltoks_t.  Call
>>>     maybe_diag_unbalanced_tokens.
>>>     (handle_format_attribute): Spell out the word "arguments" in
>>>     a diagnostic.
>>>
>>> gcc/testsuite/ChangeLog:
>>>     * gcc.dg/format/gcc_diag-11.c: New test.
>> High level comment.  This is painful to read.  But that's probably an
>> artifact of trying to cobble together C code to parse strings and codify
>> the conventions.    ie, it's likely inherent for the problem you're
>> trying to solve.
> 
> It wasn't exactly a lot of fun to write either.  I suspect it
> would have been even worse if I had used regular expressions.
> It is more complicated than strictly necessary because it's
> trying to balance usability of the warning with efficiency.
> (Although I'm sure both could be improved.)
> 
>>> diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
>>> index a7f76c1c01d..713fce442c9 100644
>>> --- a/gcc/c-family/c-format.c
>>> +++ b/gcc/c-family/c-format.c
>>> @@ -320,10 +352,8 @@ decode_format_attr (const_tree fntype, tree
>>> atname, tree args,
>>>       {
>>>         const char *p = IDENTIFIER_POINTER (format_type_id);
>>>   -      p = convert_format_name_to_system_name (p);
>>> +      info->format_type = decode_format_type (p, &info->is_raw);
>>>   -      info->format_type = decode_format_type (p);
>>> -
>>>         if (!c_dialect_objc ()
>>>          && info->format_type == gcc_objc_string_format_type)
>>>       {
>> Did you mean to drop the call to convert_format_name_to_system_name?
> 
> Yes, it's redundant (it's the first thing decode_format_type does).
> 
>>> @@ -2789,6 +2850,904 @@ check_argument_type (const format_char_info
>>> *fci,
>>>     return true;
>>>   }
>>>   +
>>> +/* Helper for initializing global token_t arrays below.  */
>>> +#define NAME(name) { name, sizeof name - 1, NULL }
>>> +
>>> +/* C/C++ operators that are expected to be quoted within the format
>>> +   string.  */
>>> +
>>> +static const token_t c_opers[] =
>>> +  {
>>> +   NAME ("!="), NAME ("%="),  NAME ("&&"),  NAME ("&="), NAME ("*="),
>>> +   NAME ("++"), NAME ("+="),  NAME ("--"),  NAME ("-="), NAME ("->"),
>>> +   NAME ("/="), NAME ("<<"),  NAME ("<<="), NAME ("<="), NAME ("=="),
>>> +   NAME (">="), NAME (">>="), NAME (">>"),  NAME ("?:"),  NAME ("^="),
>>> +   NAME ("|="), NAME ("||")
>>> +  };
>> This clearly isn't a full list of operators.  Is there a particular
>> reason why we aren't diagnosing others.  I guess you're catching the
>> single character operators via the ISPUNCT checks?  That seems a little
>> loose (@ isn't an operator for example).  It  may be OK in practice
>> though.
> 
> Yes, it only handles two-character operators and its only purpose
> is to make diagnostics more readable and less repetitive (otherwise
> we'd get one for each occurrence of the punctuator). I think @ is
> an operator in Objective-C (I only know this because I fixed a few
> instances of it there).
> 
>>> +
>>> +  if (nchars)
>>> +    {
>>> +      /* This is the most common problem: go the extra mile to decribe
>> s/decribe/describe/
>>
>>
>>
>>> +
>>> +static void
>>> +maybe_diag_unbalanced_tokens (location_t format_string_loc,
>>> +                  const char *orig_format_chars,
>>> +                  tree format_string_cst,
>>> +                  baltoks_t &baltoks)
>> Needs a function comment.
>>
>>
>>> @@ -2828,10 +3789,26 @@ check_format_info_main (format_check_results
>>> *res,
>>>       init_dollar_format_checking (info->first_arg_num,
>>> first_fillin_param);
>>>   +  /* In GCC diagnostic functions check plain directives
>>> (substrings within
>>> +     the format string that don't start with %) for quoting and
>>> punctuations
>>> +     problems.  */
>>> +  bool ck_plain = (!info->is_raw
>>> +           && (info->format_type == gcc_diag_format_type
>>> +               || info->format_type == gcc_tdiag_format_type
>>> +               || info->format_type == gcc_cdiag_format_type
>>> +               || info->format_type == gcc_cxxdiag_format_type));
>>> +
>>>     while (*format_chars != 0)
>>>       {
>>> -      if (*format_chars++ != '%')
>>> +      if (ck_plain)
>>> +    format_chars = check_plain (format_string_loc,
>>> +                    format_string_cst,
>>> +                    orig_format_chars, format_chars,
>>> +                    baltoks);
>>> +
>>> +      if (*format_chars == 0 || *format_chars++ != '%')
>>>       continue;
>>> +
>>>         if (*format_chars == 0)
>> Isn't the second test of *format_chars == 0 dead now?
> 
> It's there in case ck_plain is false.
> 
>> I'm going to throw this into my tester and see what, if anything, pops
>> out while you fixup the nits above.  Assuming there isn't anything
>> major, my inclination is to go forward.  We may over time improve the
>> rules around diagnostics which will require iteration here, but this is
>> clearly a step forward.
> 
> The warning is prevented from causing errors for now so with
> the exception of bugs (ICEs or the suppression not working) it
> shouldn't cause any noticeable fallout.  What I would expect
> is -Wformat-diag instances in build logs pointing out all
> the remaining violations that are yet to be fixed.  Especially
> in back-end code for targets other than x86_64.
Thanks for the tidbit about this only causing warnings, but not errors
at this point.   I suspect that account for the jump in warnings.
There's a bunch of them in tree-ssa.c, rs6000.c, and cp/error.c (ppc64
build I just happened to look at).

Given that they're just warnings rather than hard errors, let's go ahead
with the patch.  We can iterate on driving the warnings down on the
ports and in the generic bits.

Jeff

Reply via email to