On 5/22/19 10:42 AM, Martin Sebor wrote:
> Attached is a revised implementation of the -Wformat-diag checker
> incorporating the feedback I got on the first revision.
> 
> Martin
> 
> gcc-wformat-diag-checker.diff
> 
> gcc/c-family/ChangeLog:
> 
>       * c-format.c (function_format_info::format_type): Adjust type.
>       (function_format_info::is_raw): New member.
>       (decode_format_type): Adjust signature.  Handle "raw" diag attributes.
>       (decode_format_attr): Adjust call to decode_format_type.
>       Avoid a redundant call to convert_format_name_to_system_name.
>       Avoid abbreviating the word "arguments" in a diagnostic.
>       (format_warning_substr): New function.
>       (avoid_dollar_number): Quote dollar sign in a diagnostic.
>       (finish_dollar_format_checking): Same.
>       (check_format_info): Same.
>       (struct baltoks_t): New.
>       (c_opers, c_keywords, cxx_keywords, badwords, contrs): New arrays.
>       (maybe_diag_unbalanced_tokens, check_tokens, check_plain): New
>       functions.
>       (check_format_info_main): Call check_plain.  Use baltoks_t.  Call
>       maybe_diag_unbalanced_tokens.
>       (handle_format_attribute): Spell out the word "arguments" in
>       a diagnostic.
> 
> gcc/testsuite/ChangeLog:
>       * gcc.dg/format/gcc_diag-11.c: New test.
High level comment.  This is painful to read.  But that's probably an
artifact of trying to cobble together C code to parse strings and codify
the conventions.    ie, it's likely inherent for the problem you're
trying to solve.


> 
> diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
> index a7f76c1c01d..713fce442c9 100644
> --- a/gcc/c-family/c-format.c
> +++ b/gcc/c-family/c-format.c
> @@ -320,10 +352,8 @@ decode_format_attr (const_tree fntype, tree atname, tree 
> args,
>      {
>        const char *p = IDENTIFIER_POINTER (format_type_id);
>  
> -      p = convert_format_name_to_system_name (p);
> +      info->format_type = decode_format_type (p, &info->is_raw);
>  
> -      info->format_type = decode_format_type (p);
> -      
>        if (!c_dialect_objc ()
>          && info->format_type == gcc_objc_string_format_type)
>       {
Did you mean to drop the call to convert_format_name_to_system_name?



> @@ -2789,6 +2850,904 @@ check_argument_type (const format_char_info *fci,
>    return true;
>  }
>  
> +
> +/* Helper for initializing global token_t arrays below.  */
> +#define NAME(name) { name, sizeof name - 1, NULL }
> +
> +/* C/C++ operators that are expected to be quoted within the format
> +   string.  */
> +
> +static const token_t c_opers[] =
> +  {
> +   NAME ("!="), NAME ("%="),  NAME ("&&"),  NAME ("&="), NAME ("*="),
> +   NAME ("++"), NAME ("+="),  NAME ("--"),  NAME ("-="), NAME ("->"),
> +   NAME ("/="), NAME ("<<"),  NAME ("<<="), NAME ("<="), NAME ("=="),
> +   NAME (">="), NAME (">>="), NAME (">>"),  NAME ("?:"),  NAME ("^="),
> +   NAME ("|="), NAME ("||")
> +  };
This clearly isn't a full list of operators.  Is there a particular
reason why we aren't diagnosing others.  I guess you're catching the
single character operators via the ISPUNCT checks?  That seems a little
loose (@ isn't an operator for example).  It  may be OK in practice though.



> +

> +
> +  if (nchars)
> +    {
> +      /* This is the most common problem: go the extra mile to decribe
s/decribe/describe/



> +
> +static void
> +maybe_diag_unbalanced_tokens (location_t format_string_loc,
> +                           const char *orig_format_chars,
> +                           tree format_string_cst,
> +                           baltoks_t &baltoks)
Needs a function comment.


> @@ -2828,10 +3789,26 @@ check_format_info_main (format_check_results *res,
>  
>    init_dollar_format_checking (info->first_arg_num, first_fillin_param);
>  
> +  /* In GCC diagnostic functions check plain directives (substrings within
> +     the format string that don't start with %) for quoting and punctuations
> +     problems.  */
> +  bool ck_plain = (!info->is_raw
> +                && (info->format_type == gcc_diag_format_type
> +                    || info->format_type == gcc_tdiag_format_type
> +                    || info->format_type == gcc_cdiag_format_type
> +                    || info->format_type == gcc_cxxdiag_format_type));
> +
>    while (*format_chars != 0)
>      {
> -      if (*format_chars++ != '%')
> +      if (ck_plain)
> +     format_chars = check_plain (format_string_loc,
> +                                 format_string_cst,
> +                                 orig_format_chars, format_chars,
> +                                 baltoks);
> +
> +      if (*format_chars == 0 || *format_chars++ != '%')
>       continue;
> +
>        if (*format_chars == 0)
Isn't the second test of *format_chars == 0 dead now?

I'm going to throw this into my tester and see what, if anything, pops
out while you fixup the nits above.  Assuming there isn't anything
major, my inclination is to go forward.  We may over time improve the
rules around diagnostics which will require iteration here, but this is
clearly a step forward.

jeff

Reply via email to