On Tue, Aug 28, 2018 at 11:55 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Tue, Aug 28, 2018 at 6:27 AM Jeff Law <l...@redhat.com> wrote:
> >
> > On 08/27/2018 10:27 AM, Martin Sebor wrote:
> > > On 08/27/2018 02:29 AM, Richard Biener wrote:
> > >> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <l...@redhat.com> wrote:
> > >>>
> > >>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> > >>>> The warning suppression for -Wstringop-truncation looks for
> > >>>> the next statement after a truncating strncpy to see if it
> > >>>> adds a terminating nul.  This only works when the next
> > >>>> statement can be reached using the Gimple statement iterator
> > >>>> which isn't until after gimplification.  As a result, strncpy
> > >>>> calls that truncate their constant argument that are being
> > >>>> folded to memcpy this early get diagnosed even if they are
> > >>>> followed by the nul assignment:
> > >>>>
> > >>>>   const char s[] = "12345";
> > >>>>   char d[3];
> > >>>>
> > >>>>   void f (void)
> > >>>>   {
> > >>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> > >>>>     d[sizeof d - 1] = 0;
> > >>>>   }
> > >>>>
> > >>>> To avoid the warning I propose to defer folding strncpy to
> > >>>> memcpy until the pointer to the basic block the strnpy call
> > >>>> is in can be used to try to reach the next statement (this
> > >>>> happens as early as ccp1).  I'm aware of the preference to
> > >>>> fold things early but in the case of strncpy (a relatively
> > >>>> rarely used function that is often misused), getting
> > >>>> the warning right while folding a bit later but still fairly
> > >>>> early on seems like a reasonable compromise.  I fear that
> > >>>> otherwise, the false positives will drive users to adopt
> > >>>> other unsafe solutions (like memcpy) where these kinds of
> > >>>> bugs cannot be as readily detected.
> > >>>>
> > >>>> Tested on x86_64-linux.
> > >>>>
> > >>>> Martin
> > >>>>
> > >>>> PS There still are outstanding cases where the warning can
> > >>>> be avoided.  I xfailed them in the test for now but will
> > >>>> still try to get them to work for GCC 9.
> > >>>>
> > >>>> gcc-87028.diff
> > >>>>
> > >>>>
> > >>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
> > >>>> strncpy with global variable source string
> > >>>> gcc/ChangeLog:
> > >>>>
> > >>>>       PR tree-optimization/87028
> > >>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> > >>>>       statement doesn't belong to a basic block.
> > >>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> > >>>>       the left hand side of assignment.
> > >>>>
> > >>>> gcc/testsuite/ChangeLog:
> > >>>>
> > >>>>       PR tree-optimization/87028
> > >>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> > >>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> > >>>>
> > >>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> > >>>> index 07341eb..284c2fb 100644
> > >>>> --- a/gcc/gimple-fold.c
> > >>>> +++ b/gcc/gimple-fold.c
> > >>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
> > >>>> (gimple_stmt_iterator *gsi,
> > >>>>    if (tree_int_cst_lt (ssize, len))
> > >>>>      return false;
> > >>>>
> > >>>> +  /* Defer warning (and folding) until the next statement in the basic
> > >>>> +     block is reachable.  */
> > >>>> +  if (!gimple_bb (stmt))
> > >>>> +    return false;
> > >>> I think you want cfun->cfg as the test here.  They should be equivalent
> > >>> in practice.
> > >>
> > >> Please do not add 'cfun' references.  Note that the next stmt is also
> > >> accessible
> > >> when there is no CFG.  I guess the issue is that we fold this during
> > >> gimplification
> > >> where the next stmt is not yet "there" (but still in GENERIC)?
> > >>
> > >> We generally do not want to have unfolded stmts in the IL when we can
> > >> avoid that
> > >> which is why we fold most stmts during gimplification.  We also do
> > >> that because
> > >> we now do less folding on GENERIC.
> > >>
> > >> There may be the possibility to refactor gimplification time folding
> > >> to what we
> > >> do during inlining - queue stmts we want to fold and perform all
> > >> folding delayed.
> > >> This of course means bigger compile-time due to cache effects.
> > >>
> > >>>
> > >>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> > >>>> index d0792aa..f1988f6 100644
> > >>>> --- a/gcc/tree-ssa-strlen.c
> > >>>> +++ b/gcc/tree-ssa-strlen.c
> > >>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
> > >>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
> > >>>>         && known_eq (dstoff, lhsoff)
> > >>>>         && operand_equal_p (dstbase, lhsbase, 0))
> > >>>>       return false;
> > >>>> +
> > >>>> +      if (code == MEM_REF
> > >>>> +       && TREE_CODE (lhsbase) == SSA_NAME
> > >>>> +       && known_eq (dstoff, lhsoff))
> > >>>> +     {
> > >>>> +       /* Extract the referenced variable from something like
> > >>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
> > >>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
> > >>>> +       if (gimple_nop_p (def))
> > >>>> +         {
> > >>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
> > >>>> +           if (lhsbase
> > >>>> +               && dstbase
> > >>>> +               && operand_equal_p (dstbase, lhsbase, 0))
> > >>>> +             return false;
> > >>>> +         }
> > >>>> +     }
> > >>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
> > >>> the wrong tree.  It'd be easier to suggest something here if I could see
> > >>> the gimple (with virtual operands).  BUt at some level what you really
> > >>> want to do is make sure the base of the MEM_REF is the same as what got
> > >>> passed as the destination of the strncpy.  You'd want to be testing
> > >>> SSA_NAMEs in that case.
> > >>
> > >> Yes.  Why not simply compare the SSA names?  Why would it be
> > >> not OK to do that when !lhsbase?
> > >
> > > The added code handles this case:
> > >
> > >   void f (char *d)
> > >   {
> > >     __builtin_strncpy (d, "12345", 4);
> > >     d[3] = 0;
> > >   }
> > >
> > > where during forwprop we see:
> > >
> > >   __builtin_strncpy (d_3(D), "12345", 4);
> > >   MEM[(char *)d_3(D) + 3B] = 0;
> > >
> > > The next statement after the strncpy is the assignment whose
> > > lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
> > > is no other information in the GIMPLE_NOP that I can see to
> > > tell that the operand is d_3(D) or that it's the same as
> > > the strncpy argument (i.e., the PARAM_DECl d).  Having to
> > > do open-code this all the time seems so cumbersome -- is
> > > there some API that would do this for me?  (I thought
> > > get_addr_base_and_unit_offset was that API but clearly in
> > > this case it doesn't do what I expect -- it just returns
> > > the argument.)
> >
> > I think you need to look harder at that MEM_REF.  It references d_3.
> > That's what you need to be checking.  The base (d_3) is the first
> > operand of the MEM_REF, the offset is the second operand of the MEM_REF.
> >
> > (gdb) p debug_gimple_stmt ($2)
> > # .MEM_5 = VDEF <.MEM_4>
> > MEM[(char *)d_3(D) + 3B] = 0;
> >
> >
> > (gdb) p gimple_assign_lhs ($2)
> > $5 = (tree_node *) 0x7ffff01a6208
> >
> > (gdb) p debug_tree ($5)
> >  <mem_ref 0x7ffff01a6208
> >     type <integer_type 0x7ffff00723f0 char public string-flag QI
> >         size <integer_cst 0x7ffff0059d80 constant 8>
> >         unit-size <integer_cst 0x7ffff0059d98 constant 1>
> >         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> > 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
> > <integer_cst 0x7ffff0059df8 127>
> >         pointer_to_this <pointer_type 0x7ffff007de70>>
> >
> >     arg:0 <ssa_name 0x7ffff0063dc8
> >         type <pointer_type 0x7ffff007de70 type <integer_type
> > 0x7ffff00723f0 char>
> >             public unsigned DI
> >             size <integer_cst 0x7ffff0059c90 constant 64>
> >             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
> >             align:64 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x7ffff007de70 reference_to_this <reference_type
> > 0x7ffff017d738>>
> >         visited var <parm_decl 0x7ffff01a5000 d>
> >         def_stmt GIMPLE_NOP
> >         version:3>
> >     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
> > constant 3>
> >     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
> >
> >
> > Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase:
> >
> > (gdb) p debug_tree (lhsbase)
> > <ssa_name 0x7ffff0063dc8
> >     type <pointer_type 0x7ffff007de70
> >         type <integer_type 0x7ffff00723f0 char public string-flag QI
> >             size <integer_cst 0x7ffff0059d80 constant 8>
> >             unit-size <integer_cst 0x7ffff0059d98 constant 1>
> >             align:8 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
> > 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
> >             pointer_to_this <pointer_type 0x7ffff007de70>>
> >         public unsigned DI
> >         size <integer_cst 0x7ffff0059c90 constant 64>
> >         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
> >         align:64 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x7ffff007de70 reference_to_this <reference_type
> > 0x7ffff017d738>>
> >     visited var <parm_decl 0x7ffff01a5000 d>
> >     def_stmt GIMPLE_NOP
> >     version:3>
> >
> >
> > Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> > "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
> > debug get_addr_base_and_unit_offset to understand what's going on.
> > Essentially you're getting different results of
> > get_addr_base_and_unit_offset in a case where they arguably should be
> > the same.
>
> Probably get_attr_nonstring_decl has the same "mistake" and returns
> the PARM_DECL instead of the SSA name pointer.  So we're comparing
> apples and oranges here.
>
> Yeah:
>
> /* If EXPR refers to a character array or pointer declared attribute
>    nonstring return a decl for that array or pointer and set *REF to
>    the referenced enclosing object or pointer.  Otherwise returns
>    null.  */
>
> tree
> get_attr_nonstring_decl (tree expr, tree *ref)
> {
>   tree decl = expr;
>   if (TREE_CODE (decl) == SSA_NAME)
>     {
>       gimple *def = SSA_NAME_DEF_STMT (decl);
>
>       if (is_gimple_assign (def))
>         {
>           tree_code code = gimple_assign_rhs_code (def);
>           if (code == ADDR_EXPR
>               || code == COMPONENT_REF
>               || code == VAR_DECL)
>             decl = gimple_assign_rhs1 (def);
>         }
>       else if (tree var = SSA_NAME_VAR (decl))
>         decl = var;
>     }
>
>   if (TREE_CODE (decl) == ADDR_EXPR)
>     decl = TREE_OPERAND (decl, 0);
>
>   if (ref)
>     *ref = decl;
>
> I see a lot of "magic" here again in the attempt to "propagate"
> a nonstring attribute.  Note
>
> foo (char *p __attribute__(("nonstring")))
> {
>   p = "bar";
>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
> }
>
> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>
> I think in your code comparing bases you want to look at the _original_
> argument to the string function rather than what get_attr_nonstring_decl
> returned as ref.

Oh, and this 'nonstring' feels like sth that could be propagated by points-to
analysis.

> Richard.
>
> > Jeff
> >
> > Jeff

Reply via email to