On Tue, Aug 28, 2018 at 11:55 AM Richard Biener <richard.guent...@gmail.com> wrote: > > On Tue, Aug 28, 2018 at 6:27 AM Jeff Law <l...@redhat.com> wrote: > > > > On 08/27/2018 10:27 AM, Martin Sebor wrote: > > > On 08/27/2018 02:29 AM, Richard Biener wrote: > > >> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <l...@redhat.com> wrote: > > >>> > > >>> On 08/24/2018 09:58 AM, Martin Sebor wrote: > > >>>> The warning suppression for -Wstringop-truncation looks for > > >>>> the next statement after a truncating strncpy to see if it > > >>>> adds a terminating nul. This only works when the next > > >>>> statement can be reached using the Gimple statement iterator > > >>>> which isn't until after gimplification. As a result, strncpy > > >>>> calls that truncate their constant argument that are being > > >>>> folded to memcpy this early get diagnosed even if they are > > >>>> followed by the nul assignment: > > >>>> > > >>>> const char s[] = "12345"; > > >>>> char d[3]; > > >>>> > > >>>> void f (void) > > >>>> { > > >>>> strncpy (d, s, sizeof d - 1); // -Wstringop-truncation > > >>>> d[sizeof d - 1] = 0; > > >>>> } > > >>>> > > >>>> To avoid the warning I propose to defer folding strncpy to > > >>>> memcpy until the pointer to the basic block the strnpy call > > >>>> is in can be used to try to reach the next statement (this > > >>>> happens as early as ccp1). I'm aware of the preference to > > >>>> fold things early but in the case of strncpy (a relatively > > >>>> rarely used function that is often misused), getting > > >>>> the warning right while folding a bit later but still fairly > > >>>> early on seems like a reasonable compromise. I fear that > > >>>> otherwise, the false positives will drive users to adopt > > >>>> other unsafe solutions (like memcpy) where these kinds of > > >>>> bugs cannot be as readily detected. > > >>>> > > >>>> Tested on x86_64-linux. > > >>>> > > >>>> Martin > > >>>> > > >>>> PS There still are outstanding cases where the warning can > > >>>> be avoided. I xfailed them in the test for now but will > > >>>> still try to get them to work for GCC 9. > > >>>> > > >>>> gcc-87028.diff > > >>>> > > >>>> > > >>>> PR tree-optimization/87028 - false positive -Wstringop-truncation > > >>>> strncpy with global variable source string > > >>>> gcc/ChangeLog: > > >>>> > > >>>> PR tree-optimization/87028 > > >>>> * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when > > >>>> statement doesn't belong to a basic block. > > >>>> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on > > >>>> the left hand side of assignment. > > >>>> > > >>>> gcc/testsuite/ChangeLog: > > >>>> > > >>>> PR tree-optimization/87028 > > >>>> * c-c++-common/Wstringop-truncation.c: Remove xfails. > > >>>> * gcc.dg/Wstringop-truncation-5.c: New test. > > >>>> > > >>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > > >>>> index 07341eb..284c2fb 100644 > > >>>> --- a/gcc/gimple-fold.c > > >>>> +++ b/gcc/gimple-fold.c > > >>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy > > >>>> (gimple_stmt_iterator *gsi, > > >>>> if (tree_int_cst_lt (ssize, len)) > > >>>> return false; > > >>>> > > >>>> + /* Defer warning (and folding) until the next statement in the basic > > >>>> + block is reachable. */ > > >>>> + if (!gimple_bb (stmt)) > > >>>> + return false; > > >>> I think you want cfun->cfg as the test here. They should be equivalent > > >>> in practice. > > >> > > >> Please do not add 'cfun' references. Note that the next stmt is also > > >> accessible > > >> when there is no CFG. I guess the issue is that we fold this during > > >> gimplification > > >> where the next stmt is not yet "there" (but still in GENERIC)? > > >> > > >> We generally do not want to have unfolded stmts in the IL when we can > > >> avoid that > > >> which is why we fold most stmts during gimplification. We also do > > >> that because > > >> we now do less folding on GENERIC. > > >> > > >> There may be the possibility to refactor gimplification time folding > > >> to what we > > >> do during inlining - queue stmts we want to fold and perform all > > >> folding delayed. > > >> This of course means bigger compile-time due to cache effects. > > >> > > >>> > > >>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c > > >>>> index d0792aa..f1988f6 100644 > > >>>> --- a/gcc/tree-ssa-strlen.c > > >>>> +++ b/gcc/tree-ssa-strlen.c > > >>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc > > >>>> (gimple_stmt_iterator gsi, tree src, tree cnt) > > >>>> && known_eq (dstoff, lhsoff) > > >>>> && operand_equal_p (dstbase, lhsbase, 0)) > > >>>> return false; > > >>>> + > > >>>> + if (code == MEM_REF > > >>>> + && TREE_CODE (lhsbase) == SSA_NAME > > >>>> + && known_eq (dstoff, lhsoff)) > > >>>> + { > > >>>> + /* Extract the referenced variable from something like > > >>>> + MEM[(char *)d_3(D) + 3B] = 0; */ > > >>>> + gimple *def = SSA_NAME_DEF_STMT (lhsbase); > > >>>> + if (gimple_nop_p (def)) > > >>>> + { > > >>>> + lhsbase = SSA_NAME_VAR (lhsbase); > > >>>> + if (lhsbase > > >>>> + && dstbase > > >>>> + && operand_equal_p (dstbase, lhsbase, 0)) > > >>>> + return false; > > >>>> + } > > >>>> + } > > >>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up > > >>> the wrong tree. It'd be easier to suggest something here if I could see > > >>> the gimple (with virtual operands). BUt at some level what you really > > >>> want to do is make sure the base of the MEM_REF is the same as what got > > >>> passed as the destination of the strncpy. You'd want to be testing > > >>> SSA_NAMEs in that case. > > >> > > >> Yes. Why not simply compare the SSA names? Why would it be > > >> not OK to do that when !lhsbase? > > > > > > The added code handles this case: > > > > > > void f (char *d) > > > { > > > __builtin_strncpy (d, "12345", 4); > > > d[3] = 0; > > > } > > > > > > where during forwprop we see: > > > > > > __builtin_strncpy (d_3(D), "12345", 4); > > > MEM[(char *)d_3(D) + 3B] = 0; > > > > > > The next statement after the strncpy is the assignment whose > > > lhs is the MEM_REF with a GIMPLE_NOP as an operand. There > > > is no other information in the GIMPLE_NOP that I can see to > > > tell that the operand is d_3(D) or that it's the same as > > > the strncpy argument (i.e., the PARAM_DECl d). Having to > > > do open-code this all the time seems so cumbersome -- is > > > there some API that would do this for me? (I thought > > > get_addr_base_and_unit_offset was that API but clearly in > > > this case it doesn't do what I expect -- it just returns > > > the argument.) > > > > I think you need to look harder at that MEM_REF. It references d_3. > > That's what you need to be checking. The base (d_3) is the first > > operand of the MEM_REF, the offset is the second operand of the MEM_REF. > > > > (gdb) p debug_gimple_stmt ($2) > > # .MEM_5 = VDEF <.MEM_4> > > MEM[(char *)d_3(D) + 3B] = 0; > > > > > > (gdb) p gimple_assign_lhs ($2) > > $5 = (tree_node *) 0x7ffff01a6208 > > > > (gdb) p debug_tree ($5) > > <mem_ref 0x7ffff01a6208 > > type <integer_type 0x7ffff00723f0 char public string-flag QI > > size <integer_cst 0x7ffff0059d80 constant 8> > > unit-size <integer_cst 0x7ffff0059d98 constant 1> > > align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type > > 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max > > <integer_cst 0x7ffff0059df8 127> > > pointer_to_this <pointer_type 0x7ffff007de70>> > > > > arg:0 <ssa_name 0x7ffff0063dc8 > > type <pointer_type 0x7ffff007de70 type <integer_type > > 0x7ffff00723f0 char> > > public unsigned DI > > size <integer_cst 0x7ffff0059c90 constant 64> > > unit-size <integer_cst 0x7ffff0059ca8 constant 8> > > align:64 warn_if_not_align:0 symtab:0 alias-set -1 > > canonical-type 0x7ffff007de70 reference_to_this <reference_type > > 0x7ffff017d738>> > > visited var <parm_decl 0x7ffff01a5000 d> > > def_stmt GIMPLE_NOP > > version:3> > > arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70> > > constant 3> > > j.c:4:6 start: j.c:4:5 finish: j.c:4:8> > > > > > > Note arg:0 is the SSA_NAME d_3. And not surprising that's lhsbase: > > > > (gdb) p debug_tree (lhsbase) > > <ssa_name 0x7ffff0063dc8 > > type <pointer_type 0x7ffff007de70 > > type <integer_type 0x7ffff00723f0 char public string-flag QI > > size <integer_cst 0x7ffff0059d80 constant 8> > > unit-size <integer_cst 0x7ffff0059d98 constant 1> > > align:8 warn_if_not_align:0 symtab:0 alias-set -1 > > canonical-type 0x7ffff00723f0 precision:8 min <integer_cst > > 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127> > > pointer_to_this <pointer_type 0x7ffff007de70>> > > public unsigned DI > > size <integer_cst 0x7ffff0059c90 constant 64> > > unit-size <integer_cst 0x7ffff0059ca8 constant 8> > > align:64 warn_if_not_align:0 symtab:0 alias-set -1 > > canonical-type 0x7ffff007de70 reference_to_this <reference_type > > 0x7ffff017d738>> > > visited var <parm_decl 0x7ffff01a5000 d> > > def_stmt GIMPLE_NOP > > version:3> > > > > > > Sadly, dstbase is the PARM_DECL for d. That's where things are going > > "wrong". Not sure why you're getting the PARM_DECL in that case. I'd > > debug get_addr_base_and_unit_offset to understand what's going on. > > Essentially you're getting different results of > > get_addr_base_and_unit_offset in a case where they arguably should be > > the same. > > Probably get_attr_nonstring_decl has the same "mistake" and returns > the PARM_DECL instead of the SSA name pointer. So we're comparing > apples and oranges here. > > Yeah: > > /* If EXPR refers to a character array or pointer declared attribute > nonstring return a decl for that array or pointer and set *REF to > the referenced enclosing object or pointer. Otherwise returns > null. */ > > tree > get_attr_nonstring_decl (tree expr, tree *ref) > { > tree decl = expr; > if (TREE_CODE (decl) == SSA_NAME) > { > gimple *def = SSA_NAME_DEF_STMT (decl); > > if (is_gimple_assign (def)) > { > tree_code code = gimple_assign_rhs_code (def); > if (code == ADDR_EXPR > || code == COMPONENT_REF > || code == VAR_DECL) > decl = gimple_assign_rhs1 (def); > } > else if (tree var = SSA_NAME_VAR (decl)) > decl = var; > } > > if (TREE_CODE (decl) == ADDR_EXPR) > decl = TREE_OPERAND (decl, 0); > > if (ref) > *ref = decl; > > I see a lot of "magic" here again in the attempt to "propagate" > a nonstring attribute. Note > > foo (char *p __attribute__(("nonstring"))) > { > p = "bar"; > strlen (p); // or whatever is necessary to call get_attr_nonstring_decl > } > > is perfectly valid and p as passed to strlen is _not_ nonstring(?). > > I think in your code comparing bases you want to look at the _original_ > argument to the string function rather than what get_attr_nonstring_decl > returned as ref.
Oh, and this 'nonstring' feels like sth that could be propagated by points-to analysis. > Richard. > > > Jeff > > > > Jeff