On 6/18/19 9:19 PM, Martin Sebor wrote:
> On 6/14/19 2:59 PM, Jeff Law wrote:
[ big snip ]
>> A COND_EXPR on the RHS of an assignment is valid gimple.  That's what we
>> need to consider here -- what is and what is not valid gimple.  And its
>> more likely that PHIs will be transformed into RHS COND_EXPRs -- that's
>> standard practice for if-conversion.
>>
>> Gosh, how to get one?  It happens all the time :-)  Since I know DOM so
>> well, I just shoved a quick assert into optimize_stmt to abort if we
>> encounter a gimple assignment where the RHS is a COND_EXPR.  It blew up
>> instantly building libgcc :-)
>>
>> COmpile the attached code with -O2 -m32.
> 
> I wasn't saying it's not valid Gimple, just that I hadn't seen it
> come up here despite compiling Glibc and the kernel with the patched
> GCC.  The only codes I saw are these:
> 
>   addr_expr, array_ref, bit_and_expr, component_ref, max_expr,
>   mem_ref, nop_expr, parm_decl, pointer_plus_expr, ssa_name,
>   and var_decl
The only one here that's really surprising is the MAX_EXPR.  But it is
what it is.

> 
> What I was asking for is a test case that makes COND_EXPR come up
> in this context.  But I managed to find one by triggering the ICE
> with GDB.  The file reduced to the following test case:
Sorry.  email can be a tough medium to nail down specific details.

> 
>   extern struct S s;   // S has to be an incomplete type
> 
>   void* f (int n)
>   {
>     void *p;
>     int x = 0;
> 
>     for (int i = n; i >= 0; i--)
>       {
>         p = &s;
>         if (p == (void*)-1)
>            x = 1;
>         else if (p)
>            return p;
>       }
> 
>     return x ? (void*)-1 : 0;
>   }
> 
> and the dump:
> 
>   <bb 6> [local count: 59055800]:
>   # x_10 = PHI <1(5), 0(2)>
>   _5 = x_10 != 0 ? -1B : 0B;
> 
>   <bb 7> [local count: 114863532]:
>   # _3 = PHI <&s(4), _5(6), &s(3)>
>   return _3;
> 
> It seems a little odd that the COND_EXPR disappears when either
> of the constant addresses becomes the address of an object (or
> the result of alloca), and also when the number of iterations
> of the loop is constant.  That's probably why it so rarely comes
> up in this context.
Going into phiopt2 we have:

;;   basic block 6, loop depth 0
;;    pred:       5
  if (x_1 != 0)
    goto <bb 8>; [71.00%]
  else
    goto <bb 7>; [29.00%]
;;    succ:       8
;;                7

;;   basic block 7, loop depth 0
;;    pred:       6
;;    succ:       8

;;   basic block 8, loop depth 0
;;    pred:       3
;;                7
;;                6
  # _3 = PHI <&s(3), 0B(7), -1B(6)>
  return _3;

The subgraph starting at block #6 is a classic case for turning branchy
code into straightline code using a COND_EXPR on the RHS of an
assignment.  So you end up with something like this:

;;   basic block 6, loop depth 0
;;    pred:       5
  _5 = x_1 != 0 ? -1B : 0B;
;;    succ:       7

;;   basic block 7, loop depth 0
;;    pred:       3
;;                6
  # _3 = PHI <&s(3), _5(6)>
  return _3;


Now for this specific case within phiopt we are limited to cases there
the result is 0/1 or 0/-1.  That's why you get something different when
you exchange one of the constants for the address of an object, or
anything else for that matter.

This is all a bit academic -- the key point is that we can have a
COND_EXPR on the RHS of an assignment.  That's allowed by gimple.

Sadly this is also likely one of the places where target characteristics
come into play -- targets define a BRANCH_COST which can significantly
change the decisions for the initial generation of conditionals.  It's
one of the things that makes writing  tests for jump threading, if
conversion and other optimizations so damn painful -- on one target
we'll have a series of conditional jumps, on anothers we may have a
series of logicals, potentially with COND_EXPRs.


> 
> That said, even though I've seen a few references to COND_EXPR
> in the midle-end I have been assuming that they, in general, do
> get transformed into PHIs.  But checking the internals manual
> I found in Section 12.6.3 this:
> 
>   A C ?: expression is converted into an if statement with each
>   branch assigning to the same temporary. ... The GIMPLE level
>   if-conversion pass re-introduces ?: expression, if appropriate.
>   It is used to vectorize loops with conditions using vector
>   conditional operations.
> 
> This GDB test case is likely the result of this reintroduction.
Nope.  It happens much earlier in the pipeline :-)


>>
>> And in a more general sense, this kind of permissiveness is not future
>> proof.  What happens if someone needs to add another EXPR node that is
>> valid on the RHS where such recursion is undesirable?  How are they
>> supposed to know that we've got this permissive recursive call and that
>> it's going to do the wrong thing?  And if it's an EXPR node with no
>> arguments, then we're going to do a read out of the bounds of the object
>> and all bets are off at that point (yes we have zero operand EXPR nodes,
>> but thankfully I don't think they should up in the contexts you care about).
>>
> 
> Sure.  When things are hardcoded this way the same argument can
> be made both ways.  If we just handle A and B and then someone
> adds an X that matches one of the two, it won't be handled. Either
> way, some work is necessary to deal with the new Z.  One way to
> avoid it would be by providing an API to query this property of
> a code (e.g., a function like int gimple_result_aliases_operand
> (gimple *stmt, int opno)).
Adding a new opcode could have caused the original code to generate
incorrect code.  In the new version an indirect opcode would just result
in something we refuse to analyze -- so we could miss a warning, but
more importantly we would _not_ transform the return statement so we'd
be safe WRT correct code generation.

WRT making an API to do these kinds of queries.  Sure.  It's a fairly
natural extension to stuff we've done in the past.  You could (for
example) wrap up all kinds existing properties of nodes into an API.
You'll see that's already done in a fairly ad-hoc fashion, but it's not
even close to being pervasive or standard practice.

I would certainly applaud bringing some sanity to this kind of issue and
iterating towards a better place.  But it's ugly, painful work with
little visible end user benefit.

> 
>> But those seem like distinct issues.  The one I pointed out looks like a
>> pretty simple case to handle.  It's just a logic error.
> 
> It wasn't a logic error.  I simply didn't think it was necessary
> to worry about the accuracy of an inherently inaccurate warning,
> and I didn't want make more intrusive changes than was called for
> by my simple fix/enhancement.  I was also keeping in mind your
> preference for smaller change sets.  But since you insist I went
> ahead and improved things.  The warning is all the better for it,
> and the changes I made exposed a number of other bugs in GCC.
> At the same time, it seems that the bar for a simple bug fixes
> and small enhancements should not be in excess of what's possible
> by the original design.
It looked like a logic error to me that could be easily fixed.  I didn't
want to make it out to a huge deal.  ANd yes, there's a natural tension
between addressing small stuff as you see them in a larger kit and
breaking things down and iterating.  There aren't hard and fast rules here.



> 
> 
> gcc-71924.diff
> 
> PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result
> PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address 
> of a local array plus offset
> 
> gcc/ChangeLog:
> 
>       PR middle-end/71924
>       PR middle-end/90549
>       * gimple-ssa-isolate-paths.c (isolate_path): Add attribute.
>       (args_loc_t): New type.
>       (args_loc_t, locmap_t): same.
>       (diag_returned_locals): New function.
>       (is_addr_local): Same.
>       (handle_return_addr_local_phi_arg, warn_return_addr_local): Same.
>       (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg.
>       (find_explicit_erroneous_behavior): Call warn_return_addr_local.
> 
> libgcc/ChangeLog:
>       * generic-morestack.c: Disable -fisolate-erroneous-paths-dereference
>       to get around PR libgcc/90918.
> 
> gcc/testsuite/ChangeLog:
> 
>       PR middle-end/71924
>       PR middle-end/90549
>       * gcc.dg/Wreturn-local-addr-2.c: New test.
>       * gcc.dg/Wreturn-local-addr-4.c: New test.
>       * gcc.dg/Wreturn-local-addr-5.c: New test.
>       * gcc.dg/Wreturn-local-addr-6.c: New test.
>       * gcc.dg/Wreturn-local-addr-7.c: New test.
>       * gcc.dg/Wreturn-local-addr-8.c: New test.
>       * gcc.dg/Walloca-4.c: Prune expected warnings.
>       * gcc.dg/pr41551.c: Same.
>       * gcc.dg/pr59523.c: Same.
>       * gcc.dg/tree-ssa/pr88775-2.c: Same.
>       * gcc.dg/winline-7.c: Same.
> 
> diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c
> index 33fe352bb23..13b1f5f2349 100644
> --- a/gcc/gimple-ssa-isolate-paths.c
> +++ b/gcc/gimple-ssa-isolate-paths.c
> @@ -130,7 +130,7 @@ insert_trap (gimple_stmt_iterator *si_p, tree op)
>  
>     Return BB'.  */
>  
> -basic_block
> +ATTRIBUTE_RETURNS_NONNULL basic_block
>  isolate_path (basic_block bb, basic_block duplicate,
>             edge e, gimple *stmt, tree op, bool ret_zero)
>  {
> @@ -341,6 +341,283 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt)
>    return false;
>  }
>  
> +/* Describes the property of a return statement that may return
> +   the address of one or more local variables.  */
> +struct args_loc_t
> +{
> +  args_loc_t (): nargs (), locvec (), ptr (&ptr)
> +  {
> +    locvec.create (4);
> +  }
> +
> +  args_loc_t (const args_loc_t &rhs)
> +    : nargs (rhs.nargs), locvec (rhs.locvec.copy ()), ptr (&ptr) { }
> +
> +  args_loc_t& operator= (const args_loc_t &rhs)
> +  {
> +    nargs = rhs.nargs;
> +    locvec.release ();
> +    locvec = rhs.locvec.copy ();
> +    return *this;
> +  }
> +
> +  ~args_loc_t ()
> +  {
> +    locvec.release ();
> +    gcc_assert (ptr == &ptr);
> +  }
> +
> +  /* For a PHI in a return statement its number of arguments.  When less
> +     than LOCVEC.LENGTH () implies that an address of local may but need
> +     not be returned by the statement.  Otherwise it implies it definitely
> +     is returned.  Used to issue "may be" vs "is" diagnostics.  */
> +  unsigned nargs;
> +  /* The locations of local variables/alloca calls returned by the return
> +     statement.  Avoid using auto_vec here since it's not safe to copy due
> +     to pr90904.  */
> +  vec <location_t> locvec;
> +  void *ptr;
> +};
Make this a class, please.  We use "struct" for POD.

Is there a strong need for an overloaded operator?  Our guidelines
generally discourage operator overloads.  This one seems on the border
to me.  Others may have different ideas where the line is.  If we really
don't need the overloaded operator, then we can avoid the controversy
completely.


> +
> +/* Return true if EXPR is a expression of pointer type that refers
> +   to the address of a variable with automatic storage duration.
> +   If so, add an entry to *PLOCMAP and insert INTO PLOCMAP->LOCVEC
> +   the locations of the corresponding local variables whose address
> +   is returned by the statement.  VISITED is a bitmap of PHI nodes
> +   already visited by recursive calls.  */
> +
> +static bool
> +is_addr_local (gimple *return_stmt, tree exp, locmap_t *plocmap,
> +            hash_set<gphi *> *visited)
> +{
> +  if (TREE_CODE (exp) == ADDR_EXPR)
> +    {
> +      tree baseaddr = get_base_address (TREE_OPERAND (exp, 0));
> +      if (TREE_CODE (baseaddr) == MEM_REF)
> +     return is_addr_local (return_stmt, TREE_OPERAND (baseaddr, 0),
> +                           plocmap, visited);
> +
> +      if ((!VAR_P (baseaddr)
> +        || is_global_var (baseaddr))
> +       && TREE_CODE (baseaddr) != PARM_DECL)
> +     return false;
> +
> +      args_loc_t &argsloc = plocmap->get_or_insert (return_stmt);
> +      argsloc.locvec.safe_push (DECL_SOURCE_LOCATION (baseaddr));
> +      return true;
> +    }
> +
> +  if (!POINTER_TYPE_P (TREE_TYPE (exp)))
> +    return false;
> +
> +  if (TREE_CODE (exp) == SSA_NAME)
> +    {
> +      gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
> +      enum gimple_code code = gimple_code (def_stmt);
> +
> +      if (is_gimple_assign (def_stmt))
> +     {
> +       tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
> +       if (POINTER_TYPE_P (type))
> +         {
> +           tree_code code = gimple_assign_rhs_code (def_stmt);
> +           tree ptr1 = NULL_TREE, ptr2 = NULL_TREE;
> +           if (code == COND_EXPR)
> +             {
> +               ptr1 = gimple_assign_rhs2 (def_stmt);
> +               ptr2 = gimple_assign_rhs3 (def_stmt);
> +             }
> +           else if (code == MAX_EXPR || code == MIN_EXPR)
> +             {
> +               ptr1 = gimple_assign_rhs1 (def_stmt);
> +               ptr2 = gimple_assign_rhs2 (def_stmt);
> +             }
> +           else if (code == ADDR_EXPR
> +                    || code == NOP_EXPR
> +                    || code == POINTER_PLUS_EXPR)
> +             ptr1 = gimple_assign_rhs1 (def_stmt);
> +
> +           /* Avoid short-circuiting the logical OR result in case
> +              both oerands refer to local variables, in which case
> +              both should be identified in the warning.  */
> +           bool res1 = false, res2 = false;
> +           if (ptr1)
> +             res1 = is_addr_local (return_stmt, ptr1, plocmap, visited);
> +           if (ptr2)
> +             res2 = is_addr_local (return_stmt, ptr2, plocmap, visited);
> +           return res1 || res2;
> +         }
> +       return false;
> +     }
s/oerands/operands/  a few lines up.

So how do you deal with the case where one operand of a
{MIN,MAX,COND}_EXPR is the address of a local, but the other is not?  It
seems like we'll end up returning true from this function in that case
and potentially trigger changing the return value to NULL.  Or am I
missing something?





Jeff

Reply via email to