Ping: did my reply and updated patch resolve your concerns?
  https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01106.html

On 6/18/19 9:19 PM, Martin Sebor wrote:
On 6/14/19 2:59 PM, Jeff Law wrote:
On 6/4/19 1:40 PM, Martin Sebor wrote:
On 6/3/19 5:24 PM, Martin Sebor wrote:
On 5/31/19 2:46 PM, Jeff Law wrote:
On 5/22/19 3:34 PM, Martin Sebor wrote:
-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

    void* f (int i)
    {
      struct S { int a[2]; } s[2];
      return &s->a[i];
    }

[2] The following is not diagnosed even with the patch:

    void sink (void*);

    void* f (int i)
    {
      char a[6];
      char *p = __builtin_strcpy (a, "123");
      sink (p);
      return p;
    }

I would expect detecting to be possible and useful.  Maybe as
a follow-up.

gcc-71924.diff

PR middle-end/71924 - missing -Wreturn-local-addr returning alloca
result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an
address of a local array plus offset

gcc/ChangeLog:

     PR c/71924
     * gimple-ssa-isolate-paths.c (is_addr_local): New function.
     (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
     (find_implicit_erroneous_behavior): Call
warn_return_addr_local_phi_arg.
     (find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

     PR c/71924
     * gcc.dg/Wreturn-local-addr-2.c: New test.
     * gcc.dg/Walloca-4.c: Prune expected warnings.
     * gcc.dg/pr41551.c: Same.
     * gcc.dg/pr59523.c: Same.
     * gcc.dg/tree-ssa/pr88775-2.c: Same.
     * gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c
b/gcc/gimple-ssa-isolate-paths.c
index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple
*stmt)
     return false;
   }
+/* Return true if EXPR is a expression of pointer type that refers
+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+           hash_set<gphi *> *visited = NULL)
+{
+  if (TREE_CODE (exp) == SSA_NAME)
+    {
+      gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+      enum gimple_code code = gimple_code (def_stmt);
+
+      if (is_gimple_assign (def_stmt))
+    {
+      tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+      if (POINTER_TYPE_P (type))
+        {
+          tree ptr = gimple_assign_rhs1 (def_stmt);
+          return is_addr_local (ptr, ploc, pmaybe, visited);
+        }
+      return false;
+    }
So this is going to recurse on the rhs1 of something like
POINTER_PLUS_EXPR, that's a good thing :-)   But isn't it non-selective
about the codes where we recurse?

Consider

    ptr = (cond) ? res1 : res2

I think we'll end up recursing on the condition rather than looking at
res1 and res2.


I suspect there are a very limited number of expression codes that
appear on the RHS where we'd want to recurse on one or both operands.

POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse on both and logically and the result), BIT_AND (maybe we masked off some
bits in an address).  That's probably about it :-)

Are there any other codes you've seen or think would be useful in
practice to recurse through?  I'd rather list them explicitly rather
than just recurse down through every rhs1 we encounter.

I don't have a list of codes to test for.  I initially contemplated
enumerating them but in the end decided the pointer type check would
be sufficient.  I wouldn't expect a COND_EXPR here.  Don't they get
transformed into PHIs?  In all my tests they do and and running
the whole test suite with an assert that it doesn't come up doesn't
expose any either.  (I left the assert for COND_EXPR there.)  If
a COND_EXPR really can come up in a GIMPLE assignment here can you
please show me how so I can add a test for it?
A COND_EXPR on the RHS of an assignment is valid gimple.  That's what we
need to consider here -- what is and what is not valid gimple.  And its
more likely that PHIs will be transformed into RHS COND_EXPRs -- that's
standard practice for if-conversion.

Gosh, how to get one?  It happens all the time :-)  Since I know DOM so
well, I just shoved a quick assert into optimize_stmt to abort if we
encounter a gimple assignment where the RHS is a COND_EXPR.  It blew up
instantly building libgcc :-)

COmpile the attached code with -O2 -m32.

I wasn't saying it's not valid Gimple, just that I hadn't seen it
come up here despite compiling Glibc and the kernel with the patched
GCC.  The only codes I saw are these:

   addr_expr, array_ref, bit_and_expr, component_ref, max_expr,
   mem_ref, nop_expr, parm_decl, pointer_plus_expr, ssa_name,
   and var_decl

What I was asking for is a test case that makes COND_EXPR come up
in this context.  But I managed to find one by triggering the ICE
with GDB.  The file reduced to the following test case:

   extern struct S s;   // S has to be an incomplete type

   void* f (int n)
   {
     void *p;
     int x = 0;

     for (int i = n; i >= 0; i--)
       {
         p = &s;
         if (p == (void*)-1)
            x = 1;
         else if (p)
            return p;
       }

     return x ? (void*)-1 : 0;
   }

and the dump:

   <bb 6> [local count: 59055800]:
   # x_10 = PHI <1(5), 0(2)>
   _5 = x_10 != 0 ? -1B : 0B;

   <bb 7> [local count: 114863532]:
   # _3 = PHI <&s(4), _5(6), &s(3)>
   return _3;

It seems a little odd that the COND_EXPR disappears when either
of the constant addresses becomes the address of an object (or
the result of alloca), and also when the number of iterations
of the loop is constant.  That's probably why it so rarely comes
up in this context.

That said, even though I've seen a few references to COND_EXPR
in the midle-end I have been assuming that they, in general, do
get transformed into PHIs.  But checking the internals manual
I found in Section 12.6.3 this:

   A C ?: expression is converted into an if statement with each
   branch assigning to the same temporary. ... The GIMPLE level
   if-conversion pass re-introduces ?: expression, if appropriate.
   It is used to vectorize loops with conditions using vector
   conditional operations.

This GDB test case is likely the result of this reintroduction.

You'll see them for stuff like this:


;;   basic block 24, loop depth 0
;;    pred:       23
   _244 = _1 == 0 ? 1 : 2;

In a few spots.


In addition to COND_EXPR, I'd be worried about VEC_COND_EXPR,
VEC_PERM_EXPR and a host of others.

And in a more general sense, this kind of permissiveness is not future
proof.  What happens if someone needs to add another EXPR node that is
valid on the RHS where such recursion is undesirable?  How are they
supposed to know that we've got this permissive recursive call and that
it's going to do the wrong thing?  And if it's an EXPR node with no
arguments, then we're going to do a read out of the bounds of the object
and all bets are off at that point (yes we have zero operand EXPR nodes,
but thankfully I don't think they should up in the contexts you care about).

Sure.  When things are hardcoded this way the same argument can
be made both ways.  If we just handle A and B and then someone
adds an X that matches one of the two, it won't be handled. Either
way, some work is necessary to deal with the new Z.  One way to
avoid it would be by providing an API to query this property of
a code (e.g., a function like int gimple_result_aliases_operand
(gimple *stmt, int opno)).

I'd be much more comfortable with a tight predicate controlling recursion.

I've added ADDR_EXPR, COND_EXPR, MAX_EXPR, MIN_EXPR, NOP_EXPR, and
POINTER_PLUS_EXPR based on the survey I did.  I don't have tests for
the rest so I've left them out for now.  If/when I come up with them
I'll add them.

+
+      if (code == GIMPLE_PHI && pmaybe)
+    {
+      unsigned count = 0;
+      gphi *phi_stmt = as_a <gphi *> (def_stmt);
+
+      unsigned nargs = gimple_phi_num_args (phi_stmt);
+      for (unsigned i = 0; i < nargs; ++i)
+        {
+          if (!visited->add (phi_stmt))
+        {
+          tree arg = gimple_phi_arg_def (phi_stmt, i);
+          if (is_addr_local (arg, ploc, pmaybe, visited))
+            ++count;
+        }
+        }
So imagine

p = phi (x1, x1, x2)

Where both x1 and x2 would pass is_addr_local.  I think this code would
incorrectly set pmaybe.

I suppose it would but this happens readily with or without my
patch, for example here:
But those seem like distinct issues.  The one I pointed out looks like a
pretty simple case to handle.  It's just a logic error.

It wasn't a logic error.  I simply didn't think it was necessary
to worry about the accuracy of an inherently inaccurate warning,
and I didn't want make more intrusive changes than was called for
by my simple fix/enhancement.  I was also keeping in mind your
preference for smaller change sets.  But since you insist I went
ahead and improved things.  The warning is all the better for it,
and the changes I made exposed a number of other bugs in GCC.
At the same time, it seems that the bar for a simple bug fixes
and small enhancements should not be in excess of what's possible
by the original design.

Anyway, attached is an updated revision with the recursion limited
to just the codes you mentioned, and hopefully with an acceptable
accuracy.  The patch depends on a fix for the hash_map bug 90923
that I just posted:
   https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01105.html

Martin



    int* f (int i)
    {
      int a[2], b[2];
      int *p = i ? a : b;
      return p;
    }

We end up with two instances of "function may return address
of local variable," one for each local, because
find_implicit_erroneous_behavior only issues the "may return"
kind of warning.
This looks more like a failing of the core approach in the erroneous
path code.


I suspect that no solution will be perfect, at a minimum because
multiple return statements sometimes get merged into one, so what
in the source code is a single return that definitely returns
the address a local may end up merged with one that returns
a null (and triggering a maybe kind of warning).  I have also
seen warnings in some non-trivial test cases that suggest that
some return statements get split up into two where a "may return"
could turn into a "definitely does return."
Certainly.  I don't expect any solution will be perfect here, but I
think fixing the logic error in the noted loop would help


It also seems to me that you can break the loop as soon as you've got a
nonzero count and get a false return from is_addr_local.   Not sure if
that'll matter in practice or not.

This is only possible if we're willing to lose some detail (i.e.,
if we are willing to only point to one variable when returning
the address of two or more).  I don't find that very user-friendly.
ACK.  Ignore that comment then.


Jeff



Reply via email to