On 11/13/19 7:34 AM, Richard Biener wrote:
> On Tue, Nov 12, 2019 at 6:55 PM Jeff Law <l...@redhat.com> wrote:
>>
>> On 11/12/19 1:15 AM, Richard Biener wrote:
>>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <l...@redhat.com> wrote:
>>>>
>>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>>>>> space remaining in the destination, such as:
>>>>>>>>>>
>>>>>>>>>> warning: writing 4 bytes into a region of size 0 
>>>>>>>>>> [-Wstringop-overflow=]
>>>>>>>>>>
>>>>>>>>>>     123 |   *p = 0;
>>>>>>>>>>         |   ~~~^~~
>>>>>>>>>> note: destination object declared here
>>>>>>>>>>      45 |   char b[N];
>>>>>>>>>>         |        ^
>>>>>>>>>>
>>>>>>>>>> A warning like this can take some time to analyze.  First, the size
>>>>>>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>>>> some non-trivial computation, chasing it down may be a small project
>>>>>>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>>>
>>>>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>>>> makes the existing messages clearer, are will become essential when
>>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>>>> follow-on patch does).
>>>>>>>>>>
>>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>>>> builtins.c).  With the change, the note above might read something
>>>>>>>>>> like:
>>>>>>>>>>
>>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>>>      45 |   char b[N];
>>>>>>>>>>         |        ^
>>>>>>>>>>
>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> gcc-store-offset.diff
>>>>>>>>>>
>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>
>>>>>>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
>>>>>>>>>> offset
>>>>>>>>>>      into destination.
>>>>>>>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>>>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
>>>>>>>>>> set it
>>>>>>>>>>      to offset into destination.
>>>>>>>>>>      (compute_builtin_object_size): Same.
>>>>>>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>>>>> argument.
>>>>>>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>>>>> set it
>>>>>>>>>>      to offset into destination.
>>>>>>>>>>      (maybe_warn_overflow): New function.
>>>>>>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>>>>
>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>
>>>>>>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>>>>>>> messages.
>>>>>>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>>>> ===================================================================
>>>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>>>>> is in
>>>>>>>>>> +   and returns true on success.  */
>>>>>>>>>> +
>>>>>>>>>> +static bool
>>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>>>>>>>>>> NULL)
>>>>>>>>>> +{
>>>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>>>> +    {
>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>> +      return true;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>>>> +    return false;
>>>>>>>>>> +
>>>>>>>>>> +  if (rvals)
>>>>>>>>>> +    {
>>>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>>>> +    {
>>>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>>>> assignments.  */
>>>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>> +      return true;
>>>>>>>>>> +    }
>>>>>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
>>>>>>>>> set
>>>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>>>> singleton
>>>>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>>>>> true?
>>>>>>>>>
>>>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>>>> the RHS
>>>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>>>> into a
>>>>>>>>> constant or a constant was propagated into an SSA_NAME appearing on
>>>>>>>>> the
>>>>>>>>> RHS.  This would have to happen between the last range analysis and
>>>>>>>>> the
>>>>>>>>> point where you're making this query.
>>>>>>>>
>>>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>>>
>>>>>>>>    void f (void)
>>>>>>>>    {
>>>>>>>>      char s[] = "1234";
>>>>>>>>      unsigned n = strlen (s);
>>>>>>>>      char vla[n];   // or malloc (n)
>>>>>>>>      vla[n] = 0;    // n = [4, 4]
>>>>>>>>      ...
>>>>>>>>    }
>>>>>>>>
>>>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>>>> the access until sometime after the strlen pass is done.
>>>>>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>>>>>>> back of pass instance of vr_values.  If so, that might argue we want to
>>>>>>> be setting it in vr_values too.
>>>>>>
>>>>>> No, set_range_info is only called for ranges.  In this case,
>>>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>>>
>>>>>>    s = "1234";
>>>>>>    _1 = __builtin_strlen (&s);
>>>>>>    n_2 = (unsigned int) _1;
>>>>>>    a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>>>    (*a.1_8)[n_2] = 0;
>>>>>>
>>>>>> When the access is made, the __builtin_alloca_with_align call
>>>>>> is found as the destination and the _1 SSA_NAME is used to
>>>>>> get its size.  We get back the range [4, 4].
>>>>>
>>>>> By the way, I glossed over one detail.  The above doesn't work
>>>>> exactly as is because the allocation size is the SSA_NAME _1
>>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>>>> the range [0, 4]; the range is [0, 4] because it was set based
>>>>> on the size of the argument to the strlen() call well before
>>>>> the strlen pass even ran).
>>>> Which would tend to argue that we should forward propagate the constant
>>>> to the uses of _1.  That should expose that the RHS of the assignment to
>>>> n_2 is a constant as well.
>>>>
>>>>
>>>>>
>>>>> To make it work across assignments we need to propagate the strlen
>>>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>>>> do this automagically.
>>>> It would, but it's probably more heavyweight than we need.  We just need
>>>> to forward propagate the discovered constant to the use points and pick
>>>> up any secondary opportunities that arise.
>>>
>>> Yes.  And the usual way of doing this is to keep a constant-and-copy
>>> lattice (and for copies you'd need to track availability) and before 
>>> optimizing
>>> a stmt substitute its operands with the lattice contents.
>>>
>>> forwprop has a scheme that can be followed doing a RPO walk, strlen
>>> does a DOM walk, there you can follow what DOM/PRE elimination do
>>> (for tracking copy availability - if you just track constants you can
>>> elide that).
>> I'm also note sure handling copies is all that interesting here and if
>> we just handle constants/invariants, then it's pretty easy.
>>
>> Whenever we replace a strlen call with a const, we put the LHS (assuming
>> its an SSA_NAME) of the statement on a worklist.
>>
>> We pull items off the worklist and propagate the value to the use points
>> and try to simplify the resulting statement.  If the RHS of the use
>> point simplified to a constant, then put the LHS of the use statement
>> onto the worklist.  Iterate until the list is empty.
>>
>> That would capture everything of interest I suspect and ought to be cheap.
> 
> That's the ad-hoc way, yes.

> 
> To Martin: no, this shouldn't be a prerequesite
I think that leaves the question of what to do with that one hunk in the
kit.  I think our choices are:

1. Leave the hunk as-is.

2. Pull out the hunk and adjust tests (with xfails I'd think).


I'd tend to lean towards #2, but I don't know how badly we'd be
compromising Martin's goals.

Jeff

Reply via email to