On 11/12/19 1:15 AM, Richard Biener wrote:
> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <l...@redhat.com> wrote:
>>
>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>>> space remaining in the destination, such as:
>>>>>>>>
>>>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>>>>>>
>>>>>>>>     123 |   *p = 0;
>>>>>>>>         |   ~~~^~~
>>>>>>>> note: destination object declared here
>>>>>>>>      45 |   char b[N];
>>>>>>>>         |        ^
>>>>>>>>
>>>>>>>> A warning like this can take some time to analyze.  First, the size
>>>>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>> some non-trivial computation, chasing it down may be a small project
>>>>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>
>>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>> makes the existing messages clearer, are will become essential when
>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>> follow-on patch does).
>>>>>>>>
>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>> builtins.c).  With the change, the note above might read something
>>>>>>>> like:
>>>>>>>>
>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>      45 |   char b[N];
>>>>>>>>         |        ^
>>>>>>>>
>>>>>>>> Tested on x86_64-linux.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> gcc-store-offset.diff
>>>>>>>>
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
>>>>>>>> offset
>>>>>>>>      into destination.
>>>>>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
>>>>>>>> set it
>>>>>>>>      to offset into destination.
>>>>>>>>      (compute_builtin_object_size): Same.
>>>>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>>> argument.
>>>>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>>> set it
>>>>>>>>      to offset into destination.
>>>>>>>>      (maybe_warn_overflow): New function.
>>>>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>
>>>>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>>>>> messages.
>>>>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>
>>>>>>>
>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>> ===================================================================
>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>>> is in
>>>>>>>> +   and returns true on success.  */
>>>>>>>> +
>>>>>>>> +static bool
>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>>>>>>>> NULL)
>>>>>>>> +{
>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>> +    {
>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>> +      return true;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>> +    return false;
>>>>>>>> +
>>>>>>>> +  if (rvals)
>>>>>>>> +    {
>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>> +    {
>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>> assignments.  */
>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>> +      return true;
>>>>>>>> +    }
>>>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
>>>>>>> set
>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>> singleton
>>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>>> true?
>>>>>>>
>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>> the RHS
>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>> into a
>>>>>>> constant or a constant was propagated into an SSA_NAME appearing on
>>>>>>> the
>>>>>>> RHS.  This would have to happen between the last range analysis and
>>>>>>> the
>>>>>>> point where you're making this query.
>>>>>>
>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>
>>>>>>    void f (void)
>>>>>>    {
>>>>>>      char s[] = "1234";
>>>>>>      unsigned n = strlen (s);
>>>>>>      char vla[n];   // or malloc (n)
>>>>>>      vla[n] = 0;    // n = [4, 4]
>>>>>>      ...
>>>>>>    }
>>>>>>
>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>> the access until sometime after the strlen pass is done.
>>>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>>>>> back of pass instance of vr_values.  If so, that might argue we want to
>>>>> be setting it in vr_values too.
>>>>
>>>> No, set_range_info is only called for ranges.  In this case,
>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>
>>>>    s = "1234";
>>>>    _1 = __builtin_strlen (&s);
>>>>    n_2 = (unsigned int) _1;
>>>>    a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>    (*a.1_8)[n_2] = 0;
>>>>
>>>> When the access is made, the __builtin_alloca_with_align call
>>>> is found as the destination and the _1 SSA_NAME is used to
>>>> get its size.  We get back the range [4, 4].
>>>
>>> By the way, I glossed over one detail.  The above doesn't work
>>> exactly as is because the allocation size is the SSA_NAME _1
>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>> the range [0, 4]; the range is [0, 4] because it was set based
>>> on the size of the argument to the strlen() call well before
>>> the strlen pass even ran).
>> Which would tend to argue that we should forward propagate the constant
>> to the uses of _1.  That should expose that the RHS of the assignment to
>> n_2 is a constant as well.
>>
>>
>>>
>>> To make it work across assignments we need to propagate the strlen
>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>> do this automagically.
>> It would, but it's probably more heavyweight than we need.  We just need
>> to forward propagate the discovered constant to the use points and pick
>> up any secondary opportunities that arise.
> 
> Yes.  And the usual way of doing this is to keep a constant-and-copy
> lattice (and for copies you'd need to track availability) and before 
> optimizing
> a stmt substitute its operands with the lattice contents.
> 
> forwprop has a scheme that can be followed doing a RPO walk, strlen
> does a DOM walk, there you can follow what DOM/PRE elimination do
> (for tracking copy availability - if you just track constants you can
> elide that).
I'm also note sure handling copies is all that interesting here and if
we just handle constants/invariants, then it's pretty easy.

Whenever we replace a strlen call with a const, we put the LHS (assuming
its an SSA_NAME) of the statement on a worklist.

We pull items off the worklist and propagate the value to the use points
and try to simplify the resulting statement.  If the RHS of the use
point simplified to a constant, then put the LHS of the use statement
onto the worklist.  Iterate until the list is empty.

That would capture everything of interest I suspect and ought to be cheap.

jeff



I think whenever we substitute a constant or SSA_NAME for a strlen call,
we can just replace uses of the LHS of the assignment with the
const/copy.  Any statements we propagate into are put on a worklist.

We pull statements off the worklist and try to simplify their RHS.  If
the RHS simplifies to a const/copy, then then we repeat the process of
propagating to the use points and

x = <whatever>;

If we find <whatever> collapses to a constant or copy we can just record
it in SSA_NAME_VALUE.  As we walk through statements, we can propagate
> 
> Richard.
> 
>> jeff
>>
> 

Reply via email to