On 11/13/19 7:34 AM, Richard Biener wrote: > On Tue, Nov 12, 2019 at 6:55 PM Jeff Law <l...@redhat.com> wrote: >> >> On 11/12/19 1:15 AM, Richard Biener wrote: >>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <l...@redhat.com> wrote: >>>> >>>> On 11/6/19 3:34 PM, Martin Sebor wrote: >>>>> On 11/6/19 2:06 PM, Martin Sebor wrote: >>>>>> On 11/6/19 1:39 PM, Jeff Law wrote: >>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote: >>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote: >>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote: >>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte >>>>>>>>>> stores mention the amount of data being stored and the amount of >>>>>>>>>> space remaining in the destination, such as: >>>>>>>>>> >>>>>>>>>> warning: writing 4 bytes into a region of size 0 >>>>>>>>>> [-Wstringop-overflow=] >>>>>>>>>> >>>>>>>>>> 123 | *p = 0; >>>>>>>>>> | ~~~^~~ >>>>>>>>>> note: destination object declared here >>>>>>>>>> 45 | char b[N]; >>>>>>>>>> | ^ >>>>>>>>>> >>>>>>>>>> A warning like this can take some time to analyze. First, the size >>>>>>>>>> of the destination isn't mentioned and may not be easy to tell from >>>>>>>>>> the sources. In the note above, when N's value is the result of >>>>>>>>>> some non-trivial computation, chasing it down may be a small project >>>>>>>>>> in and of itself. Second, it's also not clear why the region size >>>>>>>>>> is zero. It could be because the offset is exactly N, or because >>>>>>>>>> it's negative, or because it's in some range greater than N. >>>>>>>>>> >>>>>>>>>> Mentioning both the size of the destination object and the offset >>>>>>>>>> makes the existing messages clearer, are will become essential when >>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my >>>>>>>>>> follow-on patch does). >>>>>>>>>> >>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by >>>>>>>>>> letting compute_objsize return the offset to its caller, doing >>>>>>>>>> something similar in get_stridx, and adding a new function to >>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd >>>>>>>>>> like the function to replace the -Wstringop-overflow handler in >>>>>>>>>> builtins.c). With the change, the note above might read something >>>>>>>>>> like: >>>>>>>>>> >>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here >>>>>>>>>> 45 | char b[N]; >>>>>>>>>> | ^ >>>>>>>>>> >>>>>>>>>> Tested on x86_64-linux. >>>>>>>>>> >>>>>>>>>> Martin >>>>>>>>>> >>>>>>>>>> gcc-store-offset.diff >>>>>>>>>> >>>>>>>>>> gcc/ChangeLog: >>>>>>>>>> >>>>>>>>>> * builtins.c (compute_objsize): Add an argument and set it to >>>>>>>>>> offset >>>>>>>>>> into destination. >>>>>>>>>> * builtins.h (compute_objsize): Add an argument. >>>>>>>>>> * tree-object-size.c (addr_object_size): Add an argument and >>>>>>>>>> set it >>>>>>>>>> to offset into destination. >>>>>>>>>> (compute_builtin_object_size): Same. >>>>>>>>>> * tree-object-size.h (compute_builtin_object_size): Add an >>>>>>>>>> argument. >>>>>>>>>> * tree-ssa-strlen.c (get_addr_stridx): Add an argument and >>>>>>>>>> set it >>>>>>>>>> to offset into destination. >>>>>>>>>> (maybe_warn_overflow): New function. >>>>>>>>>> (handle_store): Call maybe_warn_overflow to issue warnings. >>>>>>>>>> >>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>> >>>>>>>>>> * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected >>>>>>>>>> messages. >>>>>>>>>> * g++.dg/warn/Wstringop-overflow-3.C: Same. >>>>>>>>>> * gcc.dg/Wstringop-overflow-17.c: Same. >>>>>>>>>> >>>>>>>>> >>>>>>>>>> Index: gcc/tree-ssa-strlen.c >>>>>>>>>> =================================================================== >>>>>>>>>> --- gcc/tree-ssa-strlen.c (revision 277886) >>>>>>>>>> +++ gcc/tree-ssa-strlen.c (working copy) >>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct >>>>>>>>>> static int get_stridx_plus_constant (strinfo *, unsigned >>>>>>>>>> HOST_WIDE_INT, tree); >>>>>>>>>> static void handle_builtin_stxncpy (built_in_function, >>>>>>>>>> gimple_stmt_iterator *); >>>>>>>>>> +/* Sets MINMAX to either the constant value or the range VAL >>>>>>>>>> is in >>>>>>>>>> + and returns true on success. */ >>>>>>>>>> + >>>>>>>>>> +static bool >>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = >>>>>>>>>> NULL) >>>>>>>>>> +{ >>>>>>>>>> + if (tree_fits_uhwi_p (val)) >>>>>>>>>> + { >>>>>>>>>> + minmax[0] = minmax[1] = wi::to_wide (val); >>>>>>>>>> + return true; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + if (TREE_CODE (val) != SSA_NAME) >>>>>>>>>> + return false; >>>>>>>>>> + >>>>>>>>>> + if (rvals) >>>>>>>>>> + { >>>>>>>>>> + gimple *def = SSA_NAME_DEF_STMT (val); >>>>>>>>>> + if (gimple_assign_single_p (def) >>>>>>>>>> + && gimple_assign_rhs_code (def) == INTEGER_CST) >>>>>>>>>> + { >>>>>>>>>> + /* get_value_range returns [0, N] for constant >>>>>>>>>> assignments. */ >>>>>>>>>> + val = gimple_assign_rhs1 (def); >>>>>>>>>> + minmax[0] = minmax[1] = wi::to_wide (val); >>>>>>>>>> + return true; >>>>>>>>>> + } >>>>>>>>> Umm, something seems really off with this hunk. If the SSA_NAME is >>>>>>>>> set >>>>>>>>> via a simple constant assignment, then the range ought to be a >>>>>>>>> singleton >>>>>>>>> ie [CONST,CONST]. Is there are particular test were this is not >>>>>>>>> true? >>>>>>>>> >>>>>>>>> The only way offhand I could see this happening is if originally >>>>>>>>> the RHS >>>>>>>>> wasn't a constant, but due to optimizations it either simplified >>>>>>>>> into a >>>>>>>>> constant or a constant was propagated into an SSA_NAME appearing on >>>>>>>>> the >>>>>>>>> RHS. This would have to happen between the last range analysis and >>>>>>>>> the >>>>>>>>> point where you're making this query. >>>>>>>> >>>>>>>> Yes, I think that's right. Here's an example where it happens: >>>>>>>> >>>>>>>> void f (void) >>>>>>>> { >>>>>>>> char s[] = "1234"; >>>>>>>> unsigned n = strlen (s); >>>>>>>> char vla[n]; // or malloc (n) >>>>>>>> vla[n] = 0; // n = [4, 4] >>>>>>>> ... >>>>>>>> } >>>>>>>> >>>>>>>> The strlen call is folded to 4 but that's not propagated to >>>>>>>> the access until sometime after the strlen pass is done. >>>>>>> Hmm. Are we calling set_range_info in that case? That goes behind the >>>>>>> back of pass instance of vr_values. If so, that might argue we want to >>>>>>> be setting it in vr_values too. >>>>>> >>>>>> No, set_range_info is only called for ranges. In this case, >>>>>> handle_builtin_strlen replaces the strlen() call with 4: >>>>>> >>>>>> s = "1234"; >>>>>> _1 = __builtin_strlen (&s); >>>>>> n_2 = (unsigned int) _1; >>>>>> a.1_8 = __builtin_alloca_with_align (_1, 8); >>>>>> (*a.1_8)[n_2] = 0; >>>>>> >>>>>> When the access is made, the __builtin_alloca_with_align call >>>>>> is found as the destination and the _1 SSA_NAME is used to >>>>>> get its size. We get back the range [4, 4]. >>>>> >>>>> By the way, I glossed over one detail. The above doesn't work >>>>> exactly as is because the allocation size is the SSA_NAME _1 >>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with >>>>> the range [0, 4]; the range is [0, 4] because it was set based >>>>> on the size of the argument to the strlen() call well before >>>>> the strlen pass even ran). >>>> Which would tend to argue that we should forward propagate the constant >>>> to the uses of _1. That should expose that the RHS of the assignment to >>>> n_2 is a constant as well. >>>> >>>> >>>>> >>>>> To make it work across assignments we need to propagate the strlen >>>>> results down the CFG somehow. I'm hoping the on-demand VRP will >>>>> do this automagically. >>>> It would, but it's probably more heavyweight than we need. We just need >>>> to forward propagate the discovered constant to the use points and pick >>>> up any secondary opportunities that arise. >>> >>> Yes. And the usual way of doing this is to keep a constant-and-copy >>> lattice (and for copies you'd need to track availability) and before >>> optimizing >>> a stmt substitute its operands with the lattice contents. >>> >>> forwprop has a scheme that can be followed doing a RPO walk, strlen >>> does a DOM walk, there you can follow what DOM/PRE elimination do >>> (for tracking copy availability - if you just track constants you can >>> elide that). >> I'm also note sure handling copies is all that interesting here and if >> we just handle constants/invariants, then it's pretty easy. >> >> Whenever we replace a strlen call with a const, we put the LHS (assuming >> its an SSA_NAME) of the statement on a worklist. >> >> We pull items off the worklist and propagate the value to the use points >> and try to simplify the resulting statement. If the RHS of the use >> point simplified to a constant, then put the LHS of the use statement >> onto the worklist. Iterate until the list is empty. >> >> That would capture everything of interest I suspect and ought to be cheap. > > That's the ad-hoc way, yes.
> > To Martin: no, this shouldn't be a prerequesite I think that leaves the question of what to do with that one hunk in the kit. I think our choices are: 1. Leave the hunk as-is. 2. Pull out the hunk and adjust tests (with xfails I'd think). I'd tend to lean towards #2, but I don't know how badly we'd be compromising Martin's goals. Jeff