On Tue, Nov 12, 2019 at 6:55 PM Jeff Law <l...@redhat.com> wrote: > > On 11/12/19 1:15 AM, Richard Biener wrote: > > On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <l...@redhat.com> wrote: > >> > >> On 11/6/19 3:34 PM, Martin Sebor wrote: > >>> On 11/6/19 2:06 PM, Martin Sebor wrote: > >>>> On 11/6/19 1:39 PM, Jeff Law wrote: > >>>>> On 11/6/19 1:27 PM, Martin Sebor wrote: > >>>>>> On 11/6/19 11:55 AM, Jeff Law wrote: > >>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote: > >>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte > >>>>>>>> stores mention the amount of data being stored and the amount of > >>>>>>>> space remaining in the destination, such as: > >>>>>>>> > >>>>>>>> warning: writing 4 bytes into a region of size 0 > >>>>>>>> [-Wstringop-overflow=] > >>>>>>>> > >>>>>>>> 123 | *p = 0; > >>>>>>>> | ~~~^~~ > >>>>>>>> note: destination object declared here > >>>>>>>> 45 | char b[N]; > >>>>>>>> | ^ > >>>>>>>> > >>>>>>>> A warning like this can take some time to analyze. First, the size > >>>>>>>> of the destination isn't mentioned and may not be easy to tell from > >>>>>>>> the sources. In the note above, when N's value is the result of > >>>>>>>> some non-trivial computation, chasing it down may be a small project > >>>>>>>> in and of itself. Second, it's also not clear why the region size > >>>>>>>> is zero. It could be because the offset is exactly N, or because > >>>>>>>> it's negative, or because it's in some range greater than N. > >>>>>>>> > >>>>>>>> Mentioning both the size of the destination object and the offset > >>>>>>>> makes the existing messages clearer, are will become essential when > >>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my > >>>>>>>> follow-on patch does). > >>>>>>>> > >>>>>>>> The attached patch enhances -Wstringop-overflow to do this by > >>>>>>>> letting compute_objsize return the offset to its caller, doing > >>>>>>>> something similar in get_stridx, and adding a new function to > >>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd > >>>>>>>> like the function to replace the -Wstringop-overflow handler in > >>>>>>>> builtins.c). With the change, the note above might read something > >>>>>>>> like: > >>>>>>>> > >>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here > >>>>>>>> 45 | char b[N]; > >>>>>>>> | ^ > >>>>>>>> > >>>>>>>> Tested on x86_64-linux. > >>>>>>>> > >>>>>>>> Martin > >>>>>>>> > >>>>>>>> gcc-store-offset.diff > >>>>>>>> > >>>>>>>> gcc/ChangeLog: > >>>>>>>> > >>>>>>>> * builtins.c (compute_objsize): Add an argument and set it to > >>>>>>>> offset > >>>>>>>> into destination. > >>>>>>>> * builtins.h (compute_objsize): Add an argument. > >>>>>>>> * tree-object-size.c (addr_object_size): Add an argument and > >>>>>>>> set it > >>>>>>>> to offset into destination. > >>>>>>>> (compute_builtin_object_size): Same. > >>>>>>>> * tree-object-size.h (compute_builtin_object_size): Add an > >>>>>>>> argument. > >>>>>>>> * tree-ssa-strlen.c (get_addr_stridx): Add an argument and > >>>>>>>> set it > >>>>>>>> to offset into destination. > >>>>>>>> (maybe_warn_overflow): New function. > >>>>>>>> (handle_store): Call maybe_warn_overflow to issue warnings. > >>>>>>>> > >>>>>>>> gcc/testsuite/ChangeLog: > >>>>>>>> > >>>>>>>> * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected > >>>>>>>> messages. > >>>>>>>> * g++.dg/warn/Wstringop-overflow-3.C: Same. > >>>>>>>> * gcc.dg/Wstringop-overflow-17.c: Same. > >>>>>>>> > >>>>>>> > >>>>>>>> Index: gcc/tree-ssa-strlen.c > >>>>>>>> =================================================================== > >>>>>>>> --- gcc/tree-ssa-strlen.c (revision 277886) > >>>>>>>> +++ gcc/tree-ssa-strlen.c (working copy) > >>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct > >>>>>>>> static int get_stridx_plus_constant (strinfo *, unsigned > >>>>>>>> HOST_WIDE_INT, tree); > >>>>>>>> static void handle_builtin_stxncpy (built_in_function, > >>>>>>>> gimple_stmt_iterator *); > >>>>>>>> +/* Sets MINMAX to either the constant value or the range VAL > >>>>>>>> is in > >>>>>>>> + and returns true on success. */ > >>>>>>>> + > >>>>>>>> +static bool > >>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = > >>>>>>>> NULL) > >>>>>>>> +{ > >>>>>>>> + if (tree_fits_uhwi_p (val)) > >>>>>>>> + { > >>>>>>>> + minmax[0] = minmax[1] = wi::to_wide (val); > >>>>>>>> + return true; > >>>>>>>> + } > >>>>>>>> + > >>>>>>>> + if (TREE_CODE (val) != SSA_NAME) > >>>>>>>> + return false; > >>>>>>>> + > >>>>>>>> + if (rvals) > >>>>>>>> + { > >>>>>>>> + gimple *def = SSA_NAME_DEF_STMT (val); > >>>>>>>> + if (gimple_assign_single_p (def) > >>>>>>>> + && gimple_assign_rhs_code (def) == INTEGER_CST) > >>>>>>>> + { > >>>>>>>> + /* get_value_range returns [0, N] for constant > >>>>>>>> assignments. */ > >>>>>>>> + val = gimple_assign_rhs1 (def); > >>>>>>>> + minmax[0] = minmax[1] = wi::to_wide (val); > >>>>>>>> + return true; > >>>>>>>> + } > >>>>>>> Umm, something seems really off with this hunk. If the SSA_NAME is > >>>>>>> set > >>>>>>> via a simple constant assignment, then the range ought to be a > >>>>>>> singleton > >>>>>>> ie [CONST,CONST]. Is there are particular test were this is not > >>>>>>> true? > >>>>>>> > >>>>>>> The only way offhand I could see this happening is if originally > >>>>>>> the RHS > >>>>>>> wasn't a constant, but due to optimizations it either simplified > >>>>>>> into a > >>>>>>> constant or a constant was propagated into an SSA_NAME appearing on > >>>>>>> the > >>>>>>> RHS. This would have to happen between the last range analysis and > >>>>>>> the > >>>>>>> point where you're making this query. > >>>>>> > >>>>>> Yes, I think that's right. Here's an example where it happens: > >>>>>> > >>>>>> void f (void) > >>>>>> { > >>>>>> char s[] = "1234"; > >>>>>> unsigned n = strlen (s); > >>>>>> char vla[n]; // or malloc (n) > >>>>>> vla[n] = 0; // n = [4, 4] > >>>>>> ... > >>>>>> } > >>>>>> > >>>>>> The strlen call is folded to 4 but that's not propagated to > >>>>>> the access until sometime after the strlen pass is done. > >>>>> Hmm. Are we calling set_range_info in that case? That goes behind the > >>>>> back of pass instance of vr_values. If so, that might argue we want to > >>>>> be setting it in vr_values too. > >>>> > >>>> No, set_range_info is only called for ranges. In this case, > >>>> handle_builtin_strlen replaces the strlen() call with 4: > >>>> > >>>> s = "1234"; > >>>> _1 = __builtin_strlen (&s); > >>>> n_2 = (unsigned int) _1; > >>>> a.1_8 = __builtin_alloca_with_align (_1, 8); > >>>> (*a.1_8)[n_2] = 0; > >>>> > >>>> When the access is made, the __builtin_alloca_with_align call > >>>> is found as the destination and the _1 SSA_NAME is used to > >>>> get its size. We get back the range [4, 4]. > >>> > >>> By the way, I glossed over one detail. The above doesn't work > >>> exactly as is because the allocation size is the SSA_NAME _1 > >>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with > >>> the range [0, 4]; the range is [0, 4] because it was set based > >>> on the size of the argument to the strlen() call well before > >>> the strlen pass even ran). > >> Which would tend to argue that we should forward propagate the constant > >> to the uses of _1. That should expose that the RHS of the assignment to > >> n_2 is a constant as well. > >> > >> > >>> > >>> To make it work across assignments we need to propagate the strlen > >>> results down the CFG somehow. I'm hoping the on-demand VRP will > >>> do this automagically. > >> It would, but it's probably more heavyweight than we need. We just need > >> to forward propagate the discovered constant to the use points and pick > >> up any secondary opportunities that arise. > > > > Yes. And the usual way of doing this is to keep a constant-and-copy > > lattice (and for copies you'd need to track availability) and before > > optimizing > > a stmt substitute its operands with the lattice contents. > > > > forwprop has a scheme that can be followed doing a RPO walk, strlen > > does a DOM walk, there you can follow what DOM/PRE elimination do > > (for tracking copy availability - if you just track constants you can > > elide that). > I'm also note sure handling copies is all that interesting here and if > we just handle constants/invariants, then it's pretty easy. > > Whenever we replace a strlen call with a const, we put the LHS (assuming > its an SSA_NAME) of the statement on a worklist. > > We pull items off the worklist and propagate the value to the use points > and try to simplify the resulting statement. If the RHS of the use > point simplified to a constant, then put the LHS of the use statement > onto the worklist. Iterate until the list is empty. > > That would capture everything of interest I suspect and ought to be cheap.
That's the ad-hoc way, yes. To Martin: no, this shouldn't be a prerequesite > jeff > > > > I think whenever we substitute a constant or SSA_NAME for a strlen call, > we can just replace uses of the LHS of the assignment with the > const/copy. Any statements we propagate into are put on a worklist. > > We pull statements off the worklist and try to simplify their RHS. If > the RHS simplifies to a const/copy, then then we repeat the process of > propagating to the use points and > > x = <whatever>; > > If we find <whatever> collapses to a constant or copy we can just record > it in SSA_NAME_VALUE. As we walk through statements, we can propagate > > > > Richard. > > > >> jeff > >> > > >