On 10/2/18 10:37 AM, Martin Sebor wrote: > [1/4] - Introduce struct strlen_data_t into gimple-fold > > This patch introduces a new data structure to reduce the number > of arguments and overloads of the get_range_strlen API. One of > the goals of this change is to make the API safer to use for > optimization (which looks for "permissive" results to account > even for some undefined uses) vs warnings (which relies on > conservative results to avoid false positives). The patch also > adds provides explicit arguments to get_range_strlen and adds > descriptive comments to make the affected code easier to follow. > Beyond making use of the new data structure the patch makes no > observable changes. > > The changes to gcc/testsuite/gcc.dg/strlenopt-51.c fix a few > typos with no functional effect and tweak the helper macro > used by the test to make debugging slightly easier. > > gcc-99999-1.diff > > [1/4] - Introduce struct strlen_data_t into gimple-fold. > > gcc/ChangeLog: > > * builtins.c (check_access): Document argumens to a function call. > (check_strncat_sizes): Same. > (expand_builtin_strncat): Same. > * calls.c (maybe_warn_nonstring_arg): Same. > * gimple-fold.h (struct strlen_data_t): New type. > (get_range_strlen): New overload. > * gimple-fold.c (struct strlen_data_t): New type. > (get_range_strlen): Change declaration to take strlen_data_t* > argument instead of length, flexp, eltsize, and nonstr. > Adjust to use strlen_data_t members instead of other arguments. > Also set strlen_data_t::maxsize to the same value as maxlen. > (extern get_range_strlen): Define new overload. > (get_maxval_strlen): Adjust to use strlen_data_t. > * gimple-ssa-sprintf.c (get_string_length): Same. > > gcc/testsuite/ChangeLog: > gcc.dg/strlenopt-51.c: Add counter to macros and fix typos. > > diff --git a/gcc/calls.c b/gcc/calls.c > index e9660b6..11f00ad 100644 > --- a/gcc/calls.c > +++ b/gcc/calls.c > @@ -1582,7 +1585,8 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp) > { > tree arg = CALL_EXPR_ARG (exp, argno); > g if (!get_attr_nonstring_decl (arg)) > - get_range_strlen (arg, lenrng); > + get_range_strlen (arg, lenrng, /* eltsize = */ 1, > + /* strict = */ false); > } > } > /* Fall through. */ As Bernd noted, something clearly went wrong here with the patch the "g" at the beginning of the line. Hand-edited patch perhaps? Regardless I assume you'll fix this.
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > index 1e84722..8f71e9c 100644 > --- a/gcc/gimple-fold.c > +++ b/gcc/gimple-fold.c > @@ -1262,11 +1262,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator > *gsi, tree c, tree len) > } > > > -/* Obtain the minimum and maximum string length or minimum and maximum > - value of ARG in LENGTH[0] and LENGTH[1], respectively. > +/* Try to obtain the range of the lengths of the string(s) referenced > + by ARG, or the size of the largest array ARG referes to if the range > + of lengths cannot be determined, and store all in *PDATA. > If ARG is an SSA name variable, follow its use-def chains. When > - TYPE == 0, if LENGTH[1] is not equal to the length we determine or > - if we are unable to determine the length or value, return false. > + TYPE == 0, then if PDATA->MAXLEN is not equal to the determined > + length or if we are unable to determine the length or value, return > + false. > VISITED is a bitmap of visited variables. > TYPE is 0 if string length should be obtained, 1 for maximum string > length and 2 for maximum value ARG can have. So can we clarify the return value? What is the return value for types other than 0? I struggled with parsing that before this patch :-) > > static bool > -get_range_strlen (tree arg, tree length[2], bitmap *visited, int type, > - int fuzzy, bool *flexp, unsigned eltsize, tree *nonstr) > +get_range_strlen (tree arg, bitmap *visited, int type, int fuzzy, > + strlen_data_t *pdata) So ISTM we should keep ELTSIZE as distinct parameter as its strictly an input. That in turn allows *PDATA to be strictly filled in as an output. > > > diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h > index 26e2727..0d523e7 100644 > --- a/gcc/gimple-fold.h > +++ b/gcc/gimple-fold.h > @@ -25,6 +25,53 @@ along with GCC; see the file COPYING3. If not see > extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL); > extern tree canonicalize_constructor_val (tree, tree); > extern tree get_symbol_constant_value (tree); > + > +struct strlen_data_t > +{ > + /* [MIN, MAXSIZE, MAXLEN] is a range describing the length of > + a string of possibly unknown length. For a string of known > + length the range is a constant where MIN == MAXSIZE == MAXLEN > + holds. > + For other strings, MIN is the length of the shortest string that > + can be stored in the referenced object, i.e., MIN == 0. MAXSIZE > + is the size of the largest array referenced by the expression. > + MAXLEN is the length of the longest sequence of non-zero bytes > + in memory reference by the expression. For such strings, > + MIN <= MAXSIZE <= MAXLEN holds. For example, given: Hmm, the description here is a little confusing WRT MAXSIZE vs MAXLEN. I think the distinction we're trying to make is that MAXSIZE is the maximum respecting subobject boundaries and that MAXLEN is the maximum without respecting subobject boundaries. If so, then I think we need to look for a better name than MAXSIZE and MAXLEN. > + /* When non-null, NONSTR refers to the declaration known to store > + an unterminated constant character array, as in: > + const char s[] = { 'a', 'b', 'c' }; > + It is used to diagnose uses of such arrays in functions such as > + strlen() that expect a nul-terminated string as an argument. */ > + tree nonstr; So rather than NONSTR, DECL may make more sense -- if for no other reason than you don't have to think in terms of "not a string". > + /* ELTSIZE is set to 1 for single byte character strings, and 2 or 4 > + for wide characer strings. */ > + unsigned eltsize; Bernd's suggestion that we separate the input vs output paramters may be a reasonable one -- I think this is the only in-parameter you're passing with the structure, right? And everything else is a pure output? If so we may be better off continuing to pass the element size as a separate parameter. > + /* FLEXARRAY is true if the range of the string lengths has been > + obtained from the upper bound of an array at the end of a struct. > + Such an array may hold a string that's longer than its upper bound > + due to it being used as a poor-man's flexible array member. */ > + bool flexarray; > + > + /* Set ELTSIZE and value-initialize all other members. */ > + strlen_data_t (unsigned eltbytes) > + : minlen (), maxlen (), maxsize (), nonstr (), eltsize (eltbytes), > + flexarray () { } I think if you pull ELTSIZE out and pass it as a distinct parameter, then you don't need a ctor and you can have a POD. You can then initialize with memset rather than having to individually initialize each field -- meaning it's easier to add more output fields in the future. I don't think any of the suggestions above change the behavior of the patch. Let's hold off committing though (I assume you've got a GIT topic branch where you can make these changes and update the subsequent patches independent of everything else...) jeff