On 10/2/18 10:37 AM, Martin Sebor wrote:
> [1/4] - Introduce struct strlen_data_t into gimple-fold
> 
> This patch introduces a new data structure to reduce the number
> of arguments and overloads of the get_range_strlen API.  One of
> the goals of this change is to make the API safer to use for
> optimization (which looks for "permissive" results to account
> even for some undefined uses) vs warnings (which relies on
> conservative results to avoid false positives).  The patch also
> adds provides explicit arguments to get_range_strlen and adds
> descriptive comments to make the affected code easier to follow.
> Beyond making use of the new data structure the patch makes no
> observable changes.
> 
> The changes to gcc/testsuite/gcc.dg/strlenopt-51.c fix a few
> typos with no functional effect and tweak the helper macro
> used by the test to make debugging slightly easier.
> 
> gcc-99999-1.diff
> 
> [1/4] - Introduce struct strlen_data_t into gimple-fold.
> 
> gcc/ChangeLog:
> 
>       * builtins.c (check_access): Document argumens to a function call.
>       (check_strncat_sizes): Same.
>       (expand_builtin_strncat): Same.
>       * calls.c (maybe_warn_nonstring_arg): Same.
>       * gimple-fold.h (struct strlen_data_t): New type.
>       (get_range_strlen): New overload.
>       * gimple-fold.c (struct strlen_data_t): New type.
>       (get_range_strlen): Change declaration to take strlen_data_t*
>       argument instead of length, flexp, eltsize, and nonstr.
>       Adjust to use strlen_data_t members instead of other arguments.
>       Also set strlen_data_t::maxsize to the same value as maxlen.
>       (extern get_range_strlen): Define new overload.
>       (get_maxval_strlen): Adjust to use strlen_data_t.
>       * gimple-ssa-sprintf.c (get_string_length): Same.
> 
> gcc/testsuite/ChangeLog:
>       gcc.dg/strlenopt-51.c: Add counter to macros and fix typos.
> 
> diff --git a/gcc/calls.c b/gcc/calls.c
> index e9660b6..11f00ad 100644
> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -1582,7 +1585,8 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
>         {
>           tree arg = CALL_EXPR_ARG (exp, argno);
> g         if (!get_attr_nonstring_decl (arg))
> -           get_range_strlen (arg, lenrng);
> +           get_range_strlen (arg, lenrng, /* eltsize = */ 1,
> +                             /* strict = */ false);
>         }
>        }
>        /* Fall through.  */
As Bernd noted, something clearly went wrong here with the patch the "g"
at the beginning of the line.  Hand-edited patch perhaps?  Regardless I
assume you'll fix this.


> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 1e84722..8f71e9c 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -1262,11 +1262,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator 
> *gsi, tree c, tree len)
>  }
>  
>  
> -/* Obtain the minimum and maximum string length or minimum and maximum
> -   value of ARG in LENGTH[0] and LENGTH[1], respectively.
> +/* Try to obtain the range of the lengths of the string(s) referenced
> +   by ARG, or the size of the largest array ARG referes to if the range
> +   of lengths cannot be determined, and store all in *PDATA.
>     If ARG is an SSA name variable, follow its use-def chains.  When
> -   TYPE == 0, if LENGTH[1] is not equal to the length we determine or
> -   if we are unable to determine the length or value, return false.
> +   TYPE == 0, then if PDATA->MAXLEN is not equal to the determined
> +   length or if we are unable to determine the length or value, return
> +   false.
>     VISITED is a bitmap of visited variables.
>     TYPE is 0 if string length should be obtained, 1 for maximum string
>     length and 2 for maximum value ARG can have.
So can we clarify the return value?  What is the return value for types
other than 0?  I struggled with parsing that before this patch :-)


>  
>  static bool
> -get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
> -               int fuzzy, bool *flexp, unsigned eltsize, tree *nonstr)
> +get_range_strlen (tree arg, bitmap *visited, int type, int fuzzy,
> +               strlen_data_t *pdata)
So ISTM we should keep ELTSIZE as distinct parameter as its strictly an
input.  That in turn allows *PDATA to be strictly filled in as an output.



>  
>  
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 26e2727..0d523e7 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -25,6 +25,53 @@ along with GCC; see the file COPYING3.  If not see
>  extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
>  extern tree canonicalize_constructor_val (tree, tree);
>  extern tree get_symbol_constant_value (tree);
> +
> +struct strlen_data_t
> +{
> +  /* [MIN, MAXSIZE, MAXLEN] is a range describing the length of
> +     a string of possibly unknown length.  For a string of known
> +     length the range is a constant where MIN == MAXSIZE == MAXLEN
> +     holds.
> +     For other strings, MIN is the length of the shortest string that
> +     can be stored in the referenced object, i.e., MIN == 0.  MAXSIZE
> +     is the size of the largest array referenced by the expression.
> +     MAXLEN is the length of the longest sequence of non-zero bytes
> +     in memory reference by the expression.  For such strings,
> +     MIN <= MAXSIZE <= MAXLEN holds.  For example, given:
Hmm, the description here is a little confusing WRT MAXSIZE vs MAXLEN.
I think the distinction we're trying to make is that MAXSIZE is the
maximum respecting subobject boundaries and that MAXLEN is the maximum
without respecting subobject boundaries.   If so, then I think we need
to look for a better name than MAXSIZE and MAXLEN.

> +  /* When non-null, NONSTR refers to the declaration known to store
> +     an unterminated constant character array, as in:
> +     const char s[] = { 'a', 'b', 'c' };
> +     It is used to diagnose uses of such arrays in functions such as
> +     strlen() that expect a nul-terminated string as an argument.  */
> +  tree nonstr;
So rather than NONSTR, DECL may make more sense -- if for no other
reason than you don't have to think in terms of "not a string".



> +  /* ELTSIZE is set to 1 for single byte character strings, and 2 or 4
> +     for wide characer strings.  */
> +  unsigned eltsize;
Bernd's suggestion that we separate the input vs output paramters may be
a reasonable one -- I think this is the only in-parameter you're passing
with the structure, right?  And everything else is a pure output?  If so
we may be better off continuing to pass the element size as a separate
parameter.



> +  /* FLEXARRAY is true if the range of the string lengths has been
> +     obtained from the upper bound of an array at the end of a struct.
> +     Such an array may hold a string that's longer than its upper bound
> +     due to it being used as a poor-man's flexible array member.  */
> +  bool flexarray;
> +
> +  /* Set ELTSIZE and value-initialize all other members.  */
> +  strlen_data_t (unsigned eltbytes)
> +    : minlen (), maxlen (), maxsize (), nonstr (), eltsize (eltbytes),
> +      flexarray () { }
I think if you pull ELTSIZE out and pass it as a distinct parameter,
then you don't need a ctor and you can have a POD.  You can then
initialize with memset rather than having to individually initialize
each field -- meaning it's easier to add more output fields in the future.


I don't think any of the suggestions above change the behavior of the
patch.  Let's hold off committing though (I assume you've got a GIT
topic branch where you can make these changes and update the subsequent
patches independent of everything else...)

jeff

Reply via email to