On Mon, 29 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase we emit an invalid range of [2, 1] due to
> UB in the source.  Older VRP code silently swapped the boundaries and
> made [1, 2] range out of it, but newer code just ICEs on it.
> 
> The reason for pdata->minlen 2 is that we see a memcpy in this case
> setting both elements of the array to non-zero value, so strlen (a)
> can't be smaller than 2.  The reason for pdata->maxlen 1 is that in
> char a[2] array without UB there can be at most 1 non-zero character
> because there needs to be '\0' termination in the buffer too.
> 
> IMHO we shouldn't create invalid ranges like that and even creating
> for that case a range [1, 2] looks wrong to me, so the following patch
> just doesn't set maxlen in that case to the array size - 1, matching
> what will really happen at runtime when triggering such UB (strlen will
> be at least 2, perhaps more or will crash).
> This is what the second hunk of the patch does.
> 
> The first hunk fixes a fortunately harmless thinko.
> If the strlen pass knows the string length (i.e. get_string_length
> function returns non-NULL), we take a different path, we get to this
> only if all we know is that there are certain number of non-zero
> characters but we don't know what it is followed with, whether further
> non-zero characters or zero termination or either of that.
> If we know exactly how many non-zero characters it is, such as
> char a[42];
> ...
>   memcpy (a, "01234567890123456789", 20);
> then we take an earlier if for the INTEGER_CST case and set correctly
> just pdata->minlen to 20 in that case, but if we have something like
>   int len;
>   ...
>   if (len < 15 || len > 32) return;
>   memcpy (a, "0123456789012345678901234567890123456789", len);
> then we have [15, 32] range for the nonzero_chars and we set pdata->minlen
> correctly to 15, but incorrectly set also pdata->maxlen to 32.  That is
> not what the above implies, it just means that in some cases we know that
> there are at least 32 non-zero characters, followed by something we don't
> know.  There is no guarantee that there is '\0' right after it, so it
> means nothing.
> The reason this is harmless, just confusing, is that the code a few lines
> later fortunately overwrites this incorrect pdata->maxlen value with
> something different (either array length - 1 or all ones etc.).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

Richard.

> 2024-01-29  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR tree-optimization/110603
>       * tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect
>       setting of pdata->maxlen to vr.upper_bound (which is unconditionally
>       overwritten anyway).  Avoid creating invalid range with minlen
>       larger than maxlen.  Formatting fix.
> 
>       * gcc.c-torture/compile/pr110603.c: New test.
> 
> --- gcc/tree-ssa-strlen.cc.jj 2024-01-03 11:51:32.664715465 +0100
> +++ gcc/tree-ssa-strlen.cc    2024-01-27 13:32:25.506401969 +0100
> @@ -1228,7 +1228,6 @@ get_range_strlen_dynamic (tree src, gimp
>               {
>                 tree type = vr.type ();
>                 pdata->minlen = wide_int_to_tree (type, vr.lower_bound ());
> -               pdata->maxlen = wide_int_to_tree (type, vr.upper_bound ());
>               }
>           }
>         else
> @@ -1253,9 +1252,21 @@ get_range_strlen_dynamic (tree src, gimp
>               {
>                 ++off;   /* Increment for the terminating nul.  */
>                 tree toffset = build_int_cst (size_type_node, off);
> -               pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node, size,
> -                                            toffset);
> -               pdata->maxbound = pdata->maxlen;
> +               pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node,
> +                                            size, toffset);
> +               if (tree_int_cst_lt (pdata->maxlen, pdata->minlen))
> +                 /* This can happen when triggering UB, when base is an
> +                    array which is known to be filled with at least size
> +                    non-zero bytes.  E.g. for
> +                    char a[2]; memcpy (a, "12", sizeof a);
> +                    We don't want to create an invalid range [2, 1]
> +                    where 2 comes from the number of non-zero bytes and
> +                    1 from longest valid zero-terminated string that can
> +                    be stored in such an array, so pick just one of
> +                    those, pdata->minlen.  See PR110603.  */
> +                 pdata->maxlen = build_all_ones_cst (size_type_node);
> +               else
> +                 pdata->maxbound = pdata->maxlen;
>               }
>             else      
>               pdata->maxlen = build_all_ones_cst (size_type_node);
> --- gcc/testsuite/gcc.c-torture/compile/pr110603.c.jj 2024-01-27 
> 13:37:29.375194755 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr110603.c    2024-01-27 
> 13:37:03.104558479 +0100
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/110603 */
> +
> +typedef __SIZE_TYPE__ size_t;
> +void *memcpy (void *, const void *, size_t);
> +int snprintf (char *restrict, size_t, const char *restrict, ...);
> +extern char a[2];
> +void bar (void);
> +
> +void
> +foo (void)
> +{
> +  memcpy (a, "12", sizeof (a));
> +  int b = snprintf (0, 0, "%s", a);
> +  if (b <= 3)
> +    bar ();
> +}
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to