On Mon, 29 Jan 2024, Jakub Jelinek wrote: > Hi! > > On the following testcase we emit an invalid range of [2, 1] due to > UB in the source. Older VRP code silently swapped the boundaries and > made [1, 2] range out of it, but newer code just ICEs on it. > > The reason for pdata->minlen 2 is that we see a memcpy in this case > setting both elements of the array to non-zero value, so strlen (a) > can't be smaller than 2. The reason for pdata->maxlen 1 is that in > char a[2] array without UB there can be at most 1 non-zero character > because there needs to be '\0' termination in the buffer too. > > IMHO we shouldn't create invalid ranges like that and even creating > for that case a range [1, 2] looks wrong to me, so the following patch > just doesn't set maxlen in that case to the array size - 1, matching > what will really happen at runtime when triggering such UB (strlen will > be at least 2, perhaps more or will crash). > This is what the second hunk of the patch does. > > The first hunk fixes a fortunately harmless thinko. > If the strlen pass knows the string length (i.e. get_string_length > function returns non-NULL), we take a different path, we get to this > only if all we know is that there are certain number of non-zero > characters but we don't know what it is followed with, whether further > non-zero characters or zero termination or either of that. > If we know exactly how many non-zero characters it is, such as > char a[42]; > ... > memcpy (a, "01234567890123456789", 20); > then we take an earlier if for the INTEGER_CST case and set correctly > just pdata->minlen to 20 in that case, but if we have something like > int len; > ... > if (len < 15 || len > 32) return; > memcpy (a, "0123456789012345678901234567890123456789", len); > then we have [15, 32] range for the nonzero_chars and we set pdata->minlen > correctly to 15, but incorrectly set also pdata->maxlen to 32. That is > not what the above implies, it just means that in some cases we know that > there are at least 32 non-zero characters, followed by something we don't > know. There is no guarantee that there is '\0' right after it, so it > means nothing. > The reason this is harmless, just confusing, is that the code a few lines > later fortunately overwrites this incorrect pdata->maxlen value with > something different (either array length - 1 or all ones etc.). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK Richard. > 2024-01-29 Jakub Jelinek <ja...@redhat.com> > > PR tree-optimization/110603 > * tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect > setting of pdata->maxlen to vr.upper_bound (which is unconditionally > overwritten anyway). Avoid creating invalid range with minlen > larger than maxlen. Formatting fix. > > * gcc.c-torture/compile/pr110603.c: New test. > > --- gcc/tree-ssa-strlen.cc.jj 2024-01-03 11:51:32.664715465 +0100 > +++ gcc/tree-ssa-strlen.cc 2024-01-27 13:32:25.506401969 +0100 > @@ -1228,7 +1228,6 @@ get_range_strlen_dynamic (tree src, gimp > { > tree type = vr.type (); > pdata->minlen = wide_int_to_tree (type, vr.lower_bound ()); > - pdata->maxlen = wide_int_to_tree (type, vr.upper_bound ()); > } > } > else > @@ -1253,9 +1252,21 @@ get_range_strlen_dynamic (tree src, gimp > { > ++off; /* Increment for the terminating nul. */ > tree toffset = build_int_cst (size_type_node, off); > - pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node, size, > - toffset); > - pdata->maxbound = pdata->maxlen; > + pdata->maxlen = fold_build2 (MINUS_EXPR, size_type_node, > + size, toffset); > + if (tree_int_cst_lt (pdata->maxlen, pdata->minlen)) > + /* This can happen when triggering UB, when base is an > + array which is known to be filled with at least size > + non-zero bytes. E.g. for > + char a[2]; memcpy (a, "12", sizeof a); > + We don't want to create an invalid range [2, 1] > + where 2 comes from the number of non-zero bytes and > + 1 from longest valid zero-terminated string that can > + be stored in such an array, so pick just one of > + those, pdata->minlen. See PR110603. */ > + pdata->maxlen = build_all_ones_cst (size_type_node); > + else > + pdata->maxbound = pdata->maxlen; > } > else > pdata->maxlen = build_all_ones_cst (size_type_node); > --- gcc/testsuite/gcc.c-torture/compile/pr110603.c.jj 2024-01-27 > 13:37:29.375194755 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr110603.c 2024-01-27 > 13:37:03.104558479 +0100 > @@ -0,0 +1,16 @@ > +/* PR tree-optimization/110603 */ > + > +typedef __SIZE_TYPE__ size_t; > +void *memcpy (void *, const void *, size_t); > +int snprintf (char *restrict, size_t, const char *restrict, ...); > +extern char a[2]; > +void bar (void); > + > +void > +foo (void) > +{ > + memcpy (a, "12", sizeof (a)); > + int b = snprintf (0, 0, "%s", a); > + if (b <= 3) > + bar (); > +} > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)