On 08/02/18 04:44, Martin Sebor wrote: > Since the foundation of the patch is detecting and avoiding > the overly aggressive folding of unterminated char arrays, > besides issuing a warning for such arguments to strlen, > the patch also fixes pr86711 - wrong folding of memchr, and > pr86714 - tree-ssa-forwprop.c confused by too long initializer. > > The substance of the attached updated patch is unchanged, > I have just added test cases for the two additional bugs. > > Bernd, as I mentioned Wednesday, the patch supersedes > yours here: > https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html >
No problem, but I hope you understand, that I still uphold my patch. So we have two patches now: - mine, fixing a wrong code bug, - yours, implementing a new warning and fixing a wrong code bug at the same time. I will add a few comments to your patch below. > Martin > > On 07/30/2018 01:17 PM, Martin Sebor wrote: >> Attached is an updated version of the patch that handles more >> instances of calling strlen() on a constant array that is not >> a nul-terminated string. >> >> No other functions except strlen are explicitly handled yet, >> and neither are constant arrays with braced-initializer lists >> like const char a[] = { 'a', 'b', 'c' }; I am testing >> an independent solution for those (bug 86552). Once those >> are handled the warning will be able to detect those as well. >> >> Tested on x86_64-linux. >> >> On 07/25/2018 05:38 PM, Martin Sebor wrote: >>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html >>> >>> The fix for bug 86532 has been checked in so this enhancement >>> can now be applied on top of it (with only minor adjustments). >>> >>> On 07/19/2018 02:08 PM, Martin Sebor wrote: >>>> In the discussion of my patch for pr86532 Bernd noted that >>>> GCC silently accepts constant character arrays with no >>>> terminating nul as arguments to strlen (and other string >>>> functions). >>>> >>>> The attached patch is a first step in detecting these kinds >>>> of bugs in strlen calls by issuing -Wstringop-overflow. >>>> The next step is to modify all other handlers of built-in >>>> functions to detect the same problem (not part of this patch). >>>> Yet another step is to detect these problems in arguments >>>> initialized using the non-string form: >>>> >>>> const char a[] = { 'a', 'b', 'c' }; >>>> >>>> This patch is meant to apply on top of the one for bug 86532 >>>> (I tested it with an earlier version of that patch so there >>>> is code in the context that does not appear in the latest >>>> version of the other diff). >>>> >>>> Martin >>>> >>> >> > > PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long > initializer > PR tree-optimization/86711 - wrong folding of memchr > PR tree-optimization/86552 - missing warning for reading past the end of > non-string arrays > > gcc/ChangeLog: > > PR tree-optimization/86714 > PR tree-optimization/86711 > PR tree-optimization/86552 > * builtins.h (warn_string_no_nul): Declare.. > (c_strlen): Add argument. > * builtins.c (warn_string_no_nul): New function. > (fold_builtin_strlen): Add argument. Detect missing nul. > (fold_builtin_1): Adjust. > (string_length): Add argument and use it. > (c_strlen): Same. > (expand_builtin_strlen): Detect missing nul. > * expr.c (string_constant): Add arguments. Detect missing nul > terminator and outermost declaration it's missing in. > * expr.h (string_constant): Add argument. > * fold-const.c (c_getstr): Change argument to bool*, rename > other arguments. > * fold-const-call.c (fold_const_call): Detect missing nul. > * gimple-fold.c (get_range_strlen): Add argument. > (get_maxval_strlen): Adjust. > * gimple-fold.h (get_range_strlen): Add argument. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/86714 > PR tree-optimization/86711 > PR tree-optimization/86552 > * gcc.c-torture/execute/memchr-1.c: New test. > * gcc.c-torture/execute/pr86714.c: New test. > * gcc.dg/warn-strlen-no-nul.c: New test. > > diff --git a/gcc/builtins.c b/gcc/builtins.c > index aa3e0d8..f4924d5 100644 > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int); > static rtx expand_builtin_expect (tree, rtx); > static tree fold_builtin_constant_p (tree); > static tree fold_builtin_classify_type (tree); > -static tree fold_builtin_strlen (location_t, tree, tree); > +static tree fold_builtin_strlen (location_t, tree, tree, tree); > static tree fold_builtin_inf (location_t, tree, int); > static tree rewrite_call_expr (location_t, tree, int, tree, int, ...); > static bool validate_arg (const_tree, enum tree_code code); > @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, > unsigned maxelts) > return n; > } > > +/* For a call expression EXP to a function that expects a string argument, > + issue a diagnostic due to it being a called with an argument NONSTR > + that is a character array with no terminating NUL. */ > + > +void > +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr) > +{ > + loc = expansion_point_location_if_in_system_header (loc); > + > + bool warned; > + if (exp) > + { > + if (!fndecl) > + fndecl = get_callee_fndecl (exp); > + warned = warning_at (loc, OPT_Wstringop_overflow_, > + "%K%qD argument missing terminating nul", > + exp, fndecl); > + } > + else > + { > + gcc_assert (fndecl); > + warned = warning_at (loc, OPT_Wstringop_overflow_, > + "%qD argument missing terminating nul", > + fndecl); > + } > + > + if (warned && DECL_P (nonstr)) > + inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared > here"); > +} > + > /* Compute the length of a null-terminated character string or wide > character string handling character sizes of 1, 2, and 4 bytes. > TREE_STRING_LENGTH is not the right way because it evaluates to > @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, > unsigned maxelts) > accesses. Note that this implies the result is not going to be emitted > into the instruction stream. > > + When ARR is non-null and the string is not properly nul-terminated, > + set *ARR to the declaration of the outermost constant object whose > + initializer (or one of its elements) is not nul-terminated. > + > The value returned is of type `ssizetype'. > > Unfortunately, string_constant can't access the values of const char > arrays with initializers, so neither can we do so here. */ > > tree > -c_strlen (tree src, int only_value) > +c_strlen (tree src, int only_value, tree *arr /* = NULL */) > { > STRIP_NOPS (src); > + > + /* Used to detect non-nul-terminated strings in subexpressions > + of a conditional expression. When ARR is null, point it at > + one of the elements for simplicity. */ > + tree arrs[] = { NULL_TREE, NULL_TREE }; > + if (!arr) > + arr = arrs; now arr is always != NULL > + > if (TREE_CODE (src) == COND_EXPR > && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0)))) > { > - tree len1, len2; > - > - len1 = c_strlen (TREE_OPERAND (src, 1), only_value); > - len2 = c_strlen (TREE_OPERAND (src, 2), only_value); > + tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs); > + tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1); > if (tree_int_cst_equal (len1, len2)) > - return len1; > + { funny, if called with NULL *arr and arrs[0] alias each other. > + *arr = arrs[0] ? arrs[0] : arrs[1]; > + return len1; > + } > } > > if (TREE_CODE (src) == COMPOUND_EXPR > && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0)))) > - return c_strlen (TREE_OPERAND (src, 1), only_value); > + return c_strlen (TREE_OPERAND (src, 1), only_value, arr); > > location_t loc = EXPR_LOC_OR_LOC (src, input_location); > > /* Offset from the beginning of the string in bytes. */ > tree byteoff; > - src = string_constant (src, &byteoff); > - if (src == 0) > - return NULL_TREE; > + /* Set if array is nul-terminated, false otherwise. */ > + bool nulterm; note arr is always != null or pointing to arrs[0]. > + src = string_constant (src, &byteoff, &nulterm, arr); > + if (!src) > + { > + *arr = arrs[0] ? arrs[0] : arrs[1]; > + return NULL_TREE; > + } > + > + /* Clear *ARR when the string is nul-terminated. It should be > + of no interest to callers. */ > + if (nulterm) > + *arr = NULL_TREE; > > /* Determine the size of the string element. */ > unsigned eltsize > @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value) > maxelts = maxelts / eltsize - 1; > } > > + /* Unless the caller is prepared to handle it by passing in a non-null > + ARR, fail if the terminating nul doesn't fit in the array the string > + is stored in (as in const char a[3] = "123"; */ note arr is always != NULL, thus this if is never taken. > + if (!arr && maxelts < strelts) > + return NULL_TREE; > + > /* PTR can point to the byte representation of any string type, including > char* and wchar_t*. */ > const char *ptr = TREE_STRING_POINTER (src); > @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value) > offsave = fold_convert (ssizetype, offsave); > tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, > offsave, > build_int_cst (ssizetype, len * eltsize)); this computation is wrong, it computes not in units of eltsize, I am however not sure if it is really good that this function tries to compute strlen of wide character strings. That said, please fix this computation first, in a different patch instead of just fixing the indentation. (I know I pointed that lines are too long here, but that was before I realized that the whole length computation here is wrong). > - tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), > offsave); > + tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), > + offsave); > return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp, > build_zero_cst (ssizetype)); > } > @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value) > Since ELTOFF is our starting index into the string, no further > calculation is needed. */ What are you fixing here, I think that was another bug. If this fixes something then it should be in a different patch, just handling this. > unsigned len = string_length (ptr + eltoff * eltsize, eltsize, > - maxelts - eltoff); > + strelts - eltoff); > > return ssize_int (len); > } > @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target, > > struct expand_operand ops[4]; > rtx pat; > - tree len; > tree src = CALL_EXPR_ARG (exp, 0); > rtx src_reg; > rtx_insn *before_strlen; > @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target, > unsigned int align; > > /* If the length can be computed at compile-time, return it. */ > - len = c_strlen (src, 0); > + tree array; > + tree len = c_strlen (src, 0, &array); You know the c_strlen tries to compute wide character sizes, but strlen does not do that, strlen (L"abc") should give 1 (or 0 on a BE machine) I wonder if that is correct. > if (len) > - return expand_expr (len, target, target_mode, EXPAND_NORMAL); > + { > + if (array) > + { > + /* Array refers to the non-nul terminated constant array > + whose length is attempted to be computed. */ I really wonder if it would not make more sense to have a nonterminated_string_constant_p instead. Last time I wanted to implement a warning in expand I faced the problem that inlined functions will get one warning per invocation? > + warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array); > + return NULL_RTX; > + } > + return expand_expr (len, target, target_mode, EXPAND_NORMAL); > + } > > /* If the length can be computed at compile-time and is constant > integer, but there are side-effects in src, evaluate > src for side-effects, then return len. > E.g. x = strlen (i++ ? "xfoo" + 1 : "bar"); > can be optimized into: i++; x = 3; */ > - len = c_strlen (src, 1); > - if (len && TREE_CODE (len) == INTEGER_CST) > + len = c_strlen (src, 1, &array); > + if (len) > { > - expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL); > - return expand_expr (len, target, target_mode, EXPAND_NORMAL); > + if (array) > + { > + warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array); > + return NULL_RTX; > + } > + > + if (TREE_CODE (len) == INTEGER_CST) > + { > + expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL); > + return expand_expr (len, target, target_mode, EXPAND_NORMAL); > + } > } > > align = get_pointer_alignment (src) / BITS_PER_UNIT; > @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg) > return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg))); > } > > -/* Fold a call to __builtin_strlen with argument ARG. */ > +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG. */ > > static tree > -fold_builtin_strlen (location_t loc, tree type, tree arg) > +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg) > { > if (!validate_arg (arg, POINTER_TYPE)) > return NULL_TREE; > else > { > - tree len = c_strlen (arg, 0); > - > + tree arr = NULL_TREE; > + tree len = c_strlen (arg, 0, &arr); Is it possible to write a test case where strlen(L"test") reaches this point? what will c_strlen return then? > if (len) > - return fold_convert_loc (loc, type, len); > + { > + if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (arg)) > + loc = EXPR_LOCATION (arg); > + > + /* To avoid warning multiple times about non-nul-terminated > + strings only warn if their length has been determined > + and it's being folded. */ > + if (arr) > + warn_string_no_nul (loc, NULL_TREE, fndecl, arr); > + > + return fold_convert_loc (loc, type, len); > + } > > return NULL_TREE; > } > @@ -9175,7 +9264,7 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0) > return fold_builtin_classify_type (arg0); > > case BUILT_IN_STRLEN: > - return fold_builtin_strlen (loc, type, arg0); > + return fold_builtin_strlen (loc, fndecl, type, arg0); > > CASE_FLT_FN (BUILT_IN_FABS): > CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS): > diff --git a/gcc/builtins.h b/gcc/builtins.h > index 2e0a2f9..73b0b0b 100644 > --- a/gcc/builtins.h > +++ b/gcc/builtins.h > @@ -58,7 +58,7 @@ extern bool get_pointer_alignment_1 (tree, unsigned int *, > unsigned HOST_WIDE_INT *); > extern unsigned int get_pointer_alignment (tree); > extern unsigned string_length (const void*, unsigned, unsigned); > -extern tree c_strlen (tree, int); > +extern tree c_strlen (tree, int, tree * = NULL); > extern void expand_builtin_setjmp_setup (rtx, rtx); > extern void expand_builtin_setjmp_receiver (rtx); > extern void expand_builtin_update_setjmp_buf (rtx); > @@ -103,7 +103,7 @@ extern bool target_char_cst_p (tree t, char *p); > > extern internal_fn associated_internal_fn (tree); > extern internal_fn replacement_internal_fn (gcall *); > - > +extern void warn_string_no_nul (location_t, tree, tree, tree); > extern tree max_object_size (); > > #endif /* GCC_BUILTINS_H */ > diff --git a/gcc/expr.c b/gcc/expr.c > index de6709d..edbd7f8 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -11271,10 +11271,14 @@ is_aligning_offset (const_tree offset, const_tree > exp) > /* Return the tree node if an ARG corresponds to a string constant or zero > if it doesn't. If we return nonzero, set *PTR_OFFSET to the (possibly > non-constant) offset in bytes within the string that ARG is accessing. > + If NULTERM is non-null, consider valid even sequences of characters that > + aren't nul-terminated strings. In that case, set NULTERM if ARG refers > + to such a sequence and clear it otherwise. > The type of the offset is sizetype. */ > > tree > -string_constant (tree arg, tree *ptr_offset) > +string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */, > + tree *decl /* = NULL */) > { > tree array; > STRIP_NOPS (arg); > @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset) > return NULL_TREE; > > tree offset; > - if (tree str = string_constant (arg0, &offset)) > + if (tree str = string_constant (arg0, &offset, nulterm, decl)) > { > /* Avoid pointers to arrays (see bug 86622). */ > if (POINTER_TYPE_P (TREE_TYPE (arg)) > @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset) > if (TREE_CODE (array) == STRING_CST) Well, actually I think there _are_ STING_CSTs which are not null terminated. Maybe not in C. But Fortran, Ada, Go... > { > *ptr_offset = fold_convert (sizetype, offset); > + if (nulterm) > + *nulterm = true; > + if (decl) > + *decl = NULL_TREE; > return array; > } > > @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset) > if (!array_size || TREE_CODE (array_size) != INTEGER_CST) > return NULL_TREE; > > + unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size); > + I don't understand why this is necessary at all. It looks way too complicated, to say the least. TREE_TYPE (init) has already the type of the member. > + /* When ARG refers to an aggregate (of arrays) try to determine > + the size of the character array within the aggregate. */ > + tree ref = arg; > + tree reftype = TREE_TYPE (arg); > + > + if (TREE_CODE (ref) == MEM_REF) > + { > + ref = TREE_OPERAND (ref, 0); > + if (TREE_CODE (ref) == ADDR_EXPR) > + { > + ref = TREE_OPERAND (ref, 0); > + reftype = TREE_TYPE (ref); > + } > + } > + else > + while (TREE_CODE (ref) == ARRAY_REF) > + { > + reftype = TREE_TYPE (ref); > + ref = TREE_OPERAND (ref, 0); > + } > + > + if (TREE_CODE (ref) == COMPONENT_REF) > + reftype = TREE_TYPE (ref); > + > + while (TREE_CODE (reftype) == ARRAY_TYPE) > + { > + tree next = TREE_TYPE (reftype); > + if (TREE_CODE (next) == INTEGER_TYPE) > + { > + if (tree size = TYPE_SIZE_UNIT (reftype)) > + if (tree_fits_uhwi_p (size)) > + array_elts = tree_to_uhwi (size); so array_elts is measued in bytes. > + break; > + } > + > + reftype = TREE_TYPE (reftype); > + } > + > + if (decl) > + *decl = array; > + > /* Avoid returning a string that doesn't fit in the array > it is stored in, like > const char a[4] = "abcde"; > @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset) > unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init); > length = string_length (TREE_STRING_POINTER (init), charsize, > length / charsize); Some callers especially those where the wrong code happens, expect to be able to access the STRING_CST up to TREE_STRING_LENGTH, But using string_length assume thy stop at the first nul char. > - if (compare_tree_int (array_size, length + 1) < 0) > + if (nulterm) but here you compare bytes with length which is measued un chars. > + *nulterm = array_elts > length; > + else if (array_elts <= length) > return NULL_TREE; > > *ptr_offset = offset; > diff --git a/gcc/expr.h b/gcc/expr.h > index cf047d4..e630979 100644 > --- a/gcc/expr.h > +++ b/gcc/expr.h > @@ -288,7 +288,7 @@ expand_normal (tree exp) > > /* Return the tree node and offset if a given argument corresponds to > a string constant. */ > -extern tree string_constant (tree, tree *); > +extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL); > > /* Two different ways of generating switch statements. */ > extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, > profile_probability); > diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c > index 06a42060..849a443 100644 > --- a/gcc/fold-const-call.c > +++ b/gcc/fold-const-call.c > @@ -1199,9 +1199,14 @@ fold_const_call (combined_fn fn, tree type, tree arg) > switch (fn) > { > case CFN_BUILT_IN_STRLEN: > - if (const char *str = c_getstr (arg)) > - return build_int_cst (type, strlen (str)); > - return NULL_TREE; > + { > + bool nulterm; > + if (const char *str = c_getstr (arg, NULL, &nulterm)) > + if (nulterm) > + return build_int_cst (type, strlen (str)); > + > + return NULL_TREE; > + } > > CASE_CFN_NAN: > CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NAN): > diff --git a/gcc/fold-const.c b/gcc/fold-const.c > index b318fc77..ecbc38c 100644 > --- a/gcc/fold-const.c > +++ b/gcc/fold-const.c > @@ -14577,23 +14577,23 @@ fold_build_pointer_plus_hwi_loc (location_t loc, > tree ptr, HOST_WIDE_INT off) > /* Return a pointer P to a NUL-terminated string representing the sequence > of constant characters referred to by SRC (or a subsequence of such > characters within it if SRC is a reference to a string plus some > - constant offset). If STRLEN is non-null, store stgrlen(P) in *STRLEN. > - If STRSIZE is non-null, store in *STRSIZE the size of the array > - the string is stored in; in that case, even though P points to a NUL > - terminated string, SRC need not refer to one. This can happen when > - SRC refers to a constant character array initialized to all non-NUL > - values, as in the C declaration: char a[4] = "1234"; */ > + constant offset). If STRSIZE is non-null, store the size of the string > + literal in *STRSIZE, including any embedded or terminating nuls. If > + If NULLTERM is non-null, set *NULLTERM if the referenced string is > + guaranteed to contain a terminating NUL. Otherwise clear it. This > + can happen in the case of valid C declarations such as: > + const char a[3] = "123"; */ > > const char * > -c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */, > - unsigned HOST_WIDE_INT *strsize /* = NULL */) > +c_getstr (tree src, unsigned HOST_WIDE_INT *strsize /* = NULL */, > + bool *nulterm /* = NULL */) > { > tree offset_node; > > - if (strlen) > - *strlen = 0; > + if (strsize) > + *strsize = 0; > > - src = string_constant (src, &offset_node); > + src = string_constant (src, &offset_node, nulterm); > if (src == 0) > return NULL; > > @@ -14606,47 +14606,42 @@ c_getstr (tree src, unsigned HOST_WIDE_INT *strlen > /* = NULL */, > offset = tree_to_uhwi (offset_node); > } > > - /* STRING_LENGTH is the size of the string literal, including any > - embedded NULs. STRING_SIZE is the size of the array the string > + /* STRING_SIZE is the size of the string literal, including any > + embedded NULs. ARRAY_SIZE is the size of the array the string > literal is stored in. */ > - unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src); > - unsigned HOST_WIDE_INT string_size = string_length; > + unsigned HOST_WIDE_INT string_size = TREE_STRING_LENGTH (src); > + unsigned HOST_WIDE_INT array_size = string_size; > tree type = TREE_TYPE (src); > if (tree size = TYPE_SIZE_UNIT (type)) > if (tree_fits_shwi_p (size)) > - string_size = tree_to_uhwi (size); > + array_size = tree_to_uhwi (size); > + > + const char *string = TREE_STRING_POINTER (src); > > - if (strlen) > + if (strsize) > { > - /* Compute and store the length of the substring at OFFSET. > + /* Compute and store the size of the substring at OFFSET. > All offsets past the initial length refer to null strings. */ > - if (offset <= string_length) > - *strlen = string_length - offset; this should be offset < string_length. > + if (offset <= string_size) > + *strsize = string_size - offset; > else > - *strlen = 0; this should be 1, you may access the NUL byte of "". > + *strsize = 0; > } > > - const char *string = TREE_STRING_POINTER (src); > - > - if (string_length == 0 > - || offset >= string_size) > + if (string_size == 0 > + || offset >= array_size) > return NULL; > > - if (strsize) > - { > - /* Support even constant character arrays that aren't proper > - NUL-terminated strings. */ > - *strsize = string_size; > - } > - else if (string[string_length - 1] != '\0') Well, this is broken for wide character strings. but I hope we can get rid of STRING_CST which are not explicitly null terminated. > + if (!nulterm && string[string_size - 1] != '\0') > { > - /* Support only properly NUL-terminated strings but handle > - consecutive strings within the same array, such as the six > - substrings in "1\0002\0003". */ > + /* When NULTERM is null, support only properly nul-terminated > + strings but handle consecutive strings within the same array, > + such as the six substrings in "1\0002\0003". Otherwise, let > + the caller deal with non-nul-terminated arrays. */ > return NULL; > } > > - return offset <= string_length ? string + offset : ""; this should be offset < string_size. > + return offset <= string_size ? string + offset : ""; > } > > /* Given a tree T, compute which bits in T may be nonzero. */ > diff --git a/gcc/fold-const.h b/gcc/fold-const.h > index 1b9ccc0..a58a4a2 100644 > --- a/gcc/fold-const.h > +++ b/gcc/fold-const.h > @@ -188,7 +188,7 @@ extern tree const_unop (enum tree_code, tree, tree); > extern tree const_binop (enum tree_code, tree, tree, tree); > extern bool negate_mathfn_p (combined_fn); > extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL, > - unsigned HOST_WIDE_INT * = NULL); > + bool * = NULL); > extern wide_int tree_nonzero_bits (const_tree); > > /* Return OFF converted to a pointer offset type suitable as offset for > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > index c3fa570..9eefb37 100644 > --- a/gcc/gimple-fold.c > +++ b/gcc/gimple-fold.c > @@ -1275,11 +1275,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator > *gsi, tree c, tree len) > Set *FLEXP to true if the range of the string lengths has been > obtained from the upper bound of an array at the end of a struct. > Such an array may hold a string that's longer than its upper bound > - due to it being used as a poor-man's flexible array member. */ > + due to it being used as a poor-man's flexible array member. > + Clear *NULTERM if ARG refers to a constant array that is known > + not be nul-terminated. */ > > static bool > get_range_strlen (tree arg, tree length[2], bitmap *visited, int type, > - int fuzzy, bool *flexp) > + int fuzzy, bool *flexp, bool *nulterm) > { > tree var, val = NULL_TREE; > gimple *def_stmt; > @@ -1301,7 +1303,8 @@ get_range_strlen (tree arg, tree length[2], bitmap > *visited, int type, > if (TREE_CODE (aop0) == INDIRECT_REF > && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME) > return get_range_strlen (TREE_OPERAND (aop0, 0), > - length, visited, type, fuzzy, flexp); > + length, visited, type, fuzzy, flexp, > + nulterm); > } > else if (TREE_CODE (TREE_OPERAND (op, 0)) == COMPONENT_REF && fuzzy) > { > @@ -1329,13 +1332,18 @@ get_range_strlen (tree arg, tree length[2], bitmap > *visited, int type, > return false; > } > else > - val = c_strlen (arg, 1); > + { > + tree arr; > + val = c_strlen (arg, 1, &arr); > + if (val && arr) > + *nulterm = false; > + } > > if (!val && fuzzy) > { > if (TREE_CODE (arg) == ADDR_EXPR) > return get_range_strlen (TREE_OPERAND (arg, 0), length, > - visited, type, fuzzy, flexp); > + visited, type, fuzzy, flexp, nulterm); > > if (TREE_CODE (arg) == ARRAY_REF) > { > @@ -1477,7 +1485,8 @@ get_range_strlen (tree arg, tree length[2], bitmap > *visited, int type, > || gimple_assign_unary_nop_p (def_stmt)) > { > tree rhs = gimple_assign_rhs1 (def_stmt); > - return get_range_strlen (rhs, length, visited, type, fuzzy, flexp); > + return get_range_strlen (rhs, length, visited, type, fuzzy, flexp, > + nulterm); > } > else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR) > { > @@ -1486,7 +1495,7 @@ get_range_strlen (tree arg, tree length[2], bitmap > *visited, int type, > > for (unsigned int i = 0; i < 2; i++) > if (!get_range_strlen (ops[i], length, visited, type, fuzzy, > - flexp)) > + flexp, nulterm)) > { > if (fuzzy == 2) > *maxlen = build_all_ones_cst (size_type_node); > @@ -1513,7 +1522,8 @@ get_range_strlen (tree arg, tree length[2], bitmap > *visited, int type, > if (arg == gimple_phi_result (def_stmt)) > continue; > > - if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp)) > + if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp, > + nulterm)) > { > if (fuzzy == 2) > *maxlen = build_all_ones_cst (size_type_node); > @@ -1545,19 +1555,27 @@ get_range_strlen (tree arg, tree length[2], bitmap > *visited, int type, > and false if PHIs and COND_EXPRs are to be handled optimistically, > if we can determine string length minimum and maximum; it will use > the minimum from the ones where it can be determined. > - STRICT false should be only used for warning code. */ > + STRICT false should be only used for warning code. > + When non-null, clear *NULTERM if ARG refers to a constant array > + that is known not be nul-terminated. Otherwise set it to true. */ > > bool > -get_range_strlen (tree arg, tree minmaxlen[2], bool strict) > +get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */, > + bool *nulterm /* = NULL */) > { > bitmap visited = NULL; > > minmaxlen[0] = NULL_TREE; > minmaxlen[1] = NULL_TREE; > > + bool nultermbuf; > + if (!nulterm) > + nulterm = &nultermbuf; > + *nulterm = true; > + > bool flexarray = false; > if (!get_range_strlen (arg, minmaxlen, &visited, 1, strict ? 1 : 2, > - &flexarray)) > + &flexarray, nulterm)) > { > minmaxlen[0] = NULL_TREE; > minmaxlen[1] = NULL_TREE; > @@ -1576,7 +1594,7 @@ get_maxval_strlen (tree arg, int type) > tree len[2] = { NULL_TREE, NULL_TREE }; > > bool dummy; > - if (!get_range_strlen (arg, len, &visited, type, 0, &dummy)) > + if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, NULL)) > len[1] = NULL_TREE; > if (visited) > BITMAP_FREE (visited); > @@ -3496,12 +3514,14 @@ static bool > gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi) > { > gimple *stmt = gsi_stmt (*gsi); > + tree arg = gimple_call_arg (stmt, 0); > > wide_int minlen; > wide_int maxlen; > > tree lenrange[2]; > - if (!get_range_strlen (gimple_call_arg (stmt, 0), lenrange, true) > + bool nulterm; > + if (!get_range_strlen (arg, lenrange, true, &nulterm) > && lenrange[0] && TREE_CODE (lenrange[0]) == INTEGER_CST > && lenrange[1] && TREE_CODE (lenrange[1]) == INTEGER_CST) > { > @@ -3523,6 +3543,10 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi) > > if (minlen == maxlen) > { > + if (!nulterm) > + warn_string_no_nul (gimple_location (stmt), NULL_TREE, > + gimple_call_fndecl (stmt), arg); > + > lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL, > true, GSI_SAME_STMT); > replace_call_with_value (gsi, lenrange[0]); > diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h > index 04e9bfa..fe11728 100644 > --- a/gcc/gimple-fold.h > +++ b/gcc/gimple-fold.h > @@ -25,7 +25,7 @@ along with GCC; see the file COPYING3. If not see > extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL); > extern tree canonicalize_constructor_val (tree, tree); > extern tree get_symbol_constant_value (tree); > -extern bool get_range_strlen (tree, tree[2], bool = false); > +extern bool get_range_strlen (tree, tree[2], bool = false, bool * = NULL); > extern tree get_maxval_strlen (tree, int); > extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, > tree); > extern bool fold_stmt (gimple_stmt_iterator *); > diff --git a/gcc/testsuite/gcc.c-torture/execute/memchr-1.c > b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c > new file mode 100644 > index 0000000..2614bee > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c > @@ -0,0 +1,43 @@ > +/* PR tree-optimization/86711 - wrong folding of memchr > + > + Verify that memchr() of arrays initialized with string literals > + where the nul doesn't fit in the array doesn't find the nul. */ > + > +extern void* memchr (const void*, int, __SIZE_TYPE__); > + > +#define A(expr) \ > + ((expr) \ > + ? (void)0 \ > + : (__builtin_printf ("assertion failed on line %i: %s\n", \ > + __LINE__, #expr), \ > + __builtin_abort ())) > + > +static const char a1[4] = "1234"; > +static const char a2[2][4] = { "1234", "5678" }; > + > +static const char a3[2][4] = { "1234", "567" }; > + > +int main () > +{ > + volatile int i = 0; > + > + A (memchr (a1, 0, sizeof a1) == 0); > + A (memchr (a1 + 1, 0, sizeof a1 - 1) == 0); > + A (memchr (a1 + 3, 0, sizeof a1 - 3) == 0); > + A (memchr (a1 + i, 0, sizeof a1) == 0); > + > + A (memchr (a2, 0, sizeof a2) == 0); > + > + A (memchr (a2[0], 0, sizeof a2[0]) == 0); > + A (memchr (a2[1], 0, sizeof a2[1]) == 0); > + > + A (memchr (a2[0] + 1, 0, sizeof a2[0] - 1) == 0); > + A (memchr (a2[1] + 2, 0, sizeof a2[1] - 2) == 0); > + A (memchr (a2[1] + 3, 0, sizeof a2[1] - 3) == 0); > + > + A (memchr (a2[i], 0, sizeof a2[i]) == 0); > + A (memchr (a2[i] + 1, 0, sizeof a2[i] - 1) == 0); > + > + /* This one must find it. */ > + A (memchr (a3, 0, sizeof a3) == &a3[1][3]); > +} > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr86714.c > b/gcc/testsuite/gcc.c-torture/execute/pr86714.c > new file mode 100644 > index 0000000..3ad6852 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/pr86714.c > @@ -0,0 +1,26 @@ > +/* PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too > + long initializer > + > + The excessively long initializer for a[0] is undefined but this > + test verifies that the excess elements are not considered a part > + of the value of the array as a matter of QoI. */ > + > +const char a[2][3] = { "1234", "xyz" }; > +char b[6]; > + > +void *pb = b; > + > +int main () > +{ > + __builtin_memcpy (b, a, 4); > + __builtin_memset (b + 4, 'a', 2); > + > + if (b[0] != '1' || b[1] != '2' || b[2] != '3' > + || b[3] != 'x' || b[4] != 'a' || b[5] != 'a') > + __builtin_abort (); > + > + if (__builtin_memcmp (pb, "123xaa", 6)) > + __builtin_abort (); > + > + return 0; > +} > diff --git a/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c > b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c > new file mode 100644 > index 0000000..838528f > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c > @@ -0,0 +1,239 @@ > +/* PR tree-optimization/86552 - missing warning for reading past the end > + of non-string arrays > + { dg-do compile } > + { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */ > + > +extern __SIZE_TYPE__ strlen (const char*); > + > +const char a[5] = "12345"; /* { dg-message "declared here" } */ > + > +int i0 = 0; > +int i1 = 1; > + > +void sink (int, ...); > + > +#define CONCAT(a, b) a ## b > +#define CAT(a, b) CONCAT(a, b) > + > +#define T(str) \ > + __attribute__ ((noipa)) \ > + void CAT (test_, __LINE__) (void) { \ > + sink (strlen (str)); \ > + } typedef void dummy_type > + > +T (a); /* { dg-warning "argument missing terminating nul" } > */ > +T (&a[0]); /* { dg-warning "nul" } */ > +T (&a[0] + 1); /* { dg-warning "nul" } */ > +T (&a[1]); /* { dg-warning "nul" } */ > +T (&a[i0]); /* { dg-warning "nul" } */ > +T (&a[i0] + 1); /* { dg-warning "nul" } */ > + > + > +const char b[][5] = { /* { dg-message "declared here" } */ > + "12", "123", "1234", "54321" > +}; > + > +T (b[0]); > +T (b[1]); > +T (b[2]); > +T (b[3]); /* { dg-warning "nul" } */ > +T (b[i0]); > + > +T (&b[2][1]); > +T (&b[2][1] + 1); > +T (&b[2][i0]); > +T (&b[2][1] + i0); > + > +T (&b[3][1]); /* { dg-warning "nul" } */ > +T (&b[3][1] + 1); /* { dg-warning "nul" } */ > +T (&b[3][i0]); /* { dg-warning "nul" } */ > +T (&b[3][1] + i0); /* { dg-warning "nul" } */ > +T (&b[3][i0] + i1); /* { dg-warning "nul" } */ > + > +T (i0 ? "" : b[0]); > +T (i0 ? "" : b[1]); > +T (i0 ? "" : b[2]); > +T (i0 ? "" : b[3]); /* { dg-warning "nul" } */ > +T (i0 ? b[0] : ""); > +T (i0 ? b[1] : ""); > +T (i0 ? b[2] : ""); > +T (i0 ? b[3] : ""); /* { dg-warning "nul" } */ > + > +T (i0 ? "1234" : b[3]); /* { dg-warning "nul" } */ > +T (i0 ? b[3] : "1234"); /* { dg-warning "nul" } */ > + > +T (i0 ? a : b[3]); /* { dg-warning "nul" } */ > +T (i0 ? b[0] : b[2]); > +T (i0 ? b[2] : b[3]); /* { dg-warning "nul" } */ > +T (i0 ? b[3] : b[2]); /* { dg-warning "nul" } */ > + > +T (i0 ? b[0] : &b[3][0] + 1); /* { dg-warning "nul" } */ > +T (i0 ? b[1] : &b[3][1] + i0); /* { dg-warning "nul" } */ > + > +/* It's possible to detect the missing nul in the following two > + expressions but GCC doesn't do it yet. */ > +T (i0 ? &b[3][1] + i0 : b[2]); /* { dg-warning "nul" "bug" { xfail *-*-* > } } */ > +T (i0 ? &b[3][i0] : &b[3][i1]); /* { dg-warning "nul" "bug" { xfail *-*-* > } } */ > + > + > +struct A { char a[5], b[5]; }; > + > +const struct A s = { "1234", "12345" }; > + > +T (s.a); > +T (&s.a[0]); > +T (&s.a[0] + 1); > +T (&s.a[0] + i0); > +T (&s.a[1]); > +T (&s.a[1] + 1); > +T (&s.a[1] + i0); > + > +T (s.b); /* { dg-warning "nul" } */ > +T (&s.b[0]); /* { dg-warning "nul" } */ > +T (&s.b[0] + 1); /* { dg-warning "nul" } */ > +T (&s.b[0] + i0); /* { dg-warning "nul" } */ > +T (&s.b[1]); /* { dg-warning "nul" } */ > +T (&s.b[1] + 1); /* { dg-warning "nul" } */ > +T (&s.b[1] + i0); /* { dg-warning "nul" } */ > + > +struct B { struct A a[2]; }; > + > +const struct B ba[] = { > + { { { "123", "12345" }, { "12345", "123" } } }, > + { { { "12345", "123" }, { "123", "12345" } } }, > + { { { "1", "12" }, { "123", "1234" } } }, > + { { { "123", "1234" }, { "12345", "12" } } } > +}; > + > +T (ba[0].a[0].a); > +T (&ba[0].a[0].a[0]); > +T (&ba[0].a[0].a[0] + 1); > +T (&ba[0].a[0].a[0] + i0); > +T (&ba[0].a[0].a[1]); > +T (&ba[0].a[0].a[1] + 1); > +T (&ba[0].a[0].a[1] + i0); > + > +T (ba[0].a[0].b); /* { dg-warning "nul" } */ > +T (&ba[0].a[0].b[0]); /* { dg-warning "nul" } */ > +T (&ba[0].a[0].b[0] + 1); /* { dg-warning "nul" } */ > +T (&ba[0].a[0].b[0] + i0); /* { dg-warning "nul" } */ > +T (&ba[0].a[0].b[1]); /* { dg-warning "nul" } */ > +T (&ba[0].a[0].b[1] + 1); /* { dg-warning "nul" } */ > +T (&ba[0].a[0].b[1] + i0); /* { dg-warning "nul" } */ > + > +T (ba[0].a[1].a); /* { dg-warning "nul" } */ > +T (&ba[0].a[1].a[0]); /* { dg-warning "nul" } */ > +T (&ba[0].a[1].a[0] + 1); /* { dg-warning "nul" } */ > +T (&ba[0].a[1].a[0] + i0); /* { dg-warning "nul" } */ > +T (&ba[0].a[1].a[1]); /* { dg-warning "nul" } */ > +T (&ba[0].a[1].a[1] + 1); /* { dg-warning "nul" } */ > +T (&ba[0].a[1].a[1] + i0); /* { dg-warning "nul" } */ > + > +T (ba[0].a[1].b); > +T (&ba[0].a[1].b[0]); > +T (&ba[0].a[1].b[0] + 1); > +T (&ba[0].a[1].b[0] + i0); > +T (&ba[0].a[1].b[1]); > +T (&ba[0].a[1].b[1] + 1); > +T (&ba[0].a[1].b[1] + i0); > + > + > +T (ba[1].a[0].a); /* { dg-warning "nul" } */ > +T (&ba[1].a[0].a[0]); /* { dg-warning "nul" } */ > +T (&ba[1].a[0].a[0] + 1); /* { dg-warning "nul" } */ > +T (&ba[1].a[0].a[0] + i0); /* { dg-warning "nul" } */ > +T (&ba[1].a[0].a[1]); /* { dg-warning "nul" } */ > +T (&ba[1].a[0].a[1] + 1); /* { dg-warning "nul" } */ > +T (&ba[1].a[0].a[1] + i0); /* { dg-warning "nul" } */ > + > +T (ba[1].a[0].b); > +T (&ba[1].a[0].b[0]); > +T (&ba[1].a[0].b[0] + 1); > +T (&ba[1].a[0].b[0] + i0); > +T (&ba[1].a[0].b[1]); > +T (&ba[1].a[0].b[1] + 1); > +T (&ba[1].a[0].b[1] + i0); > + > +T (ba[1].a[1].a); > +T (&ba[1].a[1].a[0]); > +T (&ba[1].a[1].a[0] + 1); > +T (&ba[1].a[1].a[0] + i0); > +T (&ba[1].a[1].a[1]); > +T (&ba[1].a[1].a[1] + 1); > +T (&ba[1].a[1].a[1] + i0); > + > +T (ba[1].a[1].b); /* { dg-warning "nul" } */ > +T (&ba[1].a[1].b[0]); /* { dg-warning "nul" } */ > +T (&ba[1].a[1].b[0] + 1); /* { dg-warning "nul" } */ > +T (&ba[1].a[1].b[0] + i0); /* { dg-warning "nul" } */ > +T (&ba[1].a[1].b[1]); /* { dg-warning "nul" } */ > +T (&ba[1].a[1].b[1] + 1); /* { dg-warning "nul" } */ > +T (&ba[1].a[1].b[1] + i0); /* { dg-warning "nul" } */ > + > + > +T (ba[2].a[0].a); > +T (&ba[2].a[0].a[0]); > +T (&ba[2].a[0].a[0] + 1); > +T (&ba[2].a[0].a[0] + i0); > +T (&ba[2].a[0].a[1]); > +T (&ba[2].a[0].a[1] + 1); > +T (&ba[2].a[0].a[1] + i0); > + > +T (ba[2].a[0].b); > +T (&ba[2].a[0].b[0]); > +T (&ba[2].a[0].b[0] + 1); > +T (&ba[2].a[0].b[0] + i0); > +T (&ba[2].a[0].b[1]); > +T (&ba[2].a[0].b[1] + 1); > +T (&ba[2].a[0].b[1] + i0); > + > +T (ba[2].a[1].a); > +T (&ba[2].a[1].a[0]); > +T (&ba[2].a[1].a[0] + 1); > +T (&ba[2].a[1].a[0] + i0); > +T (&ba[2].a[1].a[1]); > +T (&ba[2].a[1].a[1] + 1); > +T (&ba[2].a[1].a[1] + i0); > + > + > +T (ba[3].a[0].a); > +T (&ba[3].a[0].a[0]); > +T (&ba[3].a[0].a[0] + 1); > +T (&ba[3].a[0].a[0] + i0); > +T (&ba[3].a[0].a[1]); > +T (&ba[3].a[0].a[1] + 1); > +T (&ba[3].a[0].a[1] + i0); > + > +T (ba[3].a[0].b); > +T (&ba[3].a[0].b[0]); > +T (&ba[3].a[0].b[0] + 1); > +T (&ba[3].a[0].b[0] + i0); > +T (&ba[3].a[0].b[1]); > +T (&ba[3].a[0].b[1] + 1); > +T (&ba[3].a[0].b[1] + i0); > + > +T (ba[3].a[1].a); /* { dg-warning "nul" } */ > +T (&ba[3].a[1].a[0]); /* { dg-warning "nul" } */ > +T (&ba[3].a[1].a[0] + 1); /* { dg-warning "nul" } */ > +T (&ba[3].a[1].a[0] + i0); /* { dg-warning "nul" } */ > +T (&ba[3].a[1].a[1]); /* { dg-warning "nul" } */ > +T (&ba[3].a[1].a[1] + 1); /* { dg-warning "nul" } */ > +T (&ba[3].a[1].a[1] + i0); /* { dg-warning "nul" } */ > + > +T (ba[3].a[1].b); > +T (&ba[3].a[1].b[0]); > +T (&ba[3].a[1].b[0] + 1); > +T (&ba[3].a[1].b[0] + i0); > +T (&ba[3].a[1].b[1]); > +T (&ba[3].a[1].b[1] + 1); > +T (&ba[3].a[1].b[1] + i0); > + > + > +T (i0 ? ba[0].a[0].a : ba[0].a[0].b); /* { dg-warning "nul" } */ > +T (i0 ? ba[0].a[0].a : ba[0].a[0].b); /* { dg-warning "nul" } */ > + > +T (i0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]); /* { dg-warning "nul" } */ > +T (i0 ? &ba[3].a[1].a[1] : ba[0].a[0].a); /* { dg-warning "nul" } */ > + > +T (i0 ? ba[0].a[0].a : ba[0].a[1].b); > +T (i0 ? ba[0].a[1].b : ba[0].a[0].a); Bernd.