On 08/24/18 08:36, Jeff Law wrote: > On 08/02/2018 07:26 AM, Bernd Edlinger wrote: >> On 08/02/18 04:44, Martin Sebor wrote: >>> Since the foundation of the patch is detecting and avoiding >>> the overly aggressive folding of unterminated char arrays, >>> besides issuing a warning for such arguments to strlen, >>> the patch also fixes pr86711 - wrong folding of memchr, and >>> pr86714 - tree-ssa-forwprop.c confused by too long initializer. >>> >>> The substance of the attached updated patch is unchanged, >>> I have just added test cases for the two additional bugs. >>> >>> Bernd, as I mentioned Wednesday, the patch supersedes >>> yours here: >>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html >>> >> >> No problem, but I hope you understand, that I still uphold >> my patch. >> >> So we have two patches now: >> - mine, fixing a wrong code bug, >> - yours, implementing a new warning and fixing a wrong >> code bug at the same time. >> >> I will add a few comments to your patch below. > [ ... ] > > So a lot of the comments are out of date, presumably because Martin > fixed the issues you pointed out in his second version of the patch. > But there's still some useful nuggets in your comments that are still > relevant. >
Yes, possible, and therefore I would like updated patches summarize the changes. > FYI it appears that sometimes you comment above a chunk of code, and > other times below. That makes it exceedingly difficult to figure out > the issue you're trying to raise. > Okydoky. > >> >>> Martin >>> >>> On 07/30/2018 01:17 PM, Martin Sebor wrote: >>>> Attached is an updated version of the patch that handles more >>>> instances of calling strlen() on a constant array that is not >>>> a nul-terminated string. >>>> >>>> No other functions except strlen are explicitly handled yet, >>>> and neither are constant arrays with braced-initializer lists >>>> like const char a[] = { 'a', 'b', 'c' }; I am testing >>>> an independent solution for those (bug 86552). Once those >>>> are handled the warning will be able to detect those as well. >>>> >>>> Tested on x86_64-linux. >>>> >>>> On 07/25/2018 05:38 PM, Martin Sebor wrote: >>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html >>>>> >>>>> The fix for bug 86532 has been checked in so this enhancement >>>>> can now be applied on top of it (with only minor adjustments). >>>>> >>>>> On 07/19/2018 02:08 PM, Martin Sebor wrote: >>>>>> In the discussion of my patch for pr86532 Bernd noted that >>>>>> GCC silently accepts constant character arrays with no >>>>>> terminating nul as arguments to strlen (and other string >>>>>> functions). >>>>>> >>>>>> The attached patch is a first step in detecting these kinds >>>>>> of bugs in strlen calls by issuing -Wstringop-overflow. >>>>>> The next step is to modify all other handlers of built-in >>>>>> functions to detect the same problem (not part of this patch). >>>>>> Yet another step is to detect these problems in arguments >>>>>> initialized using the non-string form: >>>>>> >>>>>> const char a[] = { 'a', 'b', 'c' }; >>>>>> >>>>>> This patch is meant to apply on top of the one for bug 86532 >>>>>> (I tested it with an earlier version of that patch so there >>>>>> is code in the context that does not appear in the latest >>>>>> version of the other diff). >>>>>> >>>>>> Martin >>>>>> >>>>> >>>> >>> >>> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long >>> initializer >>> PR tree-optimization/86711 - wrong folding of memchr >>> PR tree-optimization/86552 - missing warning for reading past the end of >>> non-string arrays >>> >>> gcc/ChangeLog: >>> >>> PR tree-optimization/86714 >>> PR tree-optimization/86711 >>> PR tree-optimization/86552 >>> * builtins.h (warn_string_no_nul): Declare.. >>> (c_strlen): Add argument. >>> * builtins.c (warn_string_no_nul): New function. >>> (fold_builtin_strlen): Add argument. Detect missing nul. >>> (fold_builtin_1): Adjust. >>> (string_length): Add argument and use it. >>> (c_strlen): Same. >>> (expand_builtin_strlen): Detect missing nul. >>> * expr.c (string_constant): Add arguments. Detect missing nul >>> terminator and outermost declaration it's missing in. >>> * expr.h (string_constant): Add argument. >>> * fold-const.c (c_getstr): Change argument to bool*, rename >>> other arguments. >>> * fold-const-call.c (fold_const_call): Detect missing nul. >>> * gimple-fold.c (get_range_strlen): Add argument. >>> (get_maxval_strlen): Adjust. >>> * gimple-fold.h (get_range_strlen): Add argument. >>> >>> gcc/testsuite/ChangeLog: >>> >>> PR tree-optimization/86714 >>> PR tree-optimization/86711 >>> PR tree-optimization/86552 >>> * gcc.c-torture/execute/memchr-1.c: New test. >>> * gcc.c-torture/execute/pr86714.c: New test. >>> * gcc.dg/warn-strlen-no-nul.c: New test. >>> >>> diff --git a/gcc/builtins.c b/gcc/builtins.c >>> index aa3e0d8..f4924d5 100644 >>> --- a/gcc/builtins.c >>> +++ b/gcc/builtins.c >>> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, >>> int); >>> static rtx expand_builtin_expect (tree, rtx); >>> static tree fold_builtin_constant_p (tree); >>> static tree fold_builtin_classify_type (tree); >>> -static tree fold_builtin_strlen (location_t, tree, tree); >>> +static tree fold_builtin_strlen (location_t, tree, tree, tree); >>> static tree fold_builtin_inf (location_t, tree, int); >>> static tree rewrite_call_expr (location_t, tree, int, tree, int, ...); >>> static bool validate_arg (const_tree, enum tree_code code); >>> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, >>> unsigned maxelts) >>> return n; >>> } >>> >>> +/* For a call expression EXP to a function that expects a string argument, >>> + issue a diagnostic due to it being a called with an argument NONSTR >>> + that is a character array with no terminating NUL. */ >>> + >>> +void >>> +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr) >>> +{ >>> + loc = expansion_point_location_if_in_system_header (loc); >>> + >>> + bool warned; >>> + if (exp) >>> + { >>> + if (!fndecl) >>> + fndecl = get_callee_fndecl (exp); >>> + warned = warning_at (loc, OPT_Wstringop_overflow_, >>> + "%K%qD argument missing terminating nul", >>> + exp, fndecl); >>> + } >>> + else >>> + { >>> + gcc_assert (fndecl); >>> + warned = warning_at (loc, OPT_Wstringop_overflow_, >>> + "%qD argument missing terminating nul", >>> + fndecl); >>> + } >>> + >>> + if (warned && DECL_P (nonstr)) >>> + inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared >>> here"); >>> +} >>> + >>> /* Compute the length of a null-terminated character string or wide >>> character string handling character sizes of 1, 2, and 4 bytes. >>> TREE_STRING_LENGTH is not the right way because it evaluates to >>> @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, >>> unsigned maxelts) >>> accesses. Note that this implies the result is not going to be emitted >>> into the instruction stream. >>> >>> + When ARR is non-null and the string is not properly nul-terminated, >>> + set *ARR to the declaration of the outermost constant object whose >>> + initializer (or one of its elements) is not nul-terminated. >>> + >>> The value returned is of type `ssizetype'. >>> >>> Unfortunately, string_constant can't access the values of const char >>> arrays with initializers, so neither can we do so here. */ >>> >>> tree >>> -c_strlen (tree src, int only_value) >>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */) >>> { >>> STRIP_NOPS (src); >>> + >>> + /* Used to detect non-nul-terminated strings in subexpressions >>> + of a conditional expression. When ARR is null, point it at >>> + one of the elements for simplicity. */ >>> + tree arrs[] = { NULL_TREE, NULL_TREE }; >>> + if (!arr) >>> + arr = arrs; >> >> now arr is always != NULL > Right. It's renamed in the follow-up patch from Martin, but yes, it's > always non-null. Note in this case your comment is after the code > you're referring to. > >> >>> + >>> if (TREE_CODE (src) == COND_EXPR >>> && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0)))) >>> { >>> - tree len1, len2; >>> - >>> - len1 = c_strlen (TREE_OPERAND (src, 1), only_value); >>> - len2 = c_strlen (TREE_OPERAND (src, 2), only_value); >>> + tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs); >>> + tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1); >>> if (tree_int_cst_equal (len1, len2)) >>> - return len1; >>> + { >> >> funny, if called with NULL *arr and arrs[0] alias each other. > >> >>> + *arr = arrs[0] ? arrs[0] : arrs[1]; >>> + return len1; >>> + } >>> } > > And in this case it looks like your comment is before the code you're > commenting about. It was fairly obvious in this case because the code > prior to your "funny, if called..." comment didn't reference arr or arrs > at all. > > And more importantly, why are you concerned about the aliasing? > It is just *arr = arrs[0] does nothing, but it looks like the author was not aware of it. It may be okay, but causes head-scratching. If you don't have the context you will think this does something different. > > > > >>> >>> if (TREE_CODE (src) == COMPOUND_EXPR >>> && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0)))) >>> - return c_strlen (TREE_OPERAND (src, 1), only_value); >>> + return c_strlen (TREE_OPERAND (src, 1), only_value, arr); >>> >>> location_t loc = EXPR_LOC_OR_LOC (src, input_location); >>> >>> /* Offset from the beginning of the string in bytes. */ >>> tree byteoff; >>> - src = string_constant (src, &byteoff); >>> - if (src == 0) >>> - return NULL_TREE; >>> + /* Set if array is nul-terminated, false otherwise. */ >>> + bool nulterm; >> >> note arr is always != null or pointing to arrs[0]. >> >>> + src = string_constant (src, &byteoff, &nulterm, arr); >>> + if (!src) >>> + { >>> + *arr = arrs[0] ? arrs[0] : arrs[1]; >>> + return NULL_TREE; >>> + } >>> + >>> + /* Clear *ARR when the string is nul-terminated. It should be >>> + of no interest to callers. */ >>> + if (nulterm) >>> + *arr = NULL_TREE; >>> >>> /* Determine the size of the string element. */ >>> unsigned eltsize >>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value) >>> maxelts = maxelts / eltsize - 1; >>> } >>> >>> + /* Unless the caller is prepared to handle it by passing in a non-null >>> + ARR, fail if the terminating nul doesn't fit in the array the string >>> + is stored in (as in const char a[3] = "123"; */ >> >> note arr is always != NULL, thus this if is never taken. >> >>> + if (!arr && maxelts < strelts) >>> + return NULL_TREE; >>> + > Right. And I think that check is gone in the second version of Martin's > patch. > > >>> /* PTR can point to the byte representation of any string type, >>> including >>> char* and wchar_t*. */ >>> const char *ptr = TREE_STRING_POINTER (src); >>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value) >>> offsave = fold_convert (ssizetype, offsave); >>> tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, >>> offsave, >>> build_int_cst (ssizetype, len * eltsize)); >> >> this computation is wrong, it computes not in units of eltsize, >> I am however not sure if it is really good that this function tries to >> compute strlen of wide character strings. > Note that in the second version of Martin's patch eltsize will always be > 1 when we get here. Furthermore, the multiplication by eltsize is gone. > It looks like you switched back to commenting after the code for that > comment, but then immediately thereafter... > Acutally I had no idea that the second patch did resolve some of my comments and which those were. I had the impression that it is just splitting up of a large patch into several smaller without reworking at the same time. Once again, a summary what was changed would be helpful. > >> >> That said, please fix this computation first, in a different patch >> instead of just fixing the indentation. (I know I pointed that lines are too >> long here, but that was before I realized that the whole length computation >> here is wrong). >> >>> - tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), >>> offsave); >>> + tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), >>> + offsave); >>> return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp, >>> build_zero_cst (ssizetype)); >>> } >>> @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value) >>> Since ELTOFF is our starting index into the string, no further >>> calculation is needed. */ > The comment shown above appears to refer to the code below the comment. > Again, this makes it exceedingly confusing to understand your comments > and take appropriate action. > > > > >> >> What are you fixing here, I think that was another bug. >> If this fixes something then it should be in a different patch, >> just handling this. >> >>> unsigned len = string_length (ptr + eltoff * eltsize, eltsize, >>> - maxelts - eltoff); >>> + strelts - eltoff); >>> >>> return ssize_int (len); >>> } > I'm guessing the comment in this case refers to the code after. > Presumably questioning the change from maxelts to strelts. > > Yes. I was thinking that could be a patch of its own. > > >>> @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target, >>> >>> struct expand_operand ops[4]; >>> rtx pat; >>> - tree len; >>> tree src = CALL_EXPR_ARG (exp, 0); >>> rtx src_reg; >>> rtx_insn *before_strlen; >>> @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target, >>> unsigned int align; >>> >>> /* If the length can be computed at compile-time, return it. */ >>> - len = c_strlen (src, 0); >>> + tree array; >>> + tree len = c_strlen (src, 0, &array); >> >> You know the c_strlen tries to compute wide character sizes, >> but strlen does not do that, strlen (L"abc") should give 1 >> (or 0 on a BE machine) >> I wonder if that is correct. > So I think this is fixed by your change which restored the default > behavior of c_strlen to count bytes. Which restores the behavior of > c_strlen to match that of the strlen library call. > > So for something like L"abc", the right value is 1 or 0 for LE and BE > targets respectively. > Well, yes, although I changed my mind on strlen(L"abc") meanwhile. This may appear as if I contradict myself but, it is more like I learn. The installed patch for the ELTSIZE parameter was in part inspired by the thought about "not folding invalid stuff" that was forming in my mind at that time, even before I wrote it down and sent it to this list. So the patch does reject to give a value for strlen(L"abc") since it is an invalid call. Previously that was counting bytes in a wide char string, but I doubt that was the original intention though. > > >> >>> if (len) >>> - return expand_expr (len, target, target_mode, EXPAND_NORMAL); >>> + { >>> + if (array) >>> + { >>> + /* Array refers to the non-nul terminated constant array >>> + whose length is attempted to be computed. */ >> >> I really wonder if it would not make more sense to have a >> nonterminated_string_constant_p instead. > Rather than a boolean I think we want a tree * so that we can bubble up > more information to the caller WRT non-terminated strings. > >> >> Last time I wanted to implement a warning in expand I faced the >> problem that inlined functions will get one warning per invocation? > Yea, but that's just the way things are. I don't think it's really an > issue for the patches we're looking at right now. > No. >>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg) >>> return build_int_cst (integer_type_node, type_to_class (TREE_TYPE >>> (arg))); >>> } >>> >>> -/* Fold a call to __builtin_strlen with argument ARG. */ >>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG. */ >>> >>> static tree >>> -fold_builtin_strlen (location_t loc, tree type, tree arg) >>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg) >>> { >>> if (!validate_arg (arg, POINTER_TYPE)) >>> return NULL_TREE; >>> else >>> { >>> - tree len = c_strlen (arg, 0); >>> - >>> + tree arr = NULL_TREE; >>> + tree len = c_strlen (arg, 0, &arr); >> >> Is it possible to write a test case where strlen(L"test") reaches this point? >> what will c_strlen return then? > Given your fix to have c_strlen count bytes by default, I think we're OK > here. > Yes. But my fix was still incomplete. So incremental progress is necessary here. >> >>> @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset) >>> return NULL_TREE; >>> >>> tree offset; >>> - if (tree str = string_constant (arg0, &offset)) >>> + if (tree str = string_constant (arg0, &offset, nulterm, decl)) >>> { >>> /* Avoid pointers to arrays (see bug 86622). */ >>> if (POINTER_TYPE_P (TREE_TYPE (arg)) >>> @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset) >>> if (TREE_CODE (array) == STRING_CST) >> >> Well, actually I think there _are_ STING_CSTs which are not null terminated. >> Maybe not in C. But Fortran, Ada, Go... >> >>> { >>> *ptr_offset = fold_convert (sizetype, offset); >>> + if (nulterm) >>> + *nulterm = true; >>> + if (decl) >>> + *decl = NULL_TREE; >>> return array; >>> } > So your comment seems to be referring to the code above the comment as > well as below. Confusing. Consistency really helps. > > Are we at a point where we're ready to declare STRING_CSTs as always > being properly terminated? If not the hunk of code above needs some > rethinking since we can't guarantee the string is properly terminated. > No. That is still in the flux. And I am not sure if the "properly terminated" will survive. I am digging deep to the ground now. And the correctness of the code in questing depends heavily on the results. Therefore I would like to fix this from the ground. Adding lots of new code that is based on wrong assumptions will not help. > >>> >>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset) >>> if (!array_size || TREE_CODE (array_size) != INTEGER_CST) >>> return NULL_TREE; >>> >>> + unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size); >>> + >> >> I don't understand why this is necessary at all. >> It looks way too complicated, to say the least. >> >> TREE_TYPE (init) has already the type of the member. >> >>> + /* When ARG refers to an aggregate (of arrays) try to determine >>> + the size of the character array within the aggregate. */ >>> + tree ref = arg; >>> + tree reftype = TREE_TYPE (arg); >>> + >>> + if (TREE_CODE (ref) == MEM_REF) >>> + { >>> + ref = TREE_OPERAND (ref, 0); >>> + if (TREE_CODE (ref) == ADDR_EXPR) >>> + { >>> + ref = TREE_OPERAND (ref, 0); >>> + reftype = TREE_TYPE (ref); >>> + } >>> + } >>> + else >>> + while (TREE_CODE (ref) == ARRAY_REF) >>> + { >>> + reftype = TREE_TYPE (ref); >>> + ref = TREE_OPERAND (ref, 0); >>> + } >>> + >>> + if (TREE_CODE (ref) == COMPONENT_REF) >>> + reftype = TREE_TYPE (ref); >>> + >>> + while (TREE_CODE (reftype) == ARRAY_TYPE) >>> + { >>> + tree next = TREE_TYPE (reftype); >>> + if (TREE_CODE (next) == INTEGER_TYPE) >>> + { >>> + if (tree size = TYPE_SIZE_UNIT (reftype)) >>> + if (tree_fits_uhwi_p (size)) >>> + array_elts = tree_to_uhwi (size); >> >> so array_elts is measued in bytes. > So I'm guessing your comment about the code looking way too complicated > is referring to the code *after* your comment. That code is not in the > v2 patch. At least not 01/06 which addresses the codegen/opt issues. I > haven't checked the full kit to see if this code appears in a subsequent > patch. > No I meant the code between /* When ARG refers to an aggregate (of arrays) try to determine the size of the character array within the aggregate. */ and here. It is wrong (assuming C semantic on GIMPLE). > >> >>> + break; >>> + } >>> + >>> + reftype = TREE_TYPE (reftype); >>> + } >>> + >>> + if (decl) >>> + *decl = array; >>> + >>> /* Avoid returning a string that doesn't fit in the array >>> it is stored in, like >>> const char a[4] = "abcde"; >>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset) >>> unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init); >>> length = string_length (TREE_STRING_POINTER (init), charsize, >>> length / charsize); >> >> Some callers especially those where the wrong code happens, expect >> to be able to access the STRING_CST up to TREE_STRING_LENGTH, >> But using string_length assume thy stop at the first nul char. > Example please? > tree-ssa-forwprop.c: str1 = string_constant (src1, &off1); if (str1 == NULL_TREE) break; if (!tree_fits_uhwi_p (off1) || compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0 || compare_tree_int (len1, TREE_STRING_LENGTH (str1) - tree_to_uhwi (off1)) > 0 Bernd.