On 07/08/2017 02:45 PM, Martin Sebor wrote: > PR 81117 asks for improved detection of common misuses(*) of > strncpy and strncat. The attached patch is my solution. It > consists of three related sets of changes: > > 1) Adds a new option, -Wstringop-truncation, that diagnoses calls > to strncpy, and stpncpy (and also strncat) that truncate the copy. > This helps highlight the common but incorrect assumption that > the first two functions NUL-terminate the copy (see, for example, > CWE-170) For strncat, it helps detect cases of inadvertent > truncation of the source string by passing in a bound that's > less than or equal to its length. > > 2) Enhances -Wstringon-overflow to diagnose calls of the form > strncpy(D, S, N) where the bound N depends on a call to strlen(S). > This misuse is common in legacy code when, often in response to > the adoption of a secure coding initiative, while replacing uses > of strcpy with strncpy, the engineer either makes a mistake, or > doesn't have a good enough understanding of how the function works, > or does only the bare minimum to satisfy the requirement to avoid > using strcpy without actually improving anything. > > 3) Enhances -Wsizeof-pointer-memaccess to also warn about uses of > the functions to copy an array to a destination of an unknown size > that specify the size of the array as the bound. Given the > pervasive [mis]use of strncpy to bound the copy to the size of > the destination, instances like this suggest a bug: a possible > buffer overflow due to an excessive bound (see, for example, > CWE-806). In cases when the call is safe, it's equivalent to > the corresponding call to memcpy which is clearer and can be > more efficient. > > Martin > > PS By coincidence rather than by intent, the strncat warnings > are a superset of Clang's -Wstrncat-size. AFAICS, Clang only > warns when the destination is an array of known size and > doesn't have a corresponding warning for strncpy. > > [*] Here's some background into these misuses. > > The purpose of the historical strncpy function introduced in V7 > UNIX was to completely fill an array of chars with data, either > by copying an initial portion of a source string, or by clearing > it. I.e., its purpose wasn't to create NUL-terminated strings. > An example of its use was to fill the directory entry d_name > array (dirent::d_name) with the name of a file. > > The original purpose of the strncat function, on the other hand, > was to append a not necessarily NUL-terminated array of chars > to a string to form a NUL-terminated concatenation of the two. > An example use case is appending a directory entry (struct > dirent::d_name) that need not be NUL-terminated, to form > a pathname which does. > > Largely due to a misunderstanding of the functions' purpose they > have become commonly used (and misused) to make "safe," bounded > string copies by safeguarding against accidentally overflowing > the destination. This has led to great many bugs and security > vulnerabilities. > > > gcc-81117.diff > > > PR c/81117 - Improve buffer overflow checking in strncpy > > gcc/ChangeLog: > > PR c/81117 > * builtins.c (compute_objsize): Handle arrays that > compute_builtin_object_size likes to fail for. > (check_strncpy_sizes): New function. > (expand_builtin_strncpy): Call check_strncpy_sizes. > * doc/invoke.texi (-Wsizeof-pointer-memaccess): Document enhancement. > (-Wstringop-truncation): Document new option. > * gimple-fold.c (gimple_fold_builtin_strncpy): Implement > -Wstringop-truncation. > (gimple_fold_builtin_strncat): Same. > * gimple.c (gimple_build_call_from_tree): Set call location. > * tree-ssa-strlen.c (strlen_to_stridx): New global variable. > (is_strlen_related_p): New function. > (check_builtin_strxncpy, check_builtin_strncat): Same. > handle_builtin_strlen): Use strlen_to_stridx. > (strlen_optimize_stmt): Handle flavors of strncat, strncpy, and > stpncpy. > Use strlen_to_stridx. > (pass_strlen::execute): Release strlen_to_stridx. > > gcc/ada/ChangeLog: > > PR c/81117 > * adadecode.c (__gnat_decode): Replace pointless strncpy with > memcpy. > * argv.c (__gnat_fill_arg): Same. > > gcc/c-family/ChangeLog: > > PR c/81117 > * c-common.c (resort_sorted_fields): Replace pointless strncpy > with memcpy. > * c-warn.c (sizeof_pointer_memaccess_warning): Handle arrays. > * c.opt (-Wstriingop-truncation): New option. > > gcc/fortran/ChangeLog: > > PR c/81117 > * decl.c (build_sym): Replace pointless strncpy with strcpy. > > gcc/objc/ChangeLog: > > PR c/81117 > * objc-encoding.c (encode_type): Replace pointless strncpy with > memcpy. > > gcc/testsuite/ChangeLog: > > PR c/81117 > * c-c++-common/Wsizeof-pointer-memaccess3.c: New test. > * c-c++-common/Wstringop-overflow.c: New test. > * c-c++-common/Wstringop-truncation.c: New test. > * g++.dg/torture/Wsizeof-pointer-memaccess1.C: Adjust. > * g++.dg/torture/Wsizeof-pointer-memaccess2.C: Same. > * gcc.dg/Walloca-1.c: Disable macro tracking. So I think the fixes exposed by the new warning are OK to go in as-is immediately if you wish to do so. Minor questions on the actual improved warnings inline.
> > diff --git a/gcc/builtins.c b/gcc/builtins.c > index 608993a..706f051 100644 > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@ -3300,13 +3300,43 @@ check_sizes (int opt, tree exp, tree size, tree > maxlen, tree src, tree objsize) > the size of the object if successful or NULL when the size cannot > be determined. */ > > -static inline tree > +static tree > compute_objsize (tree dest, int ostype) > { > unsigned HOST_WIDE_INT size; > - if (compute_builtin_object_size (dest, ostype & 3, &size)) > + > + /* Only the two least significant bits are meaningful. */ > + ostype &= 3; > + > + if (compute_builtin_object_size (dest, ostype, &size)) > return build_int_cst (sizetype, size); > > + /* Unless computing the largest size (for memcpy and other raw memory > + functions), try to determine the size of the object from its type. */ > + if (!ostype) > + return NULL_TREE; > + > + if (TREE_CODE (dest) == SSA_NAME) > + { > + gimple *stmt = SSA_NAME_DEF_STMT (dest); > + if (!is_gimple_assign (stmt)) > + return NULL_TREE; > + > + dest = gimple_assign_rhs1 (stmt); > + } > + > + if (TREE_CODE (dest) != ADDR_EXPR) > + return NULL_TREE; > + > + tree type = TREE_TYPE (dest); > + if (TREE_CODE (type) == POINTER_TYPE) > + type = TREE_TYPE (type); > + > + type = TYPE_MAIN_VARIANT (type); > + > + if (TREE_CODE (type) == ARRAY_TYPE) > + return TYPE_SIZE_UNIT (type); So I *think* TYPE_SIZE_UNIT isn't necessarily guaranteed to be a INTEGER_CST, it could be a non-constant expression for the size. Are the callers of compute_objsize prepared to handle that? Just to be clear, I'd prefer to return TYPE_SIZE_UNIT even if it's not an INTEGER_CST, I'm just worried about the callers ability to handle that correctly. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index d0b9050..e4208d9 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -5170,6 +5170,57 @@ whether to issue a warning. Similarly to > @option{-Wstringop-overflow=3} this > setting of the option may result in warnings for benign code. > @end table > > +@item -Wstringop-truncation > +@opindex Wstringop-overflow > +@opindex Wno-stringop-overflow > +Warn for calls to bounded string manipulation functions such as > @code{strncat}, > +@code{strncpy}, and @code{stpncpy} that may either truncate the copied string > +or leave the destination unchanged. > + > +In the following example, the call to @code{strncat} specifies the length > +of the source string as the bound. If the destination contains a non-empty > +string the copy of the source will be truncated. Therefore, the call is > +diagnosed. To avoid the warning use @code{strlen (d) - 4)} as the bound; > + > +@smallexample > +void append (char *d) > +@{ > + strncat (d, ".txt", 4); > +@} > +@end smallexample Sorry. I don't follow this particular example. Where's the truncation when strlen (SRC) == N for strncat? In that case aren't we going to append SRC without the nul terminator, then nul terminate the result? Isn't that the same as appending SRC with its nul terminator? Am I missing something here? Also your advice for avoiding the warning seems wrong. Isn't it going to produce a huge value if strlen (d) < 4? > +@smallexample > +void make_file (const char *name) > +@{ > + char path[PATH_MAX]; > + strncpy (path, name, sizeof path - 1); > + strncat (path, ".text", sizeof ".text"); > + @dots{} > +@} > +@end smallexample > + > +Th @option{-Wsizeof-pointer-memaccess} option is enabled by @option{-Wall}. Nit: s/Th/The/ > > @item -Wsizeof-array-argument > @opindex Wsizeof-array-argument > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > index d94dc9c..5d021f9 100644 > --- a/gcc/gimple-fold.c > +++ b/gcc/gimple-fold.c > @@ -1856,21 +1893,68 @@ gimple_fold_builtin_strncat (gimple_stmt_iterator > *gsi) > return true; > } > > + if (!nowarn && cmpsrc <= 0) > + { > + /* To avoid certain truncation the specified bound should also > + not be equal to (or less than) the length of the source. */ > + location_t loc = gimple_location (stmt); > + if (warning_at (loc, OPT_Wstringop_truncation, > + cmpsrc == 0 > + ? G_("%qD specified bound %E equals source length") > + : G_("%qD specified bound %E is less than source " > + "length %u"), > + gimple_call_fndecl (stmt), len, srclen)) > + gimple_set_no_warning (stmt, true); > + } So I think this corresponds to the example above I asked about. Where's the truncation when the bound is equal to the source length? > diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c > index b0563fe..1f4b0f3 100644 > --- a/gcc/tree-ssa-strlen.c > +++ b/gcc/tree-ssa-strlen.c > @@ -40,11 +40,14 @@ along with GCC; see the file COPYING3. If not see > #include "expr.h" > #include "tree-dfa.h" > #include "domwalk.h" > +#include "tree-ssa-alias.h" > #include "tree-ssa-propagate.h" > #include "params.h" > #include "ipa-chkp.h" > #include "tree-hash-traits.h" > #include "builtins.h" > +#include "diagnostic-core.h" > +#include "diagnostic.h" > > /* A vector indexed by SSA_NAME_VERSION. 0 means unknown, positive value > is an index into strinfo vector, negative value stands for > @@ -146,6 +149,9 @@ struct decl_stridxlist_map > mappings. */ > static hash_map<tree_decl_hash, stridxlist> *decl_to_stridxlist_htab; > > +typedef std::pair<int, location_t> stridx_strlenloc; > +static hash_map<tree, stridx_strlenloc> strlen_to_stridx; > + > /* Obstack for struct stridxlist and struct decl_stridxlist_map. */ > static struct obstack stridx_obstack; > > @@ -1606,6 +1618,133 @@ handle_builtin_strcpy (enum built_in_function bcode, > gimple_stmt_iterator *gsi) > fprintf (dump_file, "not possible.\n"); > } > > +/* Return true if LEN depends on a call to strlen(SRC) in an interesting > + way. LEN can either be an integer expression, or a pointer (to char). > + When it is the latter (such as in recursive calls to self) is is > + assumed to be the argument in some call to strlen() whose relationship > + to SRC is being ascertained. */ > + > +static bool > +is_strlen_related_p (tree src, tree len) > +{ > + if (TREE_CODE (TREE_TYPE (len)) == POINTER_TYPE > + && operand_equal_p (src, len, 0)) > + return true; > + > + if (TREE_CODE (len) != SSA_NAME) > + return false; > + > + gimple *def_stmt = SSA_NAME_DEF_STMT (len); > + if (!def_stmt) > + return false; > + > + if (is_gimple_call (def_stmt)) > + { > + tree func = gimple_call_fndecl (def_stmt); > + if (!valid_builtin_call (def_stmt) > + || DECL_FUNCTION_CODE (func) != BUILT_IN_STRLEN) > + return false; > + > + tree arg = gimple_call_arg (def_stmt, 0); > + return is_strlen_related_p (src, arg); > + } > + > + if (!is_gimple_assign (def_stmt)) > + return false; > + > + tree_code code = gimple_assign_rhs_code (def_stmt); > + tree rhs1 = gimple_assign_rhs1 (def_stmt); > + tree rhstype = TREE_TYPE (rhs1); > + > + if ((TREE_CODE (rhstype) == POINTER_TYPE && code == POINTER_PLUS_EXPR) > + || (INTEGRAL_TYPE_P (rhstype) > + && (code == BIT_AND_EXPR > + || code == NOP_EXPR))) > + { > + /* Pointer plus (an integer) and integer cast or truncation are > + considered among the (potentiall) related expressions to strlen. NIT: s/potentiall/potentially/ > @@ -2512,6 +2651,19 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi) > case BUILT_IN_STPCPY_CHK_CHKP: > handle_builtin_strcpy (DECL_FUNCTION_CODE (callee), gsi); > break; > + > + case BUILT_IN_STRNCAT: > + case BUILT_IN_STRNCAT_CHK: > + check_builtin_strncat (DECL_FUNCTION_CODE (callee), gsi); > + break; > + > + case BUILT_IN_STPNCPY: > + case BUILT_IN_STPNCPY_CHK: > + case BUILT_IN_STRNCPY: > + case BUILT_IN_STRNCPY_CHK: > + check_builtin_stxncpy (DECL_FUNCTION_CODE (callee), gsi); > + break; > + So we've got calls to check the arguments, but not optimize here. But the containing function is "strlen_optimize_stmt". Would it make sense to first call strlen_optimize_stmt to handle the optimization cases, then to call a new separate function to handle warnings for the strn* family?