On 11/08/2016 05:09 PM, Martin Sebor wrote:
The -Wformat-length checker relies on the compute_builtin_object_size
function to determine the size of the buffer it checks for overflow.
The function returns either a size computed by the tree-object-size
pass for objects referenced by the __builtin_object_size intrinsic
(if it's used in the program) or it tries to compute it for a small
subset of expressions otherwise. This subset doesn't include objects
allocated by either malloc or alloca, and so for those the function
returns "unknown" or (size_t)-1 in the case of -Wformat-length. As
a consequence, -Wformat-length is unable to detect overflows
involving such objects.
The attached patch adds a new function, compute_object_size, that
uses the existing algorithms to compute and return the sizes of
allocated objects as well, as if they were referenced by
__builtin_object_size in the program source, enabling the
-Wformat-length checker to detect more buffer overflows.
Martin
PS The function makes use of the init_function_sizes API that is
otherwise unused outside the tree-object-size pass to initialize
the internal structures, but then calls fini_object_sizes to
release them before returning. That seems wasteful because
the size of the same object or one related to it might need
to computed again in the context of the same function. I
experimented with allocating and releasing the structures only
when current_function_decl changes but that led to crashes.
I suspect I'm missing something about the management of memory
allocated for these structures. Does anyone have any suggestions
how to make this work? (Do I perhaps need to allocate them using
a special allocator so they don't get garbage collected?)
gcc-78245.diff
PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically
allocated buffer
gcc/testsuite/ChangeLog:
PR middle-end/78245
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.
gcc/ChangeLog:
PR middle-end/78245
* gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size.
* tree-object-size.c (addr_object_size): Adjust.
(pass_through_call): Adjust.
(compute_object_size, internal_object_size): New functions.
(compute_builtin_object_size): Call internal_object_size.
(pass_object_sizes::execute): Adjust.
* tree-object-size.h (compute_object_size): Declare.
Sorry. Just not getting to many of the pre-stage1 close patches as fast
as I'd like.
My only real concern here is that if we call compute_builtin_object_size
without having initialized the passes, then we initialize, compute, then
finalize. Subsequent calls will go through the same process -- the key
being each one re-computes the internal state which might get expensive.
Wouldn't it just make more sense to pull up the init/fini calls, either
explicitly (which likely means externalizing the init/fini routines) or
by wrapping all this stuff in a class and instantiating a suitable object?
I think the answer to your memory management question is that internal
state is likely not marked as a GC root and thus if you get a GC call
pieces of internal state are not seen as reachable, but you still can
reference them. ie, you end up with dangling pointers.
Usually all you'd have to do is mark them so that gengtype will see
them. Bitmaps, trees, rtl, are all good examples. So marking the
bitmap would look like:
static GTY (()) bitmap computed[4];
Or something like that.
You might try --enable-checking=yes,extra,gc,gcac
That will be slow, but is often helpful for tracking down cases where
someone has an object expected to be live across passes, but it isn't
reachable because someone failed to register a GC root.
Jeff