Re: Using source-level annotations to help GCC detect buffer overflows

Martin Sebor via Gcc Wed, 30 Jun 2021 10:12:45 -0700

On 6/29/21 12:31 PM, David Brown wrote:

On 29/06/2021 17:50, Martin Sebor wrote:

On 6/29/21 6:27 AM, David Brown wrote:

On 28/06/2021 21:06, Martin Sebor via Gcc wrote:

I wrote an article for the Red Hat Developer blog about how
to annotate code to get the most out of GCC's access checking
warnings like -Warray-bounds, -Wformat-overflow, and
-Wstringop-overflow.  The article published last week:


https://developers.redhat.com/articles/2021/06/25/use-source-level-annotations-help-gcc-detect-buffer-overflows


Thanks for that write-up - and of course thank you to whoever
implemented these attributes!

The caveat that the access attributes are lost when a function is
inlined is an important one.  As a user who appreciates all the checks I
can get, it is disappointing - but I assume there are good reasons for
that limitation.  I can merely hope that will change in future gcc
versions.


There's nothing the attribute could obviously attach to after a call
has been inlined.  An extreme example is a function whose argument
isn't used:

   __attribute__ ((access (write_only, 1, 2))) void
   f (char *p, int n) { }

(The function might have a body in the original source that could
be eliminated from the IL based on the values of other arguments.)


Could these attributes not be attached to the arguments when the
function is called, or the parameters when the function is expanded?
After all, in cases such as the "access" attribute it is not the
function as such that has the access hints, it is the parameters of the
function.

(I'm talking here based on absolutely no knowledge of how this is
implemented, but it's always possible that a different view, unbiased by
knowing the facts, can inspire new ideas.)


Attaching these attributes to function parameters is an interesting
idea that might be worth exploring.  We've talked about letting
attribute access apply to variables for other reasons (detecting
attempts to modify immutable objects, as I mention in the article).
so your suggestion would be in line with that.  Associating two
variables that aren't parameters might be tricky.


Calls to it that are not inlined will be checked but those that are
won't be.  This could be improved by doing the checking also before
inlining but at a cost of some false positives for code that's later
determined to be unreachable.  I don't have a sense of how bad it
might be so it's something to try.  This class of false positives
could also be dealt with by queuing up the warnings (e.g., by
emitting them into the IL via __builtin_warning) and issuing them
only if they survive dead code elimination.  This is something I'd
like to try to tackle for GCC 12.


I fully appreciate that some checks can be easier earlier in the
process, others later.  It might even be helpful to do similar checks at
more than one stage, and combine the results.


I believe it would make sense to add this information to the gcc manual
page for common function attributes.  There are quite a number of
attributes that are useful for static checking, such as
"warn_unused_result" and "nonnull".  Are these also dropped if the
function is inlined?


I agree the documentation could stand to be made clearer on this
point.  In general, I think it would be helpful to give users
more guidance about what to expect from attributes as well as
warnings: which ones are purely lexical and which ones flow-
sensitive and so inherently susceptible to false positives and
negatives, and to what extent.


It could be difficult to quantify that kind of thing, but sometimes
guidance could be useful.  (There is already such information for some
warning flags, especially those that support multiple levels.)

Certainly since first reading about the "access" attributes, I have been
considering adding them to my current project.  I have also been mulling
around in my head possibilities of making variadic templates in C++ that
add access attributes in the right places for some kinds of pointers -
but now that I know the attributes will get dropped for inline
functions, and such templates would involve inline functions, there is
little point.  (Maybe I will still figure a neat way to do this for
external functions - it just won't be useful in as many places.)


Unfortunately, with extensive inlining and templates, C++ support
for these attributes is less robust than it ideally would be.
Improving it is on my to do list.

Whether an attribute has an effect depends on the compilation stage
where it's handled.  warn_unused_result is handled very early (well
before inlining) so it always has the expected effect.  Attribute
nonnull is handled both early (to catch the simple cases) and also
later, after inlining, to benefit from some flow analysis, so its
effect is lost if the function it attaches to is inlined.  Attribute
access is handled very late and so it suffers from this problem
even more.


I suppose some attributes are not needed for inline functions, since the
compiler has the full function definition and can figure some things out
itself.  That would apply to "pure" and "const" functions, I expect.


I was going to agree, but then I tested it and found out that const
(and most likely pure) do actually make a difference on inline
functions.  For example in the test case below the inequality is
folded to false.

int f (int);

__attribute__ ((const))
int g (int i) { return f (i); }

void h (int i)
{
  if (g (i) != g (i))
    __builtin_abort ();
}

On the other hand, the equality below is not folded unless f() is
also declared with attribute malloc:

void* f (void);

static int a[1];

__attribute__ ((malloc))
void* g (void) { return f (); }

void h (void)
{
  if (g () == a)
    __builtin_abort ();
}

With heavy inlining (e.g., with LTO) whether a function attribute
will have an effect or not in a given caller is anyone's guess :(

And if you want a parameter to be non-null, it's possible to do a check
inside the function:

extern void __attribute__((error("Nonnull check failed")))
                nonnull_check_failed(void);

#define nonnull(x) \
     do { \
         if (__builtin_constant_p(!(x))) { \
             if (!(x)) nonnull_check_failed(); \
         } \
         if (!(x)) __builtin_unreachable(); \
     } while (0)


inline int foo(int *p) {
        nonnull(p);
        (*p)++;
        return *p;
}


(The "__builtin_unreachable()" line could also be a call to an error
handler, or missing entirely, according to need.)


If you try to call "foo(0)" and the compiler can see at compile time
that the parameter is null, you'll get a compile-time error.  I've used
that kind of check in my code, but it's a little uglier than
__attribute__((nonnull)) !


This is a possible workaround.  GCC should be able to do the same
thing for you (that's the __builtin_warning idea).

The new attribute malloc (to associate allocators with deallocators)
is also handled very late but it deals with the same problem by
disabling inlining.  This was done to avoid false positives, not
to prevent false negatives, but it works for both.  Disabling
inlining is obviously suboptimal and wouldn't be appropriate for
simple accessor functions but for memory allocation it seems like
an acceptable tradeoff.


I've sometimes had allocator functions that are so simple that inlining
is appropriate (on dedicated embedded systems it is not unusual to need
to allocate some memory early on in the code, but never need to free it,
leading to minimal allocation functions).  But the cost of a function
call would not be noticeable.


The inlining problem is not unique to attributes that affect
warnings.  It impacts all function (and function type) attributes,
including those that affect optimization.  Those that specifically
change optimization options disable inlining to avoid meaningless
mismatches between options controlling the codegen of the caller
and those intended to control the codegen for the callee.


It's obvious that some attributes don't play well with inlining, such as
"section" (unless inlined in a function with the same section
attribute), but it looks like there is a lot of detail that is missing
from the manual pages here.  And some of these effects are
counter-intuitive and unhelpful.


Agreed.  I hope my articles will be helpful here (it would be nice
if GCC had a blog where they could be posted too).  The manual
should be updated to mention these basic limitations; the challenge
is the inconsistency between how they're applied.


For example, it is very occasionally useful to arithmetic operations on
signed types with wrapping semantics rather than the usual "overflow
doesn't happen" semantics that lets gcc produce more efficient code.  A
neat and convenient way to write that in C++ would be to make an enum
class for wrapping ints:

enum class WInt : int {};

__attribute__((optimize("-fwrapv")))
WInt operator + (WInt x, WInt y) {
     return (WInt) ((int) x + (int) y);
}

__attribute__((optimize("-fwrapv")))
WInt operator - (WInt x, WInt y) {
     return (WInt) ((int) x - (int) y);
}

// etc.


Simple, clear, safe (you can't mix "int" and "WInt" by mistake) and
efficient - one might think.  But it turns out this is not the case -
using these operators from a function that does not also have "-fwrapv"
in effect will lead to function calls.

I'm glad I found out now, and not in a situation where inlining was
important.  But I think it would be a good thing to mention this in the
documentation.  (It would be even better to remove the restriction on
inlining, but I expect that will take more time!)


I agree (on both counts).  Raising bugs (ideally with test cases
showing counterintuitive effects) would help.

Martin

Re: Using source-level annotations to help GCC detect buffer overflows

Reply via email to