Hey Sarah,
Many array-bounds and format-string problems can already be found, especially
with LTO, CLooG, loop unrolling, and -O3 enabled. Seeing across object-file
boundaries, understanding loop bounds, and inlining aggressively allow GCC
to warn about a lot of real-world vulnerabilities. Once more IPA passes
land in trunk, it should get even better.
What I think is missing is:
1) detection of double-frees. There is already a function attribute called
'malloc', which marks an allocation function whose return value is never
aliased by any other pointer. You could use that attribute, in addition
to a new one ('free'), to track potential double-frees of values via VRP/IPA.
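To make the double-free case concrete, here is a rough sketch of the kind of
code such a check would look at. The 'malloc' attribute is real; the 'free'
attribute is only the proposal above, so it appears in a comment. The names
make_buffer, release_buffer and process are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* 'malloc' is an existing GCC attribute: the result aliases no other pointer. */
char *make_buffer(size_t len) __attribute__ ((malloc));

char *make_buffer(size_t len)
{
    assert(len > 0);
    return calloc(len, 1);
}

/* Hypothetical: a 'free' attribute would mark parameter 1 as released here.
   It is only the proposal from this mail, so it stays in a comment. */
void release_buffer(char *buf) /* __attribute__ ((free (1))) */
{
    free(buf);
}

/* Returns how many times the buffer was released on the path taken. */
int process(int fail)
{
    char *buf = make_buffer(16);
    int releases = 0;

    if (buf == NULL)
        return 0;

    if (fail) {
        release_buffer(buf);   /* early release on the error path */
        releases++;
        buf = NULL;            /* drop this reset and the cleanup below
                                  becomes a double-free */
    }

    if (buf != NULL) {         /* common cleanup */
        release_buffer(buf);
        releases++;
    }
    return releases;
}
```

A checker that pairs the two attributes would only need to notice that the
same untouched value reaches release_buffer() twice on the fail path once the
reset to NULL is removed.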
2) the ability to annotate functions with the tainting and filtering side
effects they have on their parameters, much like the format() attribute does
for format strings. (I've been asking the PC-Lint people for this feature
for some time.) You could make this even more generic and just add new
attributes that allow for setting and checking arbitrary tags:
    ssize_t recv(int sockfd, void *buf, size_t len, int flags)
        __attribute__ ((add_parameter_tag ("taint", 2)))
        __attribute__ ((add_return_value_tag ("taint")));

    int count_sql_rows_for(const char* name)
        __attribute__ ((disallow_parameter_tag ("taint", 1)));

    void filter_sql_characters_from(const char* name)
        __attribute__ ((removes_parameter_tag ("taint", 1)));
Then a program like this would only get a warning if the filtering call were
removed:
    int main(void) {
        char name[20] = {0};
        recv(GLOBAL_SOCKET, name, sizeof(name), 0);
        filter_sql_characters_from(name); // comment out this line to get a warning
        count_sql_rows_for(name);
    }
When I wrote my binary static analysis product, BugScan, we assumed that if a
pointer was tainted, so were its contents. (This was especially necessary for
collections like lists and vectors in Java and C++ binaries.) You may want to
be more explicit about that, by having a recursively_add_parameter_tag() or
somesuch that only applies to pointer parameters.
3) detection of strings that lack explicit NUL-termination. This one gets
really complicated, especially in situations where they are terminated
properly and then become un-terminated.
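One common way a string goes from terminated to un-terminated is strncpy()
with a source that fills the whole destination. A small sketch of that case
(is_terminated and copy_is_terminated are illustrative helpers, not anything
GCC provides):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..size) contains a NUL byte, 0 otherwise. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Copies src into dst with strncpy() and reports whether dst is still
   NUL-terminated afterwards. */
int copy_is_terminated(char *dst, size_t size, const char *src)
{
    assert(size > 0);
    memset(dst, 0, size);      /* dst starts out properly terminated */
    strncpy(dst, src, size);   /* writes no NUL if strlen(src) >= size */
    return is_terminated(dst, size);
}
```

With a 4-byte destination, copying "abc" leaves the buffer terminated while
copying "abcd" does not, so a checker has to track termination as a property
that can be lost after it was established.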
4) detection of loops that write through a pointer, and increment that
pointer, while bounded by a tainted value. You'd have to add an extension to
the loop unroller for that, and just check for the 'taint' tag on the loop
bound.
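As a sketch of the loop shape I mean in 4), assume count arrives from
untrusted input and would carry the 'taint' tag. The function name
fill_checked is made up; the clamp is the kind of filter a checker would look
for before the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Writes `value` into dst up to `count` times and returns the number of
   bytes written. `count` is assumed to come from untrusted input. */
size_t fill_checked(char *dst, size_t dst_size, size_t count, char value)
{
    char *p = dst;

    assert(dst != NULL);

    if (count > dst_size)      /* clamp: without this the loop bound is */
        count = dst_size;      /* still tainted and can overrun dst */

    for (size_t i = 0; i < count; i++)
        *p++ = value;          /* write and increment, bounded by count */

    return (size_t)(p - dst);
}
```

Remove the clamp and the write-and-increment loop is bounded only by the
tainted value, which is the pattern that should produce a warning.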
Of course, you still run into temporal ordering issues, especially with
globals, where the CFG ordering won't help.
But don't let that discourage you -- it would be great work to see done and
commoditized, and would probably be better than most commercial analyzers as
well ;)
Let me know if you need any more of my expertise in this area. I can't speak
for GCC internals, though.