Re: array bounds, sanitizer, safe programming, and cilk array notation

2015-02-21 Thread Marek Polacek
Sorry for late reply - I've found this in my inbox only today.

On Mon, Jan 26, 2015 at 11:53:59AM -0800, Martin Uecker wrote:
> 
> Hi all,
> 
> I am writing numerical code, so I am trying to make the use 
> of arrays in C (with gcc) suck a bit less. In general, the long term
> goal would be to have either a compile-time warning or the possibility
> to get a run-time error if one writes beyond the end of an array as 
> specified by its type.
> 
> So one example (see below) I looked at is where I pass an array of
> too small size to a function, to see how see can be diagnosed. In some
> cases, we can get a runtime error with the address sanitizer, but this
> is fairly limited (e.g. it does not work when the array is embedded
> in a struct) and I also got mixed results when the function
> is inlined.
> 
> For pointers to arrays with static size one can get an "incompatible
> pointer" warning - which is nice. With clang, I also get warning for 
> pointers which are declared as array parameters and use the 'static' 
> keyword to specify a minimum size. This a diagnostic we are currently
> missing.
> 
> The next step would be to have diagnostics also for the VLA
> case if the size is known at compilation time (as in the
> example below) and a run-time error when it is not (maybe 
> with the undefined behaviour sanitizer?).
> 
> If we have the later, I think this might also help with safer 
> programming in C, because one would get either a compile time or 
> runtime error when I passing a buffer which is too small to
> a function. For example, snprintf could have a prototype like
> this:
> 
> int snprintf(size_t size; char str[static size], size_t size, 
>   const char *format, ...);
> 
> That VLAs essentially provide the bounded pointer type C is
> missing has been pointed out before, e.g. there was a proposal
> by John Nagle, although he proposed rather intrusive language
> changes (e.g. adding references to C) which are not necessary
> in my opinion:
> 
> https://gcc.gnu.org/ml/gcc/2012-08/msg00360.html
> 
> 
> Finally, what is missing is a way to diagnose problems inside
> the called functions. -Warray-bounds=2 (with my recently
> accepted patch) helps with this, but - so far - only for static 
> arrays:
> 
> void foo(int (*x)[4])
> {
>   (*x)[4] = 5;// warning
> }
 
This is detected by -fsanitize=object-size, turned on by default in
-fsanitize=undefined.  Since it makes use of __builtin_object_size,
it is necessary to compile with optimizations turned on.

> It would be nice to also have these warning and runtime errors
> (with the undefined behaviour sanitizer) for VLAs. 
> 
> Finally, I think we should have corresponding warning also
> for pointers which are declared as array parameters:
> 
> void foo2(int x[4])
> {
>   x[4] = 5;
> }
 
Ditto.

> The later does not currently produce a warning, because x
> is converted to a pointer and the length is ignored. 
> 
> If it is not possible to have warning here for compatibility
> reasons, one possibility is to have an extension similar to 
> 'static' which makes 'x' a proper array in the callee, e.g. 

I think even the 'static in parameter array declarators' (ugly) C99 extension
isn't properly implemented yet (I don't think that the compiler makes
any optimization based on it).

So - if I understood this correctly - I think it's better to enhance
ubsan than to add any kind of language extensions.

Marek


Re: array bounds, sanitizer, safe programming, and cilk array notation

2015-02-21 Thread Marek Polacek
Sorry for late reply.

On Tue, Jan 27, 2015 at 12:07:58AM +, Joseph Myers wrote:
> On Mon, 26 Jan 2015, Martin Uecker wrote:
> 
> > extern void bar2(int (*x)[5]);
> 
> > int c = 4;
> > int y[c];
> 
> > bar2(&y);   // not diagnosed (found by asan)
> 
> This is the undefined behavior "If the two array types are used in a 
> context which requires them to be compatible, it is undefined behavior if 
> the two size specifiers evaluate to unequal values." (C11 6.7.6.2#6).  
> Yes, it would make sense for ubsan to detect this.  Generally, most forms 
> of runtime undefined behavior listed in J.2 should have ubsan detection 
> unless hard to detect / detected by some other sanitizer such as asan.
 
I have created a table to that effect some time ago:
http://people.redhat.com/mpolacek/www/analyzability.html
Obviously the question marks should be replaced by a -fsanitize=
option that detects a particular UB.  Or say that a particular UB is a
compile-time error (e.g. "declaring a function at block scope with an explicit
storage-class specifier other than extern").

I don't know what to do with the UBs on the library side - those 7.* ones.

> Does adding new forms of sanitization require upstream libsanitizer 
> changes as well or can arbitrary ubsan checks be added without needing 
> libsanitizer changes?

I think we also need libubsan changes.  But it is usually just about
printing an error message along with some values - nothing terribly
complex.

Marek


Re: array bounds, sanitizer, safe programming, and cilk array notation

2015-02-21 Thread Martin Uecker


Marek Polacek :

> Sorry for late reply - I've found this in my inbox only today.
> 
> On Mon, Jan 26, 2015 at 11:53:59AM -0800, Martin Uecker wrote:
>
> > Finally, what is missing is a way to diagnose problems inside
> > the called functions. -Warray-bounds=2 (with my recently
> > accepted patch) helps with this, but - so far - only for static 
> > arrays:
> > 
> > void foo(int (*x)[4])
> > {
> > (*x)[4] = 5;// warning
> > }
>  
> This is detected by -fsanitize=object-size, turned on by default in
> -fsanitize=undefined.  Since it makes use of __builtin_object_size,
> it is necessary to compile with optimizations turned on.

Not always.  -fsanitize=object-size  does not seem to detect the
illegal out-of-bound access by itself, but instead uses some very 
coarse notion of object size. For example, the following was not
detected in my tests:


void foo(int (*f)[4])
{
(*f)[4] = 1;// warning
}

void bar(int n, int (*f)[n])
{
(*f)[n] = 1;// no warning
}

...
struct { int buf[4]; int x; } s;
foo(&s.buf);
bar(4, &s.buf);



Also in cases like this a compile-time warning/error is possible.
A compile-time warning is more useful, and I think -fsanitize should
not be a replacement for proper compile time warnings/errors.

Instead, I think it should catch exactly those cases, which cannot
be detected a compile time (i.e. VLA with size not known at compile
time) - so that the combination of -fsanitize with compile time errors
gives strict checking without unnecessary runtime checks.

... 

> > The later does not currently produce a warning, because x
> > is converted to a pointer and the length is ignored. 
> > 
> > If it is not possible to have warning here for compatibility
> > reasons, one possibility is to have an extension similar to 
> > 'static' which makes 'x' a proper array in the callee, e.g. 
> 
> I think even the 'static in parameter array declarators' (ugly) C99 extension
> isn't properly implemented yet (I don't think that the compiler makes
> any optimization based on it).

This is correct, it doesn't do anything now.

(I am still puzzled why the C committee did not just make the
parameter have a proper array type whenever a size is specified.
So would have been so much better and is what most programmers
would expect.)

> So - if I understood this correctly - I think it's better to enhance
> ubsan than to add any kind of language extensions.

ubsan should be enhanced to detect the cases which cannot
be detected at compile time. But it needs be able to detect the
out-of-bounds access directly, and not just access outside
some object which might be much larger when the array
is embedded in a larger object. 

The 'static' keyword only garantuees that the pointer points
to an array of a certain minimum size. So this can be used
to check that arrays of the required size are passed to
a function (and clang already emits a warning at a compile
time when it detects this). But 'static' does not mean
that accesses after the specified size are out-of-bounds
(at they would be for proper arrays). To achieve this
a language extension would be required. 

Martin