[PATCH] D32199: [TBAASan] A TBAA Sanitizer (Clang)

Hal Finkel via Phabricator via cfe-commits Thu, 20 Apr 2017 08:39:45 -0700

hfinkel added a comment.

In https://reviews.llvm.org/D32199#731472, @rsmith wrote:


> > As I've currently implemented it, both reads and writes set the type of 
> > previously-unknown storage, and after that it says fixed (unless you memcpy 
> > to it, memset it, or its lifetime ends (the type gets reset on 
> > lifetime.start/end and for malloc/allocas/etc.). There's a flag to enable 
> > the "writes always set the type" rule, but that's not the default. Is this 
> > too strict?
>
> That seems like it will have at least three flavors of false positive:
>
> 1. C's "effective type" rule allows writes to set the type pretty much 
> unconditionally, unless the storage is for a variable with a declared type


To come back to this point: We don't really implement these rules now, and it 
is not clear that we will. The problem here is that, if we take the 
specification literally, then we can't use our current TBAA at all. The problem 
is that if we have:

  write x, !tbaa "int"
  read x, !tbaa "int"
  write x, !tbaa "float"

TBAA will currently tell us that the "float" write aliases with neither the 
preceding read nor the preceding write. As a result, we might move the "float" 
write to before the read (which is wrong) or before the write (also wrong). It 
seems to me that following the effective-type rules strictly will require that 
we only emit TBAA for memory accesses we can prove are to declared variables 
(as these are the only ones whose types don't get changed just by writing to 
them). We could certainly do that (*), although it is going to make TBAA 
awfully limited. As @dberlin has asserted, GCC does not implement these rules 
either.

To be fair, there are inferences we might draw from TBAA on all access that are 
not incorrect. For example, if we have:

  write x, !tbaa "int"
  write y, !tbaa "float"
  read x, !tbaa "int"

and we can indeed conclude that the write to y and the read from x don't alias 
(because the write happens before the read). This is because the effective type 
of y must be float after the write and so we know that the read from x must be 
accessing a different object. We can also conclude that the writes don't alias, 
but only because of the later read. The sad part is that if we use this 
information to reorder the read before the write to y (which we might do to 
eliminate the read), we now lose our ability to use TBAA to tell us anything.

Also, a strict reading of C's access rules seems to rule out the premise 
underlying our struct-path TBAA entirely. So long as I'm accessing a value 
using a struct that has some member, including recursively, with that type, 
then it's fine. The matching of the relative offsets is a sufficient, but not 
necessary, condition for well-defined access. C++ has essentially the same 
language (and, thus, potentially the same problem).

While I'd like the sanitizer to follow the rules, that's really useful only to 
the extent that the compilers follow the rules. If the compilers are making 
stronger assumptions, then I think that the sanitizer should also. OTOH, maybe 
we should change our TBAA representation/implementation to actually follow the 
rules, and then have a sanitizer that does the same.

(*) The best way I can think of to do this is to tag globals and allocas with 
tbaa according to their declared type, something similar for function 
arguments, and then in TBAA, instead of comparing the TBAA metadata of both 
operands directly, we call getUnderlyingObjects on the pointers, get the 
corresponding most-generic TBAA from the objects themselves, and then compare 
that TBAA to the TBAA from the other access.

> 2. After a placement new in C++, you should be able to use the storage as a 
> new type
> 3. Storing directly to a member access on a union (ie, with the syntax `x.a = 
> b`) in C++ permits using the storage as the new type

Yes, although for the sake of discussion, this is only true if a is a 
"non-class, non-array type, or of a class type with a trivial default 
constructor that is not
deleted, or an array of such types." That seems potentially useful.

> If we want to follow the relevant language rules by default, that would 
> suggest that "writes always set the type" should be enabled by default in C 
> and disabled by default in C++. That may not be the right decision for other 
> reasons, though. In C++, writes through union members and new-expressions 
> should probably (re)set the type (do you have intrinsics the frontend can use 
> to do so?).

Also, in C, memcpy gets to copy the type for a destination that does not have 
declared types. It looks like the same is true for C++ for trivially-copyable 
types. Is the first read/write sets the unknown type (i.e. memory from 
malloc/calloc/memset, etc.) correct for C++ also? I recall discussing something 
along these lines in Kona.


https://reviews.llvm.org/D32199



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D32199: [TBAASan] A TBAA Sanitizer (Clang)

Reply via email to