https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414
Bug ID: 90414
Summary: [Feature] Implementing HWASAN (and eventually MTE)
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: sanitizer
Assignee: unassigned at gcc dot gnu.org
Reporter: matmal01 at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org,
marxin at gcc dot gnu.org, ramana at gcc dot gnu.org,
rearnsha at gcc dot gnu.org
Target Milestone: ---
Hello,
I'm looking into how we can implement MTE in the compiler.
A productive first step could be implementing HWASAN for GCC, which implements
MTE-style memory tagging in software using the AArch64 top-byte-ignore (TBI)
feature.
This has already been implemented in LLVM and the design can be found at the
link below.
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
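As a rough illustration of the top-byte-ignore idea, the sketch below stores an 8-bit tag in bits 56..63 of a 64-bit address and strips it again. It only manipulates integer addresses and never dereferences a tagged pointer (which would fault on targets without TBI). The helper names are hypothetical and not part of GCC or the HWASAN runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: with top-byte-ignore (TBI) the hardware ignores
   bits 56..63 of an address, so an 8-bit tag can be kept there. */
#define TAG_SHIFT 56
#define TAG_MASK ((uint64_t)0xff << TAG_SHIFT)

/* Replace whatever tag ADDR carries with TAG. */
static uint64_t tag_address(uint64_t addr, uint8_t tag)
{
  return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Read the tag stored in the top byte. */
static uint8_t address_tag(uint64_t addr)
{
  return (uint8_t)(addr >> TAG_SHIFT);
}

/* Strip the tag, recovering the address the hardware actually uses. */
static uint64_t untag_address(uint64_t addr)
{
  return addr & ~TAG_MASK;
}
```

Because retagging only rewrites the top byte, `tag_address` can be applied to an already tagged address to give it a new tag.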
Hopefully we can make this change in a way that also enables MTE support in
the future.
I don't know the best approach here, and would appreciate any feedback.
From inspection it looks like most of the work is already handled by ASAN --
especially in finding all those places that need to be instrumented -- so I was
looking into what modifications would need to be made from that starting point.
I believe that tagging stack allocated memory can be done in a similar way to
ASAN by expanding the equivalent of ASAN_MARK in a relevant manner.
However, checking memory accesses seems to need a different approach to the
current ASAN one with ASAN_CHECK.
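A minimal, self-contained model of these two pieces, assuming the HWASAN layout from the LLVM design document (one shadow byte holds the tag of each 16-byte granule, and the pointer carries its tag in the top byte): `tag_region` is roughly what an ASAN_MARK-like expansion would do, and `access_ok` is the comparison a check has to perform. The fixed-size shadow array and the function names are hypothetical; a real implementation maps shadow memory at an offset, handles short granules, and reports failures through the runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE 16u            /* bytes covered by one shadow byte */
#define TAG_SHIFT 56
#define ADDR_MASK (((uint64_t)1 << TAG_SHIFT) - 1)
#define SIM_BYTES 1024u        /* size of the simulated address space */

/* Simulated shadow memory: one tag per 16-byte granule. */
static uint8_t shadow[SIM_BYTES / GRANULE];

/* ASAN_MARK-like tagging: give every granule of [addr, addr + size) the
   tag TAG.  ADDR must be granule-aligned and inside the simulation. */
static void tag_region(uint64_t addr, size_t size, uint8_t tag)
{
  assert(addr % GRANULE == 0 && addr + size <= SIM_BYTES);
  for (uint64_t g = addr / GRANULE; g < (addr + size + GRANULE - 1) / GRANULE; g++)
    shadow[g] = tag;
}

/* Check: does the tag in the pointer's top byte match the shadow tag of
   every granule touched by an access of SIZE bytes? */
static bool access_ok(uint64_t tagged_ptr, size_t size)
{
  uint8_t ptr_tag = (uint8_t)(tagged_ptr >> TAG_SHIFT);
  uint64_t addr = tagged_ptr & ADDR_MASK;
  if (size == 0)
    return true;
  if (addr + size > SIM_BYTES)
    return false;
  for (uint64_t g = addr / GRANULE; g <= (addr + size - 1) / GRANULE; g++)
    if (shadow[g] != ptr_tag)
      return false;
  return true;
}
```

Unlike ASAN's shadow, which encodes only "addressable or not", the shadow here holds a tag that must equal the one carried by the pointer, which is why the check needs the pointer's tag and not just its address.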
For both HWASAN and MTE we need to find the tag that a given memory access
should be done through.
To produce the best machine code, the compiler needs to associate each stack
variable with a tag internally.
In the LLVM implementation this is done by generating a random tag for the
current stack frame, and associating each stack variable with an increment from
this tag.
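A sketch of that per-frame scheme: one base tag per frame, and each stack variable's tag is a constant offset from it, wrapping modulo 256. The function name and the choice to add the variable's index are assumptions for illustration; LLVM's exact derivation of per-variable tags may differ, and the base tag would be generated at run time rather than passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: tag for the INDEX-th tagged variable of a frame whose
   (randomly chosen) base tag is BASE_TAG.  Because the offset is a
   compile-time constant, the base tag only has to be materialised once
   per frame, and each variable's tag is one constant addition away. */
static uint8_t stack_var_tag(uint8_t base_tag, unsigned index)
{
  return (uint8_t)(base_tag + index);   /* wraps modulo 256 */
}
```

Neighbouring variables in the same frame then get distinct tags, so an overflow from one into the next is caught by the tag mismatch.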
Also, for MTE the access itself needs to be made with a tagged pointer, which
means the current method of adding instructions before a memory access can't be
used and instead we need to modify the memory access itself.
I have some very basic questions that I would appreciate any help in answering.
1) Where should such passes be put?
I would guess that putting HWASAN and/or MTE passes in the same position as
the ASAN passes and updating the SANOPT pass to handle any changes would be
ok, but I don't have a good understanding of why they are in their current
position.
2) Can we always find the base object that's being referenced from the gimple
statement where memory is accessed or a pointer is created?
If not, when is it problematic?
Finding the base object is pretty fundamental to getting the tag for a
pointer.
It seems like this should be possible based on a reading of the documentation
and on the TREE_CODEs that the current ASAN `instrument_derefs` function
handles.
(ARRAY_REF -> first operand is the array
MEM_REF -> first operand is the base
COMPONENT_REF -> first operand is the object
INDIRECT_REF -> first operand is a pointer that should point to the object
VAR_DECL -> this is the object
BIT_FIELD_REF -> first operand is the object)
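To make the question concrete, here is a toy model (not GCC's tree API; the enum, struct and function are hypothetical) that follows operand 0 of each reference code listed above until a declaration is reached. The case it exposes is a MEM_REF or INDIRECT_REF whose operand is a pointer SSA name rather than the address of a declaration: the walk then ends at a pointer, so the object is not known statically and its tag would have to come from the pointer value itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the tree codes discussed above. */
enum ref_code { VAR_DECL, ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF,
                MEM_REF, INDIRECT_REF, ADDR_EXPR, SSA_NAME };

struct ref {
  enum ref_code code;
  const struct ref *op0;   /* first operand; NULL for leaves */
  const char *name;        /* for VAR_DECL / SSA_NAME leaves */
};

/* Return the VAR_DECL the reference is based on, or NULL when the base is
   only reachable through a pointer value (e.g. MEM_REF of an SSA name). */
static const struct ref *base_decl(const struct ref *r)
{
  while (r != NULL)
    switch (r->code)
      {
      case VAR_DECL:
        return r;
      case ARRAY_REF:
      case COMPONENT_REF:
      case BIT_FIELD_REF:
        r = r->op0;              /* operand 0 is the containing object */
        break;
      case MEM_REF:
      case INDIRECT_REF:
        /* Operand 0 is a pointer: the object is known only for &decl. */
        if (r->op0 != NULL && r->op0->code == ADDR_EXPR)
          r = r->op0->op0;
        else
          return NULL;
        break;
      default:
        return NULL;
      }
  return NULL;
}
```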
3) Would there be any obvious difficulties with a transformation of the form:
    _4 = big_arrayD.3771[num_3(D)];
TO
    _6 = &big_arrayD.3771[num_3(D)];
    _7 = HWASAN_CHECK(6, _6, 4, 4);
    _4 = *_7;
instead of
    _4 = big_arrayD.3771[num_3(D)];
TO
    _6 = &big_arrayD.3771[num_3(D)];
    ASAN_CHECK(6, _6, 4, 4);
    _4 = big_arrayD.3771[num_3(D)];
which is what ASAN currently does?
This new form would enable MTE, because the check returns the pointer through
which the access is made, so that pointer can carry its tag.
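A C-level sketch of the difference, on simulated memory with 16-byte granules and the tag in the pointer's top byte: ASAN_CHECK only validates, whereas the proposed HWASAN_CHECK hands back the pointer that the following load must use. The names `hwasan_check` and `load_u32`, the reduced argument list (the flags and alignment arguments are dropped) and abort-on-mismatch are illustrative assumptions, not the semantics of any existing internal function. In this software model the returned pointer is unchanged; under MTE this is the point where a correctly tagged pointer would be produced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GRANULE 16u
#define TAG_SHIFT 56
#define ADDR_MASK (((uint64_t)1 << TAG_SHIFT) - 1)
#define SIM_BYTES 256u

static uint8_t memory[SIM_BYTES];             /* simulated memory */
static uint8_t shadow[SIM_BYTES / GRANULE];   /* one tag per granule */

/* Hypothetical HWASAN_CHECK model: verify that an access of SIZE bytes
   through PTR is allowed, then return the pointer the access must use. */
static uint64_t hwasan_check(uint64_t ptr, size_t size)
{
  uint8_t tag = (uint8_t)(ptr >> TAG_SHIFT);
  uint64_t addr = ptr & ADDR_MASK;
  assert(size > 0 && addr + size <= SIM_BYTES);
  for (uint64_t g = addr / GRANULE; g <= (addr + size - 1) / GRANULE; g++)
    if (shadow[g] != tag)
      abort();                  /* a real runtime would report the fault */
  return ptr;
}

/* "_4 = *_7": load 4 bytes through the checked pointer.  The top byte is
   ignored when forming the address, as TBI does in hardware. */
static uint32_t load_u32(uint64_t ptr)
{
  uint32_t v;
  memcpy(&v, &memory[ptr & ADDR_MASK], sizeof v);
  return v;
}
```

Usage mirrors the proposed GIMPLE: `_7 = hwasan_check (_6, 4);` followed by `_4 = load_u32 (_7);`, so the load can only be reached through the value the check returned.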
4) Builtin memory calls look like they could be handled with HWASAN in
basically the same way as ASAN, while for MTE they should be fine once the
pointers passed to the calls are tagged.
Is there anything stopping that approach?
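For the builtin memory calls, a HWASAN-style check could validate the whole source and destination ranges before the call, much as ASAN checks the ranges of the arguments. A sketch on simulated memory (hypothetical names; 16-byte granules and a top-byte tag assumed; no short-granule handling):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16u
#define TAG_SHIFT 56
#define ADDR_MASK (((uint64_t)1 << TAG_SHIFT) - 1)
#define SIM_BYTES 256u

static uint8_t memory[SIM_BYTES];             /* simulated memory */
static uint8_t shadow[SIM_BYTES / GRANULE];   /* one tag per granule */

/* True if every granule of [ptr, ptr + n) carries ptr's tag. */
static bool range_ok(uint64_t ptr, size_t n)
{
  uint8_t tag = (uint8_t)(ptr >> TAG_SHIFT);
  uint64_t addr = ptr & ADDR_MASK;
  if (n == 0)
    return true;
  if (addr + n > SIM_BYTES)
    return false;
  for (uint64_t g = addr / GRANULE; g <= (addr + n - 1) / GRANULE; g++)
    if (shadow[g] != tag)
      return false;
  return true;
}

/* memcpy wrapper: check both ranges, then copy using untagged addresses.
   Returns false instead of copying when either check fails. */
static bool checked_memcpy(uint64_t dst, uint64_t src, size_t n)
{
  if (!range_ok(dst, n) || !range_ok(src, n))
    return false;
  memcpy(&memory[dst & ADDR_MASK], &memory[src & ADDR_MASK], n);
  return true;
}
```

The range checks are the part HWASAN could share with ASAN's builtin handling; under MTE the hardware would perform the equivalent check on every access made by the callee, provided the pointers it receives are tagged.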
Thanks,
MM