https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414
Bug ID: 90414
Summary: [Feature] Implementing HWASAN (and eventually MTE)
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: sanitizer
Assignee: unassigned at gcc dot gnu.org
Reporter: matmal01 at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
    jakub at gcc dot gnu.org, kcc at gcc dot gnu.org,
    marxin at gcc dot gnu.org, ramana at gcc dot gnu.org,
    rearnsha at gcc dot gnu.org
Target Milestone: ---

Hello,

I'm looking into how we can implement MTE in the compiler.

A productive first step could be implementing HWASAN for GCC, which does a
software implementation of MTE using the top-byte-ignore feature.
This has already been implemented in LLVM and the design can be found at the
link below.
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

Hopefully we can make this change in such a way that will enable the use of MTE
in the future.

I don't know the best approach here, and would appreciate any feedback.

From inspection it looks like most of the work is already handled by ASAN --
especially in finding all those places that need to be instrumented -- so I was
looking into what modifications would need to be made from that starting point.

I believe that tagging stack allocated memory can be done in a similar way to
ASAN by expanding the equivalent of ASAN_MARK in a relevant manner.

However, checking memory accesses seems to need a different approach to the
current ASAN one with ASAN_CHECK.

For both HWASAN and MTE we need to find the tag that a given memory access
should be done through.
In order to produce the best machine-code we would need to associate each stack
variable with a tag internally.
In the LLVM implementation this is done by generating a random tag for the
current stack, and associating each stack variable with an increment from this
tag.
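
As an illustration of the tagging scheme just described, here is a minimal C
sketch of "per-frame base tag plus an increment per variable", with the tag
kept in the top byte of a 64-bit address. It follows the description in this
report rather than LLVM's or GCC's sources; all names (sketch_set_tag,
sketch_frame_var_tag, ...) are hypothetical, and addresses are treated as plain
integers so that the code runs on targets without top-byte-ignore.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the tag occupies bits 63:56, the byte ignored under TBI. */
#define SKETCH_TAG_SHIFT 56
#define SKETCH_TAG_MASK  ((uint64_t)0xff << SKETCH_TAG_SHIFT)

/* Place TAG in the top byte of ADDR, replacing any tag already there. */
uint64_t sketch_set_tag(uint64_t addr, uint8_t tag)
{
    return (addr & ~SKETCH_TAG_MASK) | ((uint64_t)tag << SKETCH_TAG_SHIFT);
}

/* Extract the tag stored in the top byte of ADDR. */
uint8_t sketch_get_tag(uint64_t addr)
{
    return (uint8_t)(addr >> SKETCH_TAG_SHIFT);
}

/* Clear the top byte, giving back the untagged address. */
uint64_t sketch_strip_tag(uint64_t addr)
{
    return addr & ~SKETCH_TAG_MASK;
}

/* Tag of the INDEX-th variable in a frame: the frame's base tag plus an
   increment, wrapping modulo 256.  */
uint8_t sketch_frame_var_tag(uint8_t frame_base_tag, unsigned index)
{
    return (uint8_t)(frame_base_tag + index);
}

/* Tag the addresses of the N stack variables of one frame.  ADDRS holds the
   untagged addresses; TAGGED receives the tagged ones.  */
void sketch_tag_frame(uint8_t frame_base_tag, const uint64_t *addrs,
                      uint64_t *tagged, size_t n)
{
    assert(addrs != NULL && tagged != NULL);
    for (size_t i = 0; i < n; i++)
        tagged[i] = sketch_set_tag(addrs[i],
                                   sketch_frame_var_tag(frame_base_tag,
                                                        (unsigned)i));
}
```

Because each variable's tag is a compile-time offset from one run-time value,
the compiler only has to materialise the base tag once per frame.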

Also, for MTE the access itself needs to be made with a tagged pointer, which
means the current method of adding instructions before a memory access can't be
used and instead we need to modify the memory access itself.

I have some very basic questions that I would appreciate any help in answering.

1) Where should such passes be put?
   I would guess that putting HWASAN and/or MTE passes in the same position as
   the ASAN passes and updating the SANOPT pass to handle any changes would be
   ok, but I don't have a good understanding of why they are in their current
   position.

2) Can we always find the base object that's being referenced from the gimple
   statement where memory is accessed or a pointer is created?
   If not, when is it problematic?

   Finding the base object is pretty fundamental to getting the tag for a
   pointer.

   It seems like this should be possible based on a reading of the
   documentation and looking at the TREE_CODEs that the current ASAN
   `instrument_derefs` function works on.

     ARRAY_REF     -> first operand is the array
     MEM_REF       -> first operand is the base
     COMPONENT_REF -> first operand is the object
     INDIRECT_REF  -> first operand is the pointer, which should reference
                      the object
     VAR_DECL      -> this is the object
     BIT_FIELD_REF -> first operand is the object

3) Would there be any obvious difficulties with a transformation of the form:

     _4 = big_arrayD.3771[num_3(D)];
   to
     _6 = &big_arrayD.3771[num_3(D)];
     _7 = HWASAN_CHECK(6, _6, 4, 4);
     _4 = *_7;

   instead of

     _4 = big_arrayD.3771[num_3(D)];
   to
     _6 = &big_arrayD.3771[num_3(D)];
     ASAN_CHECK(6, _6, 4, 4);
     _4 = big_arrayD.3771[num_3(D)];

   which is what ASAN currently does.

   This new form would enable using MTE by allowing the check to modify the
   pointer that the access will be made with (so it can have its tag).

4) Builtin memory calls look like they could be handled with HWASAN in
   basically the same way as ASAN, while for MTE they should be fine once the
   pointers passed to the calls are tagged.
   Is there anything stopping that approach?

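
To make the shape proposed in question 3 concrete, here is a small C model in
which the check hands back the pointer that the access then goes through. It
is a sketch under stated assumptions, not GCC or libhwasan code: memory is a
256-byte array, "pointers" are offsets into it with a tag in bits 63:56, one
shadow byte covers each 16-byte granule, and every sim_* name is hypothetical.
A real check would trap on a mismatch; here it returns false so the behaviour
can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIM_GRANULE   16u    /* assumed bytes of memory per shadow byte */
#define SIM_MEM_SIZE  256u
#define SIM_TAG_SHIFT 56
#define SIM_ADDR_MASK (~((uint64_t)0xff << SIM_TAG_SHIFT))

static uint8_t sim_mem[SIM_MEM_SIZE];                     /* simulated memory */
static uint8_t sim_shadow[SIM_MEM_SIZE / SIM_GRANULE];    /* one tag per granule */

/* Counterpart of tagging a stack variable: record TAG in the shadow for
   [OFFSET, OFFSET + SIZE) and return the tagged pointer to the region.  */
uint64_t sim_tag_region(uint64_t offset, uint64_t size, uint8_t tag)
{
    assert(offset % SIM_GRANULE == 0 && size % SIM_GRANULE == 0);
    assert(offset <= SIM_MEM_SIZE && size <= SIM_MEM_SIZE - offset);
    for (uint64_t g = offset / SIM_GRANULE; g < (offset + size) / SIM_GRANULE; g++)
        sim_shadow[g] = tag;
    return offset | ((uint64_t)tag << SIM_TAG_SHIFT);
}

/* Model of `_7 = HWASAN_CHECK (..., _6, ...)`: succeed only if the tag in PTR
   matches the shadow tag of every granule the access touches, and pass the
   pointer to use for the access back through CHECKED.  */
bool sim_check(uint64_t ptr, uint64_t size, uint64_t *checked)
{
    uint64_t addr = ptr & SIM_ADDR_MASK;
    uint8_t tag = (uint8_t)(ptr >> SIM_TAG_SHIFT);

    if (size == 0 || addr >= SIM_MEM_SIZE || size > SIM_MEM_SIZE - addr)
        return false;
    for (uint64_t g = addr / SIM_GRANULE; g <= (addr + size - 1) / SIM_GRANULE; g++)
        if (sim_shadow[g] != tag)
            return false;
    *checked = ptr;
    return true;
}

/* Model of `_4 = *_7`: the access ignores the top byte, as with TBI.  */
uint32_t sim_load_u32(uint64_t checked)
{
    uint32_t value;
    memcpy(&value, &sim_mem[checked & SIM_ADDR_MASK], sizeof value);
    return value;
}

void sim_store_u32(uint64_t checked, uint32_t value)
{
    memcpy(&sim_mem[checked & SIM_ADDR_MASK], &value, sizeof value);
}
```

In this model the returned pointer is identical to the input, which matches
the HWASAN case; the point of giving the check a result is that an MTE
expansion would be free to return a differently tagged pointer without the
surrounding statements changing.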
Thanks, MM