https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414

            Bug ID: 90414
           Summary: [Feature] Implementing HWASAN (and eventually MTE)
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: sanitizer
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matmal01 at gcc dot gnu.org
                CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
                    jakub at gcc dot gnu.org, kcc at gcc dot gnu.org,
                    marxin at gcc dot gnu.org, ramana at gcc dot gnu.org,
                    rearnsha at gcc dot gnu.org
  Target Milestone: ---

Hello,

I'm looking into how we can implement MTE in the compiler.
A productive first step could be implementing HWASAN for GCC, which implements
MTE-style memory tagging in software using the top-byte-ignore feature.
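
As a rough illustration of the idea, here is a toy model in C of software
tagging. It is only a sketch, not the LLVM or GCC implementation. The tag
position (bits 56..63), the 16-byte granule, the simulated address range and
all function names are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for this sketch: tag lives in the top byte, 16-byte granules. */
#define TAG_SHIFT 56
#define GRANULE   16
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

/* Toy shadow memory: one tag byte per granule of a 256-byte simulated range. */
#define SIM_BYTES 256
uint8_t shadow[SIM_BYTES / GRANULE];

/* Place a tag in the top byte of a (simulated) address. */
uint64_t set_tag(uint64_t addr, uint8_t tag)
{
    return (addr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

uint8_t get_tag(uint64_t ptr) { return (uint8_t)(ptr >> TAG_SHIFT); }
uint64_t untag(uint64_t ptr)  { return ptr & ADDR_MASK; }

/* Record `tag` for every granule covering [addr, addr + size), size > 0. */
void tag_memory(uint64_t addr, size_t size, uint8_t tag)
{
    for (uint64_t g = addr / GRANULE; g <= (addr + size - 1) / GRANULE; g++)
        shadow[g] = tag;
}

/* Return 1 if the pointer tag matches the tag of every granule touched. */
int check_access(uint64_t ptr, size_t size)
{
    uint64_t addr = untag(ptr);
    for (uint64_t g = addr / GRANULE; g <= (addr + size - 1) / GRANULE; g++)
        if (shadow[g] != get_tag(ptr))
            return 0;
    return 1;
}
```

In this model an access is valid only when the tag carried in the pointer
matches the tag recorded for the memory it touches.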

This has already been implemented in LLVM and the design can be found at the
link below.
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html


Hopefully we can make this change in such a way that will enable the use of MTE
in the future.


I don't know the best approach here, and would appreciate any feedback.
From inspection it looks like most of the work is already handled by ASAN --
especially in finding all those places that need to be instrumented -- so I was
looking into what modifications would need to be made from that starting point.


I believe that tagging stack allocated memory can be done in a similar way to
ASAN by expanding the equivalent of ASAN_MARK in a relevant manner.

However, checking memory accesses seems to need a different approach to the
current ASAN one with ASAN_CHECK.

For both HWASAN and MTE we need to find the tag that a given memory access
should use.
In order to produce the best machine-code we would need to associate each stack
variable with a tag internally.
In the LLVM implementation this is done by generating a random tag for the
current stack, and associating each stack variable with an increment from this
tag.
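
A minimal sketch of that scheme in C, assuming 8-bit tags that wrap around
(the struct and function names here are hypothetical and are not taken from
LLVM or GCC):

```c
#include <stdint.h>

/* Per-frame tagging state: a base tag chosen at run time on function entry,
   plus a counter of how many variables have been assigned a tag so far. */
typedef struct {
    uint8_t  base_tag;
    unsigned next_offset;
} frame_tags;

/* Start a frame with a base tag (randomly generated in a real runtime). */
frame_tags frame_enter(uint8_t random_tag)
{
    frame_tags f = { random_tag, 0 };
    return f;
}

/* Tag for the next stack variable: base tag plus its increment, wrapping
   modulo 256 because the tag is a single byte in this sketch. */
uint8_t frame_next_tag(frame_tags *f)
{
    return (uint8_t)(f->base_tag + f->next_offset++);
}
```

The point of the scheme is that only the increment has to be known at compile
time; the base tag can stay a run-time value held in a register.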

Also, for MTE the access itself needs to be made with a tagged pointer, which
means the current method of adding instructions before a memory access can't be
used and instead we need to modify the memory access itself.


I have some very basic questions that I would appreciate any help in answering.

1) Where should such passes be put?
   I would guess that putting HWASAN and/or MTE passes in the same position as
   the ASAN passes and updating the SANOPT pass to handle any changes would be
   ok, but I don't have a good understanding of why they are in their current
   position.

2) Can we always find the base object that's being referenced from the gimple
   statement where memory is accessed or a pointer is created?
   If not, when is it problematic?
   Finding the base object is pretty fundamental to getting the tag for a
   pointer.
   It seems like this should be possible based on a reading of the
   documentation and looking at the TREE_CODEs that the current ASAN
   `instrument_derefs` function works on.

   (ARRAY_REF     -> first operand is the array
    MEM_REF       -> first operand is the base
    COMPONENT_REF -> first operand is the object
    INDIRECT_REF  -> first operand is the pointer which should reference object
    VAR_DECL      -> this is the object
    BIT_FIELD_REF -> first operand is the object)

3) Would there be any obvious difficulties with a transformation of the form:
      _4 = big_arrayD.3771[num_3(D)];

      TO

      _6 = &big_arrayD.3771[num_3(D)];
      _7 = HWASAN_CHECK(6, _6, 4, 4);
      _4 = *_7;

   Instead of
      _4 = big_arrayD.3771[num_3(D)];

      TO

      _6 = &big_arrayD.3771[num_3(D)];
      ASAN_CHECK(6, _6, 4, 4);
      _4 = big_arrayD.3771[num_3(D)];

   which is what ASAN currently does.
   This new form would enable using MTE by allowing the check to modify the
   pointer that the access will be made with (so that it carries its tag).

4) Builtin memory calls look like they could be handled with HWASAN in
   basically the same way as ASAN, while for MTE they should be fine once
   the pointers passed to the calls are tagged.
   Is there anything stopping that approach?
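
To make the motivation for question 3 concrete, here is a toy C model of why
the access has to go through the pointer returned by the check. Everything in
it is an assumption for illustration: `sim_check` only stands in for the
proposed HWASAN_CHECK, `sim_load32` stands in for a hardware-checked load, and
the tag position and 16-byte granule are chosen for the example.

```c
#include <stdint.h>
#include <string.h>

#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)
#define GRANULE   16
#define SIM_BYTES 128

uint8_t sim_mem[SIM_BYTES];             /* simulated memory            */
uint8_t sim_tags[SIM_BYTES / GRANULE];  /* one memory tag per granule  */

/* Stand-in for the proposed check: return the address carrying the tag of
   the base object the compiler believes is being accessed. */
uint64_t sim_check(uint64_t addr, uint8_t object_tag)
{
    return (addr & ADDR_MASK) | ((uint64_t)object_tag << TAG_SHIFT);
}

/* Simulated tag-checked 32-bit load: returns 0 on success, -1 on a tag
   mismatch or an address outside the simulated range. */
int sim_load32(uint64_t ptr, int32_t *out)
{
    uint64_t addr = ptr & ADDR_MASK;
    if (addr + sizeof *out > SIM_BYTES)
        return -1;
    if ((uint8_t)(ptr >> TAG_SHIFT) != sim_tags[addr / GRANULE])
        return -1;
    memcpy(out, &sim_mem[addr], sizeof *out);
    return 0;
}
```

In this model a load through the original, untagged address faults even when
it is in bounds, which is why the second statement of the ASAN-style sequence
cannot simply be left as it is.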



Thanks,
MM
