On Thu, 3 Apr 2025, Richard Biener wrote:

On Thu, Apr 3, 2025 at 2:23 AM Krister Walfridsson via Gcc
<gcc@gcc.gnu.org> wrote:

I have more questions about GIMPLE memory semantics for smtgcc.

As before, each section starts with a description of the semantics I've
implemented (or plan to implement), followed by concrete questions if
relevant. Let me know if the described semantics are incorrect or
incomplete.


Accessing memory
----------------
Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the
lhs and/or rhs is a memory reference expression (such as MEM_REF). When
both lhs and rhs access memory, one of the following must hold --
otherwise the access is UB:
  1. There is no overlap between lhs and rhs
  2. lhs and rhs represent the same address
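
As a minimal sketch of the overlap rule above, assume a flat, byte-addressed memory model in which addresses are plain integers. The helper name `copy_is_defined` is hypothetical and is not taken from smtgcc or GCC. It checks a copy of `size` bytes from rhs to lhs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: models the rule that an aggregate copy
   "lhs = rhs" of SIZE bytes is defined only if the two byte ranges
   are disjoint or start at the same address.  */
static bool
copy_is_defined (uintptr_t lhs, uintptr_t rhs, uintptr_t size)
{
  assert (size > 0);
  if (lhs == rhs)
    return true;  /* Case 2: same address.  */
  /* Case 1: [lhs, lhs+size) and [rhs, rhs+size) do not intersect.  */
  return lhs + size <= rhs || rhs + size <= lhs;
}
```

A C struct assignment such as `*p = *q;` is typically lowered to a single GIMPLE_ASSIGN with memory references on both sides. That is the kind of statement this check would apply to.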

A memory access is also UB in the following cases:
  * Any accessed byte is outside valid memory
  * The pointer violates the alignment requirements
  * The pointer provenance doesn't match the object
  * The type is incorrect from a TBAA perspective
  * It's a store to constant memory

Correct.
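
The bounds, alignment and constant-memory conditions listed above can be modeled with a small sketch. This assumes a single object occupying a contiguous byte range and a power-of-two alignment requirement. The `struct object` type and `access_is_defined` are hypothetical illustrations, and provenance and TBAA are deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a live object: the bytes
   [base, base + size) are valid memory.  */
struct object
{
  uintptr_t base;
  uintptr_t size;
  bool read_only;  /* e.g. a const object in read-only memory */
};

/* Returns true if an access of SIZE bytes at ADDR with required
   alignment ALIGN (a power of two) avoids the bounds, alignment and
   constant-memory cases of UB.  Provenance and TBAA are not modeled.  */
static bool
access_is_defined (const struct object *obj, uintptr_t addr,
                   uintptr_t size, uintptr_t align, bool is_store)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  if (addr < obj->base || addr - obj->base > obj->size
      || size > obj->size - (addr - obj->base))
    return false;  /* some accessed byte is outside valid memory */
  if (addr & (align - 1))
    return false;  /* the pointer is misaligned */
  if (is_store && obj->read_only)
    return false;  /* store to constant memory */
  return true;
}
```

Computing the in-bounds test as an offset from `base` avoids overflow in `addr + size`.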

Note that GIMPLE_CALL and GIMPLE_ASM can also access memory.

smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this
mail. Provenance has its own issues, which I'll come back to in a separate
mail.


Checking memory access is within bounds
---------------------------------------
A memory access may be represented by a chain of memory reference
expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example,
accessing a structure:

   struct s {
     int x, y;
   };

as:

   int foo (struct s * p)
   {
     int _3;

     <bb 2> :
     _3 = p_1(D)->x;
     return _3;
   }

involves a MEM_REF for the whole object and a COMPONENT_REF to select the
field. Conceptually, we load the entire structure and then pick out the
element -- so all bytes of the structure must be in valid memory.

We could also do the access as:

   int foo (struct s * p)
   {
     int * q;
     int _3;

     <bb 2> :
     q_2 = &p_1(D)->x;
     _3 = *q_2;
     return _3;
   }

This calculates the address of the element, and then reads it as an
integer, so only the four bytes of x must be in valid memory.
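
The two interpretations just described can be written as byte ranges relative to `p`. This is only a sketch of the whole-object reading stated in the paragraphs above. The `struct byte_range` type and both functions are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

struct s
{
  int x, y;
};

/* Half-open byte range [begin, end), relative to the pointer p.  */
struct byte_range
{
  size_t begin, end;
};

/* _3 = p_1(D)->x;  -- whole-object reading: all of struct s must be
   valid memory.  */
static struct byte_range
range_component_ref_x (void)
{
  return (struct byte_range) { 0, sizeof (struct s) };
}

/* q_2 = &p_1(D)->x;  _3 = *q_2;  -- only the int at q_2 is read.  */
static struct byte_range
range_deref_q (void)
{
  size_t off = offsetof (struct s, x);
  return (struct byte_range) { off, off + sizeof (int) };
}
```

The second range is a strict subset of the first. Under this reading, rewriting the second form into the first would require more bytes to be valid.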

In other words, the compiler is not allowed to optimize:
   q_2 = &p_1(D)->x;
   _3 = *q_2;
to
   _3 = p_1(D)->x;

Correct.  The reason p_1(D)->x is considered to access the whole
object is TBAA, so with -fno-strict-aliasing there is no UB when the
whole object isn't accessible (as long as the subsetted piece is).

I'm still a bit confused... Assume we're using -fno-strict-aliasing and p points to a 4-byte buffer in the example above. Then:
  _3 = p_1(D)->x;
is valid. Does that mean my paragraph starting with "In other words..." is incorrect? That is, the compiler is allowed to optimize:
  q_2 = &p_1(D)->x;
  _3 = *q_2;
to
  _3 = p_1(D)->x;
since it's equally valid. But this optimization would be invalid for -fstrict-aliasing?

And a memory access like:
  _4 = p_1(D)->y;
would be invalid (for both -fno-strict-aliasing and -fstrict-aliasing)? That is, the subsetted piece must be accessible?
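
The 4-byte buffer case can be checked with a small byte-range sketch. It assumes the reading in Richard's reply above: with -fno-strict-aliasing, only the selected field's bytes must be valid. The helper `field_access_ok` is hypothetical, and whether GCC actually treats p_1(D)->y this way is exactly the open question here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct s
{
  int x, y;
};

/* Assumed rule: an access of FIELD_SIZE bytes at FIELD_OFFSET from p
   is fine if it lies within the BUF_SIZE valid bytes starting at p.  */
static bool
field_access_ok (size_t field_offset, size_t field_size, size_t buf_size)
{
  return field_offset <= buf_size && field_size <= buf_size - field_offset;
}
```

With a 4-byte buffer (`buf_size == sizeof (int)`), the access to x fits. The access to y does not, because y starts at or after byte 4. Checking the whole object, `field_access_ok (0, sizeof (struct s), sizeof (int))`, also fails.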

   /Krister
