On Thu, 3 Apr 2025, Richard Biener wrote:
On Thu, Apr 3, 2025 at 2:23 AM Krister Walfridsson via Gcc
<gcc@gcc.gnu.org> wrote:
I have more questions about GIMPLE memory semantics for smtgcc.
As before, each section starts with a description of the semantics I've
implemented (or plan to implement), followed by concrete questions if
relevant. Let me know if the described semantics are incorrect or
incomplete.
Accessing memory
----------------
Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the
lhs and/or rhs is a memory reference expression (such as MEM_REF). When
both lhs and rhs access memory, one of the following must hold --
otherwise the access is UB:
1. There is no overlap between lhs and rhs
2. lhs and rhs represent the same address
A memory access is also UB in the following cases:
* Any accessed byte is outside valid memory
* The pointer violates the alignment requirements
* The pointer provenance doesn't match the object
* The type is incorrect from a TBAA perspective
* It's a store to constant memory
Correct.
Note that GIMPLE_CALL and GIMPLE_ASM can also access memory.
smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this
mail. Provenance has its own issues, which I'll come back to in a separate
mail.
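
The overlap, bounds, and alignment rules listed in this section can be sketched in C roughly as follows. This is only a minimal model, assuming a flat address space with no wraparound. The `struct access` type and all function names are hypothetical, made up for illustration, and are not taken from smtgcc or GCC. Provenance, TBAA, and constant memory are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one memory access: a flat address plus a size in
   bytes.  Provenance and TBAA are deliberately not modeled.  */
struct access {
  uintptr_t addr;
  size_t size;
};

/* Rule for an assignment where both lhs and rhs access memory: the two
   ranges must either start at the same address or not overlap at all.
   Assumes addr + size does not wrap around.  */
bool
copy_is_defined (struct access lhs, struct access rhs)
{
  if (lhs.addr == rhs.addr)
    return true;
  /* Disjoint if one range ends at or before the start of the other.  */
  return lhs.addr + lhs.size <= rhs.addr || rhs.addr + rhs.size <= lhs.addr;
}

/* Every accessed byte must lie inside the valid object [base, base + len).  */
bool
in_bounds (struct access a, uintptr_t base, size_t len)
{
  return a.addr >= base && a.size <= len && a.addr - base <= len - a.size;
}

/* The address must be a multiple of the required alignment, which is
   assumed to be a power of two.  */
bool
is_aligned (struct access a, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (a.addr & (align - 1)) == 0;
}
```

For example, an 8-byte copy from address 104 to address 100 overlaps without being the same address, so `copy_is_defined` rejects it, while a copy from 108 to 100 is accepted.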
Checking memory access is within bounds
---------------------------------------
A memory access may be represented by a chain of memory reference
expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example,
accessing a structure:
  struct s {
    int x, y;
  };
as:
  int foo (struct s * p)
  {
    int _3;

    <bb 2> :
    _3 = p_1(D)->x;
    return _3;
  }
involves a MEM_REF for the whole object and a COMPONENT_REF to select the
field. Conceptually, we load the entire structure and then pick out the
element -- so all bytes of the structure must be in valid memory.
We could also do the access as:
  int foo (struct s * p)
  {
    int * q;
    int _3;

    <bb 2> :
    q_2 = &p_1(D)->x;
    _3 = *q_2;
    return _3;
  }
This calculates the address of the element, and then reads it as an
integer, so only the four bytes of x must be in valid memory.
In other words, the compiler is not allowed to optimize:
  q_2 = &p_1(D)->x;
  _3 = *q_2;
to
  _3 = p_1(D)->x;
Correct. p_1(D)->x is considered to access the whole object because of
TBAA, so with -fno-strict-aliasing there is no UB when the whole object
isn't accessible (but the subsetted piece is).
I'm still a bit confused... Assume we're using -fno-strict-aliasing and p
points to a 4-byte buffer in the example above. Then:
  _3 = p_1(D)->x;
is valid. So that means my paragraph starting with "In other words..."
must be incorrect? That is, the compiler is allowed to optimize:
  q_2 = &p_1(D)->x;
  _3 = *q_2;
to
  _3 = p_1(D)->x;
since it's equally valid. But this optimization would be invalid for
-fstrict-aliasing?
And a memory access like:
  _4 = p_1(D)->y;
would be invalid (for both -fno-strict-aliasing and -fstrict-aliasing)?
That is, the subsetted piece must be accessible?
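
To make the byte ranges in these questions concrete, here is a minimal sketch in C. It only computes which bytes each reading of the access would require for a buffer of a given size; it does not perform the accesses and does not settle which reading GIMPLE actually uses. The helper names are hypothetical, and the "4-byte buffer" case corresponds to `sizeof (int)` on targets where int is 4 bytes.

```c
#include <stdbool.h>
#include <stddef.h>

struct s {
  int x, y;
};

/* Hypothetical helper: does the byte range [offset, offset + size) fit in
   a buffer of buf_size bytes?  Written to avoid overflow in the sum.  */
bool
range_fits (size_t buf_size, size_t offset, size_t size)
{
  return offset <= buf_size && size <= buf_size - offset;
}

/* Reading where only the selected field must be accessible.  */
bool
field_x_fits (size_t buf_size)
{
  return range_fits (buf_size, offsetof (struct s, x), sizeof (int));
}

bool
field_y_fits (size_t buf_size)
{
  return range_fits (buf_size, offsetof (struct s, y), sizeof (int));
}

/* Reading where the MEM_REF covers the whole object.  */
bool
whole_object_fits (size_t buf_size)
{
  return range_fits (buf_size, 0, sizeof (struct s));
}
```

With a buffer of `sizeof (int)` bytes, only `field_x_fits` holds: the bytes of y and the bytes of the whole structure both extend past the end of the buffer.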
/Krister