iii-i wrote:

Sorry, my wording was not precise enough; it is indeed important that we 
create a copy, and not pass a pointer to the original. Still, what you 
described matches the s390x ABI:

```
1.2.2.3. Parameter Area

The parameter area shall be allocated by a calling function if some parameters cannot
be passed in registers, but must be passed on the stack instead (see section 1.2.3).

[...]

1.2.3. Parameter Passing

[...]
A struct or union of any other size, a complex type, an __int128, a long
double, a _Decimal128, or a vector whose size exceeds 16 bytes. Replace
such an argument by a pointer to the object, or to a copy where necessary
to enforce call-by-value semantics. Only if the caller can ascertain that the
object is “constant” can it pass a pointer to the object itself.
```

---

Ah, that's the source of my confusion. I didn't realize the call instruction 
had to make a copy; I thought it just had to be done somewhere. "The attribute 
implies that a hidden copy of the pointee is made between the caller and the 
callee" actually does mean the former, but one has to squint to see that. The 
way you phrased it is much clearer.

So in the following example:

```
struct foo { char x[800]; };
void bar(struct foo);
void baz(void) { struct foo f = {}; bar(f); }
```

x86_64 generates:

```
define dso_local void @baz() #0 {
  %1 = alloca %struct.foo, align 8
  call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 800, i1 false)
  call void @bar(ptr noundef byval(%struct.foo) align 8 %1)
  ret void
}
```

and relies on the backend to expand `call void @bar` into roughly 
`REP_MOVSQ_64` and `CALL64pcrel32`. Whereas on s390x we get:

```
define dso_local void @baz() #0 {
  %1 = alloca %struct.foo, align 1
  %2 = alloca %struct.foo, align 1
  call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 800, i1 false)
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %2, ptr align 1 %1, i64 800, i1 false)
  call void @bar(ptr noundef %2)
  ret void
}
```

so the creation of the copy is explicit in the LLVM IR. Even though the ABIs 
are saying roughly the same thing, it's implemented differently.
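
To make the difference concrete at the source level, here is a minimal C sketch of the two call styles. `bar_byval` leaves the copy to the compiler, while `bar_indirect` plus the explicit `tmp` mimics by hand what the s390x IR above does with the second `alloca` and the `memcpy`. All function names here are hypothetical and chosen only for illustration; this is not how clang actually lowers anything, just a model of the observable semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct foo { char x[800]; };

/* By-value callee: writes only to its own parameter. */
static void bar_byval(struct foo p) {
    memset(p.x, 0x5a, sizeof p.x);
}

/* Hand-lowered callee (hypothetical): receives a pointer to a copy. */
static void bar_indirect(struct foo *p) {
    memset(p->x, 0x5a, sizeof p->x);
}

/* Returns 1 if every byte of f->x is zero. */
static int all_zero(const struct foo *f) {
    for (size_t i = 0; i < sizeof f->x; i++)
        if (f->x[i] != 0)
            return 0;
    return 1;
}

/* Returns 1 if the caller's object is unchanged after both call styles. */
int caller_object_survives(void) {
    struct foo f;
    memset(&f, 0, sizeof f);
    assert(all_zero(&f));

    /* Style 1: the copy is implicit in the call. */
    bar_byval(f);
    int ok_byval = all_zero(&f);

    /* Style 2: the caller makes the copy and passes its address. */
    struct foo tmp = f;
    bar_indirect(&tmp);
    int ok_indirect = all_zero(&f) && !all_zero(&tmp);

    return ok_byval && ok_indirect;
}
```

In both styles the callee is free to scribble over what it receives, and the caller's `f` stays untouched; the only difference is who is responsible for creating the copy.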

I wonder if it would still be beneficial to switch s390x to byval? I think it 
can be done in a way that correctly implements the ABI, even though it would of 
course be more complex than this PR. An obvious benefit is that s390x would 
become more similar to x86_64, but maybe there are some drawbacks that I'm not 
seeing.

---

I revisited MSan's `param_tls_limit.cpp`, and the XFAIL is actually fine. On 
s390x the instrumentation does indeed put the shadow of the synthetic pointer 
into the parameter TLS area, but this is not a problem, since the shadow of 
the actual value is still preserved and checked. As a result, the overflow 
that the test expects never happens, so the conclusion that the test is not 
applicable is correct. Sorry for the noise.

I will check if there is a different solution for DFSan. It currently passes 
the label of the pointer to the copy, which is always 0, instead of the label 
of the actual value, to vararg functions on s390x. Even though this is similar 
to what MSan does, the difference is that the DFSan runtime (e.g., 
`format_buffer`) expects the label of the actual value, regardless of the ABI.

https://github.com/llvm/llvm-project/pull/66404