labrinea added a comment.
> struct foo { unsigned long long x[8]; };
> void store(int *in, void *addr)
> {
>   struct foo x = { in[0], in[1], in[4], in[16], in[25], in[36], in[49],
>                    in[64] };
>   __asm__ volatile ("st64b %0,[%1]" : : "r" (x), "r" (addr) : "memory" );
> }
For this particular example, if we pass the asm operands as i512, the compiler
generates the following, which doesn't look bad:
    ldpsw x2, x3, [x0]
    ldrsw x4, [x0, #16]
    ldrsw x5, [x0, #64]
    ldrsw x6, [x0, #100]
    ldrsw x7, [x0, #144]
    ldrsw x8, [x0, #196]
    ldrsw x9, [x0, #256]
    //APP
    st64b x2, [x1]
    //NO_APP
Looking at the IR, it seems that SROA gets in the way: the code loads all eight
i32 values and constructs the i512 operand by performing bitwise operations on
them. So I was wrong to say that the load of an i512 value won't get optimized.
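
As a rough illustration of that packing (a sketch in C, not the actual IR): C
has no 512-bit integer, so the i512 operand is modelled here as eight 64-bit
lanes. The helper name pack_operand and the table kIndex are hypothetical, and
the sketch assumes that lane i ends up at bits [64*i, 64*i+63] of the i512
value, i.e. that the "bitwise operations" amount to a sign extension followed
by a shift-and-or per element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element indices read by the quoted example: in[0], in[1], in[4], ...
 * Multiplied by sizeof(int) == 4 they give the byte offsets seen in the
 * generated loads: 0, 4, 16, 64, 100, 144, 196, 256. */
static const size_t kIndex[8] = { 0, 1, 4, 16, 25, 36, 49, 64 };

/* Hypothetical helper: models the i512 asm operand as eight 64-bit lanes. */
static void pack_operand(const int *in, uint64_t lanes[8])
{
    assert(in != NULL && lanes != NULL);
    for (size_t i = 0; i < 8; i++) {
        /* int -> unsigned long long conversion: sign-extend to 64 bits,
         * which is what the ldrsw/ldpsw loads do. */
        int64_t wide = (int64_t)in[kIndex[i]];
        /* Assumed equivalent of "widen to i512, shift left by 64*i, or into
         * the accumulator": the value simply lands in lane i. */
        lanes[i] = (uint64_t)wide;
    }
}
```

Under that assumption each lane maps onto one register of the consecutive
x2..x9 group that st64b stores, which is why eight independent scalar loads are
enough and no 64-byte temporary is needed.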
https://reviews.llvm.org/D94098