labrinea added a comment.

> struct foo { unsigned long long x[8]; };
> void store(int *in, void *addr)
> {
>
>   struct foo x = { in[0], in[1], in[4], in[16], in[25], in[36], in[49],
>                    in[64] };
>   __asm__ volatile ("st64b %0,[%1]" : : "r" (x), "r" (addr) : "memory" );
>
> }

For this particular example, if we pass the asm operands as i512, the
compiler generates the following, which doesn't look bad:

  ldpsw x2, x3, [x0]
  ldrsw x4, [x0, #16]
  ldrsw x5, [x0, #64]
  ldrsw x6, [x0, #100]
  ldrsw x7, [x0, #144]
  ldrsw x8, [x0, #196]
  ldrsw x9, [x0, #256]
  //APP
  st64b x2, [x1]
  //NO_APP
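
As a sanity check on the immediates above, here is a minimal sketch in C. The
helper names `byte_offset` and `widen` are made up for illustration and are
not part of the patch, and it assumes a 32-bit int, as on AArch64. Element
in[i] sits at byte offset 4*i, and the int to unsigned long long conversion is
a sign extension, which is what ldpsw/ldrsw perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of in[i] when in points to 32-bit ints. */
static size_t byte_offset(size_t i) {
    return i * sizeof(int32_t);
}

/* Hypothetical helper: converting a 32-bit int to a 64-bit unsigned value
 * sign-extends it, matching the sign-extending loads in the assembly. */
static uint64_t widen(int32_t v) {
    return (uint64_t)(int64_t)v;
}
```

For the indices used in the example (0, 1, 4, 16, 25, 36, 49, 64) this gives
0, 4, 16, 64, 100, 144, 196 and 256, which are the offsets in the loads above.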

Looking at the IR, it seems that SROA steps in: it loads all eight i32
values and constructs the i512 operand by performing bitwise operations on
them. So I was wrong to say that the load of an i512 value won't get optimized.
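
To illustrate the kind of bitwise construction meant here, the following is a
scaled-down analogue in C, not the actual IR. It packs eight 8-bit lanes into
one 64-bit integer, where the real case packs eight 64-bit lanes into an i512.
The function name `pack8` is hypothetical, and putting lane 0 in the lowest
bits is an assumption matching a little-endian layout.

```c
#include <stdint.h>

/* Hypothetical analogue of the zero-extend / shift / or sequence:
 * lane k is widened, shifted to bit position 8*k and or-ed into the result. */
static uint64_t pack8(const uint8_t lanes[8]) {
    uint64_t acc = 0;
    for (int k = 0; k < 8; ++k)
        acc |= (uint64_t)lanes[k] << (8 * k);
    return acc;
}
```

Because every lane lands in a disjoint bit range, the shifts and ors amount to
placing each value in its own register-sized slot of the wide value.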


https://reviews.llvm.org/D94098