labrinea added a comment.

> struct foo { unsigned long long x[8]; };
> void store(int *in, void *addr)
> {
>
>   struct foo x = { in[0], in[1], in[4], in[16], in[25], in[36], in[49],
>                    in[64] };
>   __asm__ volatile ("st64b %0,[%1]" : : "r" (x), "r" (addr) : "memory" );
>
> }
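
As a quick sanity check on any code generated for the quoted function, it helps to work out which byte offsets the initializer reads and how each `int` is widened into a 64-bit field. The sketch below is only an illustration: it assumes a 4-byte `int`, and the names `field_index`, `lane_offset` and `widen` are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Indices read by the quoted initializer, in field order x[0]..x[7]. */
static const size_t field_index[8] = { 0, 1, 4, 16, 25, 36, 49, 64 };

/* Byte offset of in[field_index[lane]] from the base pointer `in`. */
static size_t lane_offset(size_t lane)
{
    assert(lane < 8);
    return field_index[lane] * sizeof(int);
}

/* Value stored in one 64-bit field: converting int to unsigned long long
 * reduces the value modulo 2^64, which has the same bit pattern as
 * sign-extending the 32-bit value. */
static unsigned long long widen(int v)
{
    return (unsigned long long)v;
}
```

Under the 4-byte `int` assumption the offsets come out as 0, 4, 16, 64, 100, 144, 196 and 256, and `widen(-1)` is `0xFFFFFFFFFFFFFFFF`, so each field is expected to be filled by a sign-extending word load.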

For this particular example, if we pass the asm operands as i512, the compiler generates the following, which doesn't look bad:

  ldpsw   x2, x3, [x0]
  ldrsw   x4, [x0, #16]
  ldrsw   x5, [x0, #64]
  ldrsw   x6, [x0, #100]
  ldrsw   x7, [x0, #144]
  ldrsw   x8, [x0, #196]
  ldrsw   x9, [x0, #256]
  //APP
  st64b   x2, [x1]
  //NO_APP

Looking at the IR, it seems that SROA gets in the way. It loads all eight i32 values and constructs the i512 operand by performing bitwise operations on them. So I was wrong to say that the load of an i512 value won't get optimized.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94098/new/

https://reviews.llvm.org/D94098

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits