https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678
Bug ID: 113678
Summary: SLP misses up vec_concat
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: x86_64
Take:
```
void f(char *a, char *b)
{
  int b0 = b[0];
  int b1 = b[1];
  int b2 = b[2];
  int b3 = b[3];
  int b4 = 0;
  int b5 = 0;
  int b6 = 0;
  int b7 = 0;
  a[0] = b0;
  a[1] = b1;
  a[2] = b2;
  a[3] = b3;
#if 0
  asm("":::"memory");
#endif
  a[4] = b0;
  a[5] = b1;
  a[6] = b2;
  a[7] = b3;
}
```
On x86_64 we get messy code because SLP decides to build the whole 8-element vector from the four scalar loads:
```
_1 = *b_6(D);
_2 = MEM[(char *)b_6(D) + 1B];
_3 = MEM[(char *)b_6(D) + 2B];
_4 = MEM[(char *)b_6(D) + 3B];
_16 = {_1, _2, _3, _4, _1, _2, _3, _4};
```
But this could be done as one vector load and 2 stores (if we change the
`#if 0` to `#if 1` we get the better code):
```
vect__1.5_18 = MEM <vector(4) char> [(char *)b_6(D)];
MEM <vector(4) char> [(char *)a_7(D)] = vect__1.5_18;
MEM <vector(4) char> [(char *)a_7(D) + 4B] = vect__1.5_18;
```
Or we could even get a single store, like LLVM does:
```
movd xmm0, dword ptr [rsi] # xmm0 = mem[0],zero,zero,zero
pshufd xmm0, xmm0, 0 # xmm0 = xmm0[0,0,0,0]
movq qword ptr [rdi], xmm0
ret
```