https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678
Bug ID: 113678
Summary: SLP misses up vec_concat
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: x86_64

Take:
```
void f(char *a, char *b)
{
  int b0 = b[0];
  int b1 = b[1];
  int b2 = b[2];
  int b3 = b[3];
  int b4 = 0;
  int b5 = 0;
  int b6 = 0;
  int b7 = 0;
  a[0] = b0;
  a[1] = b1;
  a[2] = b2;
  a[3] = b3;
#if 0
  asm("":::"memory");
#endif
  a[4] = b0;
  a[5] = b1;
  a[6] = b2;
  a[7] = b3;
}
```

On x86_64 we get messy code because SLP decides to build the whole 8-element vector from the scalar loads:
```
  _1 = *b_6(D);
  _2 = MEM[(char *)b_6(D) + 1B];
  _3 = MEM[(char *)b_6(D) + 2B];
  _4 = MEM[(char *)b_6(D) + 3B];
  _16 = {_1, _2, _3, _4, _1, _2, _3, _4};
```

But this could be done as one vector load and 2 stores (if we change the `#if 0` to `#if 1` we get the better code):
```
  vect__1.5_18 = MEM <vector(4) char> [(char *)b_6(D)];
  MEM <vector(4) char> [(char *)a_7(D)] = vect__1.5_18;
  MEM <vector(4) char> [(char *)a_7(D) + 4B] = vect__1.5_18;
```

Or we could even get a single store, like LLVM does:
```
        movd    xmm0, dword ptr [rsi]           # xmm0 = mem[0],zero,zero,zero
        pshufd  xmm0, xmm0, 0                   # xmm0 = xmm0[0,0,0,0]
        movq    qword ptr [rdi], xmm0
        ret
```
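
For reference, the two-store form can be written by hand in C. This is only a sketch to illustrate the expected semantics, not what the vectorizer emits: `f_scalar`, `f_two_stores` and `same_result` are hypothetical names that do not appear in the report, and `memcpy` of 4 bytes stands in for the `vector(4) char` load and stores.

```c
#include <assert.h>
#include <string.h>

/* Scalar reference: the same loads and stores as the test case above
   (the unused b4..b7 and the disabled asm are left out). */
static void f_scalar(char *a, const char *b)
{
  int b0 = b[0], b1 = b[1], b2 = b[2], b3 = b[3];
  a[0] = b0; a[1] = b1; a[2] = b2; a[3] = b3;
  a[4] = b0; a[5] = b1; a[6] = b2; a[7] = b3;
}

/* Hypothetical hand-written equivalent: one 4-byte load and two 4-byte
   stores, mirroring the vector(4) char GIMPLE shown above. */
static void f_two_stores(char *a, const char *b)
{
  char v[4];
  memcpy(v, b, 4);      /* all four source bytes are read before any store */
  memcpy(a, v, 4);
  memcpy(a + 4, v, 4);
}

/* Returns 1 when both versions write the same 8 bytes.
   src must point to at least 4 readable bytes. */
static int same_result(const char *src)
{
  char x[8], y[8];
  f_scalar(x, src);
  f_two_stores(y, src);
  return memcmp(x, y, 8) == 0;
}
```

Since every load from `b` happens before the first store to `a` in the original function, copying through the temporary `v` should preserve the behaviour even when `a` and `b` overlap.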