https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668
Bug ID: 101668
Summary: vectorizer doesn't categorize vector construct cost right.
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---

cat test.c

typedef int v16si __attribute__((vector_size (64)));
typedef long long v8di __attribute__((vector_size (64)));

void
bar_s32_s64 (v8di * dst, v16si src)
{
  long long tem[8];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  tem[4] = src[4];
  tem[5] = src[5];
  tem[6] = src[6];
  tem[7] = src[7];
  dst[0] = *(v8di *) tem;
}

With gcc -O3 -march=skylake-avx512, this case is no longer vectorized after my
r12-2549, because I increased the vec_construct cost for SKX/CLX.

Here's the dump for slp2:

  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF <src_18(D), 32, 0>;
  _2 = (long long int) _1;
  _3 = BIT_FIELD_REF <src_18(D), 32, 32>;
  _4 = (long long int) _3;
  _5 = BIT_FIELD_REF <src_18(D), 32, 64>;
  _6 = (long long int) _5;
  _7 = BIT_FIELD_REF <src_18(D), 32, 96>;
  _8 = (long long int) _7;
  _9 = BIT_FIELD_REF <src_18(D), 32, 128>;
  _10 = (long long int) _9;
  _11 = BIT_FIELD_REF <src_18(D), 32, 160>;
  _12 = (long long int) _11;
  _13 = BIT_FIELD_REF <src_18(D), 32, 192>;
  _14 = (long long int) _13;
  _15 = BIT_FIELD_REF <src_18(D), 32, 224>;
  _31 = {_1, _3, _5, _7, _9, _11, _13, _15};
  vect__2.4_32 = (vector(8) long long int) _31;
  _16 = (long long int) _15;
  MEM <vector(8) long long int> [(long long int *)&tem] = vect__2.4_32;
  _17 = MEM[(v8di *)&tem];
  *dst_28(D) = _17;
  tem ={v} {CLOBBER};
  return;

But there is actually no need for a vec_construct from each element; it will be
optimized to:

  <bb 2> [local count: 1073741824]:
  _2 = BIT_FIELD_REF <src_18(D), 256, 0>;
  vect__2.4_32 = (vector(8) long long int) _2;
  *dst_28(D) = vect__2.4_32;
  return;

So if slp2 could recognize this optimization and categorize the vec_construct
cost more accurately, we could avoid this regression.
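
For illustration only, here is a minimal source-level sketch of why the two GIMPLE forms compute the same result: the eight element extracts plus the vector construct amount to taking the low 256 bits of src as a single 8 x int vector and widening it with one conversion. The names v8si, widen_elementwise, widen_low_half and widen_forms_agree are hypothetical and not part of the test case. The vectors are passed by pointer here (unlike in test.c) so the sketch does not depend on the vector calling convention, and it assumes a compiler that provides __builtin_convertvector.

```c
#include <assert.h>
#include <string.h>

typedef int v16si __attribute__((vector_size (64)));
typedef int v8si __attribute__((vector_size (32)));   /* hypothetical helper type */
typedef long long v8di __attribute__((vector_size (64)));

/* Element-wise form, mirroring bar_s32_s64 from the test case.  */
void
widen_elementwise (v8di *dst, const v16si *src)
{
  long long tem[8];
  for (int i = 0; i < 8; i++)
    tem[i] = (*src)[i];
  memcpy (dst, tem, sizeof (tem));
}

/* Hand-written counterpart of the optimized GIMPLE: read the low 256 bits
   as one v8si, then do a single vector conversion.  */
void
widen_low_half (v8di *dst, const v16si *src)
{
  v8si lo;
  assert (sizeof (lo) * 2 == sizeof (*src));
  memcpy (&lo, src, sizeof (lo));   /* roughly BIT_FIELD_REF <src, 256, 0> */
  *dst = __builtin_convertvector (lo, v8di);
}

/* Returns 1 if both forms agree on a sample input with mixed signs.  */
int
widen_forms_agree (void)
{
  v16si src = { 0 };
  for (int i = 0; i < 16; i++)
    src[i] = (i % 2 ? -1 : 1) * (i + 1) * 1000;

  v8di a, b;
  widen_elementwise (&a, &src);
  widen_low_half (&b, &src);
  return memcmp (&a, &b, sizeof (a)) == 0;
}
```

If the cost model treated the construct in the first form like the single sub-vector extract in the second, the vec_construct cost charged for this case would presumably be much lower; the sketch only shows the equivalence, not how the vectorizer computes costs.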