https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120893
Bug ID: 120893
Summary: SLP costing costs vec_construct while forwprop turns it into vec_permute
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jchrist at linux dot ibm.com
Target Milestone: ---

Created attachment 61766
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61766&action=edit
Test case

The attached test case fails to SLP vectorize on s390x because the cost model
considers it unprofitable. The high cost comes from two vec_construct
operations that are created for external nodes, each of which combines
elements from two different vectors:

  missed: Build SLP failed: different BIT_FIELD_REF arguments in _25 = BIT_FIELD_REF <b_50(D), 16, 0>;

The forwprop pass later replaces these vec_construct operations with
vec_permute operations, which would be a lot cheaper and would make
vectorization profitable:

  _2 = VEC_PERM_EXPR <a_49(D), b_50(D), { 0, 2, 4, 6, 8, 10, 12, 14 }>;
  vect__2.3_54 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_2);
  _4 = VEC_PERM_EXPR <a_49(D), b_50(D), { 1, 3, 5, 7, 9, 11, 13, 15 }>;
  vect__4.4_56 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_4);

Instead of treating this special case inside a vector cost model, could this
be solved in SLP directly?
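
For reference, here is a minimal scalar sketch in C of what the two
VEC_PERM_EXPR statements quoted in this report compute. It is not taken from
the attached test case. The helper names (vec_perm_model, split_even_odd) and
the signed 16-bit element type are assumptions made for illustration; the
report only shows that the elements are 16 bits wide and that there are 8
lanes. The model follows the VEC_PERM_EXPR convention that a selector index
addresses the concatenation of both inputs, taken modulo twice the lane count.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/* Scalar model of a two-input vector permute (hypothetical helper).
   A selector index below LANES picks from a, otherwise from b.
   Indices are reduced modulo 2 * LANES. */
static void vec_perm_model(const int16_t *a, const int16_t *b,
                           const unsigned *sel, int16_t *out)
{
    assert(a && b && sel && out);
    for (int i = 0; i < LANES; i++) {
        unsigned idx = sel[i] % (2 * LANES);
        out[i] = idx < LANES ? a[idx] : b[idx - LANES];
    }
}

/* The two selectors from the report: the even-numbered and the
   odd-numbered elements of the concatenation of a and b. */
static void split_even_odd(const int16_t *a, const int16_t *b,
                           int16_t *even, int16_t *odd)
{
    static const unsigned sel_even[LANES] = {0, 2, 4, 6, 8, 10, 12, 14};
    static const unsigned sel_odd[LANES]  = {1, 3, 5, 7, 9, 11, 13, 15};

    vec_perm_model(a, b, sel_even, even);
    vec_perm_model(a, b, sel_odd, odd);
}
```

Calling split_even_odd(a, b, even, odd) leaves a[0], a[2], a[4], a[6], b[0],
b[2], b[4], b[6] in even and the remaining elements in odd. Each result is a
single two-input permute, which is why costing it as an element-by-element
vec_construct overestimates the work.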