https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120893

            Bug ID: 120893
           Summary: SLP costing costs vec_construct while forwprop turns
                    it into vec_permute
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jchrist at linux dot ibm.com
  Target Milestone: ---

Created attachment 61766
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61766&action=edit
Test case

The attached test case fails to SLP vectorize on s390x because the cost model
considers it unprofitable.  The high cost comes from two vec_construct
operations that are created for external nodes that combine elements of two
different vectors:

missed:   Build SLP failed: different BIT_FIELD_REF arguments in _25 =
BIT_FIELD_REF <b_50(D), 16, 0>;

The forwprop pass later replaces these vec_construct operations with
vec_permute operations, which would be a lot cheaper and would make
vectorization profitable:

  _2 = VEC_PERM_EXPR <a_49(D), b_50(D), { 0, 2, 4, 6, 8, 10, 12, 14 }>;
  vect__2.3_54 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_2);
  _4 = VEC_PERM_EXPR <a_49(D), b_50(D), { 1, 3, 5, 7, 9, 11, 13, 15 }>;
  vect__4.4_56 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_4);
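
For readers less familiar with VEC_PERM_EXPR, the two permutes above can be
modelled in scalar C as follows.  This is only an illustrative sketch, not the
attached test case: the helper names vec_perm and even_odd are made up, and
the operands are assumed to be vectors of eight 16-bit elements (suggested by
the eight selector entries and the 16-bit BIT_FIELD_REF).  Selector index i
picks element i of the concatenation of a and b, so the first permute gathers
the even-numbered elements and the second the odd-numbered ones.

```c
#include <assert.h>
#include <stdint.h>

#define N 8  /* assumed number of elements per vector */

/* Scalar model of VEC_PERM_EXPR <a, b, sel>: each selector index refers to
   the concatenation {a[0..N-1], b[0..N-1]}.  This sketch only handles
   in-range selector values.  */
static void vec_perm(const uint16_t *a, const uint16_t *b,
                     const unsigned *sel, uint16_t *out)
{
    for (int i = 0; i < N; i++) {
        assert(sel[i] < 2 * N);
        out[i] = sel[i] < N ? a[sel[i]] : b[sel[i] - N];
    }
}

/* Hypothetical helper mirroring the two permutes in the dump above.  */
static void even_odd(const uint16_t *a, const uint16_t *b,
                     uint16_t *even, uint16_t *odd)
{
    static const unsigned sel_even[N] = { 0, 2, 4, 6, 8, 10, 12, 14 };
    static const unsigned sel_odd[N]  = { 1, 3, 5, 7, 9, 11, 13, 15 };

    vec_perm(a, b, sel_even, even);  /* corresponds to _2 */
    vec_perm(a, b, sel_odd, odd);    /* corresponds to _4 */
}
```

Each result is thus a single two-input permute, whereas a vec_construct is
costed as building the vector from individual elements.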

Instead of special-casing this inside a vector cost model, could this be
handled in SLP directly?
