https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103116

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
We could make peeling for gaps handle this by turning it from a flag into a
count of the vector(!?) iterations we need to peel.  We're doing the "correct"
thing in adjusting the IV increment via

          if (slp_perm
              && (group_size != scalar_lanes
                  || !multiple_p (nunits, group_size)))
            {
              /* We don't yet generate such SLP_TREE_LOAD_PERMUTATIONs for
                 variable VF; see vect_transform_slp_perm_load.  */
              unsigned int const_vf = vf.to_constant ();
              unsigned int const_nunits = nunits.to_constant ();
              vec_num = CEIL (group_size * const_vf, const_nunits);
              group_gap_adj = vf * group_size - nunits * vec_num;

The problem also shows up for loops like

  for (int i = 0; i < COUNT; ++i)
    {
      x[i * 4] = y[i * 3] + 1;
      x[i * 4 + 1] = y[i * 3] + 2;
      x[i * 4 + 2] = y[i * 3 + 1] + 3;
      x[i * 4 + 3] = y[i * 3 + 2] + 4;
    }

where we cannot use a smaller vector type.
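
To make the arithmetic concrete, here is a small sketch that plugs this loop
into the vec_num / group_gap_adj formulas from the excerpt above.  The values
are assumptions, not taken from the report: int elements, 4-lane vectors
(nunits = 4), vf = 1 and a load group_size of 3.  The helper names are made up
for illustration and are not GCC functions.

```c
#include <assert.h>

/* Round-up division, standing in for GCC's CEIL macro. */
static unsigned ceil_div(unsigned a, unsigned b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Number of vector loads per vector iteration (vec_num in the excerpt). */
static unsigned vec_num_for(unsigned group_size, unsigned vf, unsigned nunits)
{
    return ceil_div(group_size * vf, nunits);
}

/* group_gap_adj as computed in the excerpt.  A negative value means the
   vector loads cover more elements than the scalar iterations consume. */
static int group_gap_adj_for(unsigned group_size, unsigned vf, unsigned nunits)
{
    unsigned vec_num = vec_num_for(group_size, vf, nunits);
    return (int)(vf * group_size) - (int)(nunits * vec_num);
}

/* Highest y[] index read by the last vector iteration, assuming
   (hypothetically) that the vector loop runs all `count` scalar
   iterations with nothing peeled. */
static unsigned last_index_read(unsigned count, unsigned group_size,
                                unsigned vf, unsigned nunits)
{
    assert(count >= vf && count % vf == 0);
    unsigned base = (count - vf) * group_size;
    return base + nunits * vec_num_for(group_size, vf, nunits) - 1;
}
```

Under these assumptions vec_num_for (3, 1, 4) is 1 and group_gap_adj_for
(3, 1, 4) is -1.  With COUNT = 8, y[] has 24 elements (indices 0..23) while
last_index_read (8, 3, 1, 4) is 24, so the last vector load reads one element
past the end.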

We could also use masked loads, if available, of course (not sure how their
cost compares with peeling for gaps).
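
As a rough illustration of the masked-load idea, the following scalar
emulation only touches memory for lanes whose mask bit is set, so the lanes
that would fall into the gap are never read.  It is a sketch under assumed
values (4 int lanes, zero fill for inactive lanes); masked_load and
low_lanes_mask are hypothetical names, not a target intrinsic or a GCC
internal.

```c
#include <assert.h>
#include <stddef.h>

enum { NUNITS = 4 };  /* assumed number of vector lanes */

/* Scalar emulation of a masked vector load: lane k is read from memory
   only when bit k of `mask` is set; inactive lanes are zero-filled and
   src[k] is never accessed for them. */
static void masked_load(int dst[NUNITS], const int *src, unsigned mask)
{
    for (size_t k = 0; k < NUNITS; ++k)
        dst[k] = ((mask >> k) & 1u) ? src[k] : 0;
}

/* Mask that enables the first `active` lanes. */
static unsigned low_lanes_mask(unsigned active)
{
    assert(active <= NUNITS);
    return (1u << active) - 1u;
}
```

For the 3-element group above, masked_load (v, &y[i * 3], low_lanes_mask (3))
would read y[i * 3] through y[i * 3 + 2] only and leave the fourth lane zero.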

A conservative fix would be to detect when peeling for gaps as implemented
is good enough and use it in that case, and otherwise reject vectorization.
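
One possible shape for such a check, as a simplified model only: it assumes
the groups are contiguous (the step equals group_size), that every element of
the group is used, and that peeling for gaps leaves exactly one scalar
iteration after the vector loop.  Both helpers are hypothetical and this is
not the check GCC implements.

```c
#include <assert.h>
#include <stdbool.h>

/* Elements that the vector loads of one vector iteration read beyond
   what its vf scalar iterations consume. */
static unsigned overrun_elems(unsigned group_size, unsigned vf, unsigned nunits)
{
    assert(nunits > 0);
    unsigned vec_num = (group_size * vf + nunits - 1) / nunits;
    return nunits * vec_num - vf * group_size;
}

/* Hypothetical predicate: one peeled scalar iteration guarantees that
   group_size further elements are valid after the last vector
   iteration, so the overrun has to fit into that many elements. */
static bool single_peel_is_enough(unsigned group_size, unsigned vf,
                                  unsigned nunits)
{
    return overrun_elems(group_size, vf, nunits) <= group_size;
}
```

In this model single_peel_is_enough (3, 1, 4) holds (overrun of 1 element),
while a group_size of 1 with 4 lanes overruns by 3 elements and would be
rejected.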
