https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115895

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2024-07-12
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
  <bb 3> [local count: 505088130]:
  # vectp_sp.25_67 = PHI <vectp_sp.25_68(5), sp_10(D)(2)>
  # vectp_mprr_2.31_75 = PHI <vectp_mprr_2.31_76(5), &mprr_2(2)>
  # ivtmp_79 = PHI <ivtmp_80(5), 15(2)>
  # loop_mask_69 = PHI <_82(5), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }(2)>
  # loop_mask_77 = PHI <_84(5), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }(2)>
  vect__5.27_70 = .MASK_LOAD (vectp_sp.25_67, 32B, loop_mask_69);
  vectp_sp.25_71 = vectp_sp.25_67 + 18446744073709551568;
  vect__5.28_72 = VEC_PERM_EXPR <vect__5.27_70, vect__5.27_70, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
  vect__5.29_73 = VEC_PERM_EXPR <vect__5.27_70, vect__5.27_70, { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 }>;
  vect__6.30_74 = VEC_PACK_TRUNC_EXPR <vect__5.28_72, vect__5.29_73>;
  .MASK_STORE (vectp_mprr_2.31_75, 512B, loop_mask_77, vect__6.30_74);

So indeed here the .MASK_LOAD overreads.  We are still peeling for gaps, but
not enough.
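To make "overreads" concrete: a masked load such as `.MASK_LOAD` only accesses the lanes whose mask bit is set, so it overreads when the mask enables lanes beyond the last element the scalar loop would touch. The following is a minimal scalar model under that assumption. `masked_load_model` is a hypothetical name, and treating disabled lanes as zero is a simplification of this sketch, not a statement about GCC's internal function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of a masked vector load: lane i reads base[i]
   only when mask[i] is set.  Disabled lanes do not touch memory and are
   set to zero in this model.  Enabled lanes at or beyond buf_len are not
   actually read here; they are counted as the elements a real vector load
   would overread. */
static size_t masked_load_model(const int *base, size_t buf_len,
                                const bool *mask, size_t nlanes, int *out)
{
    assert(mask != NULL && out != NULL);
    size_t overread = 0;
    for (size_t i = 0; i < nlanes; i++) {
        if (!mask[i]) {
            out[i] = 0;            /* lane disabled: no memory access */
        } else if (i >= buf_len) {
            out[i] = 0;            /* would be an out-of-bounds read */
            overread++;
        } else {
            out[i] = base[i];      /* enabled, in-bounds lane */
        }
    }
    return overread;
}
```

With an all-ones 16-lane mask over a 3-element buffer, 13 lanes fall past the end; a mask enabling only the first three lanes reads nothing out of bounds.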

In theory we should be able to adjust the mask instead of shortening the access
(which is what we do without masking).  That might be cheaper, since masking
off the gap would also allow us to avoid peeling for gaps(?)
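A minimal sketch of the mask adjustment, assuming an interleaved group of `group_size` elements that starts at lane 0, where `used[k]` marks the group members that are actually accessed. The loop mask is ANDed with a per-lane gap mask so lanes that only cover gaps are never loaded. `adjust_mask_for_gaps` is hypothetical and not a GCC function. For a group of size 2 where only the first member is used (as in a loop reading `a[2*i]`), an all-ones mask would read the trailing gap element past the last used element, which is the situation peeling for gaps guards against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: combine a loop mask with a per-lane gap mask.
   Lane i belongs to group member i % group_size; it stays enabled only if
   the loop mask enables it and that member is actually used, so lanes
   that cover gaps are never loaded. */
static void adjust_mask_for_gaps(const bool *loop_mask, size_t nlanes,
                                 const bool *used, size_t group_size,
                                 bool *adjusted)
{
    assert(group_size > 0);
    for (size_t i = 0; i < nlanes; i++)
        adjusted[i] = loop_mask[i] && used[i % group_size];
}
```

For eight lanes and `used = {true, false}`, the adjusted mask enables only the even lanes, so the last lane (a gap) is never touched.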

  <bb 3> [local count: 441952113]:
  # vectp_sp.25_70 = PHI <vectp_sp.25_71(5), sp_10(D)(2)>
  # vectp_mprr_2.31_79 = PHI <vectp_mprr_2.31_80(5), &mprr_2(2)>
  # ivtmp_82 = PHI <ivtmp_83(5), 0(2)>
  _72 = MEM <uint128_t> [(int *)vectp_sp.25_70];
  _73 = {_72, 0, 0, 0};
  vect__5.27_74 = VIEW_CONVERT_EXPR<vector(16) int>(_73);
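The unmasked code above loads only 128 bits (`MEM <uint128_t>`, i.e. four ints) and fills the remaining lanes of the `vector(16) int` with zeros via `{_72, 0, 0, 0}` and the `VIEW_CONVERT_EXPR`. A scalar C model of that shortened access, assuming 32-bit `int`; `shortened_load` is a hypothetical name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the shortened access: read only 16 bytes (one
   uint128_t, i.e. four 32-bit ints) from memory and zero-fill the other
   twelve lanes of the 16 x int vector. */
static void shortened_load(const int *src, int vec[16])
{
    assert(sizeof(int) == 4);          /* 16 bytes == 4 lanes assumes 32-bit int */
    memset(vec, 0, 16 * sizeof(int));  /* lanes 4..15 become zero */
    memcpy(vec, src, 16);              /* only 128 bits of memory are read */
}
```

Because only 16 bytes are read, a source of exactly four ints is enough, which is how shortening the access keeps the load away from the gap.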

I'll leave this open for a bit.  The easiest fix is to disable partial vectors
when we do the gap optimization, and not to perform that optimization for
variable-length vectors (I'm not sure all the may/must checks are on the safe
side).  I think we want to use smaller vectors for the load in a more conscious
way; possibly the SLP work of lowering vectors will help here.
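The easiest fix proposed in this comment can be written as a small decision rule. This is only an illustration with hypothetical names (`struct gap_plan`, `plan_gap_handling`), not GCC's vectorizer code: the gap optimization is skipped for variable-length vectors, and partial vectors are disabled whenever it is used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the proposed rule; not GCC internals. */
struct gap_plan {
    bool use_gap_opt;            /* shorten the access to avoid the gap */
    bool allow_partial_vectors;  /* this rule does not forbid masking; other
                                    checks may still reject it */
};

static struct gap_plan plan_gap_handling(bool gap_opt_applicable,
                                         bool variable_length)
{
    struct gap_plan p;
    /* Do not perform the gap optimization for variable-length vectors. */
    p.use_gap_opt = gap_opt_applicable && !variable_length;
    /* Disable partial vectors whenever the gap optimization is used. */
    p.allow_partial_vectors = !p.use_gap_opt;
    /* The two are never combined. */
    assert(!(p.use_gap_opt && p.allow_partial_vectors));
    return p;
}
```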
