https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116973

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note the load in question isn't lowered because of

      /* When the load permutation accesses a contiguous unpermuted,
         power-of-two aligned and sized chunk leave the load alone.
         We can likely (re-)load it more efficiently rather than
         extracting it from the larger load.
         ???  Long-term some of the lowering should move to where
         the vector types involved are fixed.  */
      if (ld_lanes_lanes == 0
          && contiguous
          && (SLP_TREE_LANES (load) > 1 || loads.size () == 1)
          && pow2p_hwi (SLP_TREE_LANES (load))
          && SLP_TREE_LOAD_PERMUTATION (load)[0] % SLP_TREE_LANES (load) == 0
          && group_lanes % SLP_TREE_LANES (load) == 0)
        {
          final_perm.release ();
          continue;
        }

which I added as part of r15-3442-g7164d982663738.  That revision enables
lowering of single loads but specifically exempts some cases to avoid
regressions.

gcc.dg/vect/slp-gap-1.c is the testcase that benefits from the above.

That case asks for the lowering to be more aware of gaps, I guess.  While we
now track those with a NULL scalar stmt in the lanes, the lowering code
doesn't track the "do-not-care" state of such lanes, but it probably should.
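
For readers who want to experiment with the exemption quoted above, here is a
minimal standalone sketch of the condition.  It is not the GCC code: the
function names, the plain std::vector standing in for the load permutation,
and the way "contiguous" is derived (each index is the previous one plus one)
are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for pow2p_hwi: true when x is a power of two.
static bool is_pow2(std::uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical, simplified model of the exemption check.
//   perm           - load permutation (lane indices into the load group)
//   group_lanes    - number of lanes in the whole load group
//   n_loads        - number of load nodes sharing the group
//   ld_lanes_lanes - nonzero when load-lanes is going to be used
// Returns true when the load would be left alone (not lowered).
static bool leave_load_alone(const std::vector<unsigned> &perm,
                             unsigned group_lanes,
                             std::size_t n_loads,
                             unsigned ld_lanes_lanes)
{
  const unsigned lanes = static_cast<unsigned>(perm.size());
  if (lanes == 0)
    return false;

  // Assumption: "contiguous" means every index follows its predecessor.
  bool contiguous = true;
  for (unsigned i = 1; i < lanes; ++i)
    if (perm[i] != perm[i - 1] + 1)
      contiguous = false;

  return ld_lanes_lanes == 0
         && contiguous
         && (lanes > 1 || n_loads == 1)
         && is_pow2(lanes)
         && perm[0] % lanes == 0
         && group_lanes % lanes == 0;
}
```

Under this model a permutation { 4, 5, 6, 7 } out of a group of 8 lanes is
left alone, while { 1, 2 } is not because the chunk does not start at a
multiple of its size.  Note the sketch has no notion of gap lanes, which is
the missing "do-not-care" tracking mentioned above.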
