O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES

rguenth at gcc dot gnu.org Tue, 08 Dec 2015 01:13:58 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707


--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 36951
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36951&action=edit
patch for testing

Can ARM people please evaluate the attached?  It simply prefers load/store-lane
over SLP.  I'd like to know whether there are cases where this is undesirable
and whether this patch causes some loops not to be vectorized at all (because I
got the load/store-lane support test wrong).
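
For readers unfamiliar with the two strategies, here is a minimal sketch of
the kind of loop where the choice arises.  It is a hypothetical example, not
the O3-pr36098.c testcase, and the function name is made up for illustration.
Each iteration reads and writes a group of two adjacent elements, so a
vectorizer can either use interleaving load/store-lane instructions or load
contiguous vectors and permute them for SLP.

```c
#include <stddef.h>

/* Hypothetical illustration only; not taken from the GCC testsuite.
   Each iteration touches the group {2*i, 2*i+1} of both arrays, and
   each store needs both loaded elements of the group.  */
void sum_diff_pairs(int *restrict out, const int *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int a = in[2 * i];        /* even element of the group */
        int b = in[2 * i + 1];    /* odd element of the group */
        out[2 * i]     = a + b;
        out[2 * i + 1] = a - b;
    }
}
```

Which strategy the vectorizer actually picks for such a loop depends on the
target and the GCC version; compiling with -O3 -fdump-tree-vect-details and
reading the dump is the way to check.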

One caveat is that SLP may require no unrolling while load/store-lane always
does, so with a statically known loop trip count the loop may not be vectorized
with load/store-lanes at all.  Likewise, when the trip count is not known, the
minimum number of iterations required may cause the vectorized variant to be
skipped every time if the trip count is small in practice.  The extra peeling
required for gaps may have the same effect (though with gaps the SLP variant
will always require potentially expensive permutes).

Thus the caveats apply mainly to low loop iteration counts (decidable only
at run time in most cases).
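
To make the trip-count effect concrete, here is a simplified model (an
assumption for illustration, not the vectorizer's actual code) of how many
iterations of the vector loop run.  The function name and parameters are
hypothetical: vf stands for the unroll/vectorization factor, and
peel_for_gaps says whether one scalar iteration has to be left to the
epilogue.

```c
/* Simplified, hypothetical model; the real cost model and versioning
   checks in GCC are more involved.  Returns how many iterations of the
   vector loop execute for a scalar trip count of niters.  */
unsigned vector_loop_iterations(unsigned niters, unsigned vf,
                                int peel_for_gaps)
{
    if (vf == 0)
        return 0;               /* guard against division by zero */
    if (peel_for_gaps) {
        if (niters == 0)
            return 0;
        niters -= 1;            /* last scalar iteration goes to the epilogue */
    }
    return niters / vf;         /* remaining iterations also run as scalar */
}
```

Under this model a loop with 4 iterations and vf = 4 runs the vector loop
once without gaps but not at all with peeling for gaps, which is the
situation described above for small trip counts.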

The patch is a heuristic; possible improvements include looking at a
statically known loop trip count as well as at the actual permutation
required for SLP (which may be none).  I know nothing about the costs of
load/store-lane on ARM.

Eventually we should do the same for cases that regular interleaving
can handle if SLP requires permutations.

[Bug tree-optimization/68707] [6 Regression] testcase gcc.dg/vect/O3-pr36098.c vectorized using VEC_PERM_EXPR rather than VEC_LOAD_LANES
