https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181

--- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> The issue is we detect this as a single interleaving group:
> 
> t.c:12:1: note:   Detected interleaving load of size 264
> t.c:12:1: note:         _1 = *a_26(D);
> t.c:12:1: note:         _5 = MEM[(double *)a_26(D) + 8B]; 
> t.c:12:1: note:         _7 = MEM[(double *)a_26(D) + 16B];
> t.c:12:1: note:         _11 = MEM[(double *)a_26(D) + 24B];
> t.c:12:1: note:         _14 = MEM[(double *)a_26(D) + 32B];
> t.c:12:1: note:         _17 = MEM[(double *)a_26(D) + 40B];
> t.c:12:1: note:         _19 = MEM[(double *)a_26(D) + 48B];
> t.c:12:1: note:         _22 = MEM[(double *)a_26(D) + 56B];
> t.c:12:1: note:         <gap of 248 elements>
> t.c:12:1: note:         _2 = MEM[(double *)a_26(D) + 2048B];
> t.c:12:1: note:         _4 = MEM[(double *)a_26(D) + 2056B];
> t.c:12:1: note:         _8 = MEM[(double *)a_26(D) + 2064B];
> t.c:12:1: note:         _10 = MEM[(double *)a_26(D) + 2072B];
> t.c:12:1: note:         _13 = MEM[(double *)a_26(D) + 2080B];
> t.c:12:1: note:         _16 = MEM[(double *)a_26(D) + 2088B];
> t.c:12:1: note:         _20 = MEM[(double *)a_26(D) + 2096B];
> t.c:12:1: note:         _23 = MEM[(double *)a_26(D) + 2104B];
> 
> so the heuristic to swap operands to get a single group in leafs doesn't
> work.  Instead you get offsetting costs to avoid runaway with very large
> gaps:
Thanks for pointing this out.
> 
> *a_26(D) 132 times unaligned_load (misalign -1) costs 1584 in body
> 
> and that makes it unprofitable.
> 
> There is indeed some better heuristic needed where to split groups - gaps
> bigger than the biggest vector size might be a good candidate.  Note
> when two different interleaving groups are used in the same SLP leaf
> we fail as we don't support that yet.

A simple hack like the one below works, but I guess we may need a better heuristic.

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index c9395e33fcd..d9d55ff4a3e 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3567,6 +3567,12 @@ vect_analyze_data_ref_accesses (vec_info *vinfo,
                      && init_a <= init_prev
                      && init_prev <= init_b);

+         tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (DR_REF (dra)));
+         unsigned HOST_WIDE_INT vf;
+         if (vectype
+             && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&vf)
+             && (unsigned HOST_WIDE_INT)(init_b - init_a) > vf * tree_to_uhwi (sza))
+           break;
          /* Do not place the same access in the interleaving chain twice.  */
          if (init_b == init_prev)
            {
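
To show the effect of the check in isolation, here is a minimal standalone sketch of the same splitting rule. It is not GCC code: `split_groups`, `dump_offsets`, and their parameters are hypothetical names, and the 8-lane vector used in the note after the block is an assumption about the target. The sketch starts a new group whenever an access lies more than one vector's worth of bytes past the first access of the current group, which is how I read the `break` above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the hack, not GCC code.
// sorted_offsets: byte offsets of the accesses, in ascending order.
// elem_size:      size of one scalar element in bytes (plays the role of sza).
// lanes:          number of elements per vector (plays the role of vf).
std::vector<std::vector<std::uint64_t>>
split_groups (const std::vector<std::uint64_t> &sorted_offsets,
              std::uint64_t elem_size, std::uint64_t lanes)
{
  std::vector<std::vector<std::uint64_t>> groups;
  const std::uint64_t limit = lanes * elem_size;

  for (std::uint64_t off : sorted_offsets)
    {
      // Distance is measured from the first access of the current group
      // (init_a in the patch), not from the previous access.
      if (groups.empty () || off - groups.back ().front () > limit)
        groups.emplace_back ();
      groups.back ().push_back (off);
    }
  return groups;
}

// Byte offsets from the vectorizer dump quoted above:
// 0..56 and 2048..2104 in steps of 8 (doubles).
std::vector<std::uint64_t>
dump_offsets ()
{
  std::vector<std::uint64_t> offs;
  for (std::uint64_t base : {std::uint64_t (0), std::uint64_t (2048)})
    for (std::uint64_t i = 0; i < 8; ++i)
      offs.push_back (base + 8 * i);
  return offs;
}
```

Assuming 8 doubles per vector, `split_groups (dump_offsets (), 8, 8)` gives two groups of 8 accesses each, instead of the single group of size 264 from the dump. With only 2 lanes the same rule would also cut inside each run of 8 contiguous loads, which is one reason a better heuristic is probably needed.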
