https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181
--- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> The issue is we detect this as a single interleaving group:
>
> t.c:12:1: note: Detected interleaving load of size 264
> t.c:12:1: note: _1 = *a_26(D);
> t.c:12:1: note: _5 = MEM[(double *)a_26(D) + 8B];
> t.c:12:1: note: _7 = MEM[(double *)a_26(D) + 16B];
> t.c:12:1: note: _11 = MEM[(double *)a_26(D) + 24B];
> t.c:12:1: note: _14 = MEM[(double *)a_26(D) + 32B];
> t.c:12:1: note: _17 = MEM[(double *)a_26(D) + 40B];
> t.c:12:1: note: _19 = MEM[(double *)a_26(D) + 48B];
> t.c:12:1: note: _22 = MEM[(double *)a_26(D) + 56B];
> t.c:12:1: note: <gap of 248 elements>
> t.c:12:1: note: _2 = MEM[(double *)a_26(D) + 2048B];
> t.c:12:1: note: _4 = MEM[(double *)a_26(D) + 2056B];
> t.c:12:1: note: _8 = MEM[(double *)a_26(D) + 2064B];
> t.c:12:1: note: _10 = MEM[(double *)a_26(D) + 2072B];
> t.c:12:1: note: _13 = MEM[(double *)a_26(D) + 2080B];
> t.c:12:1: note: _16 = MEM[(double *)a_26(D) + 2088B];
> t.c:12:1: note: _20 = MEM[(double *)a_26(D) + 2096B];
> t.c:12:1: note: _23 = MEM[(double *)a_26(D) + 2104B];
>
> so the heuristic to swap operands to get a single group in leafs doesn't
> work.  Instead you get offsetting costs to avoid runaway with very large
> gaps:

Thanks for pointing this out.

>
> *a_26(D) 132 times unaligned_load (misalign -1) costs 1584 in body
>
> and that makes it unprofitable.
>
> There is indeed some better heuristic needed where to split groups - gaps
> bigger than the biggest vector size might be a good candidate.  Note
> when two different interleaving groups are used in the same SLP leaf
> we fail as we don't support that yet.

A simple hack like the one below works, but I guess we may need a better
heuristic.

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index c9395e33fcd..d9d55ff4a3e 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3567,6 +3567,12 @@ vect_analyze_data_ref_accesses (vec_info *vinfo,
                          && init_a <= init_prev
                          && init_prev <= init_b);
 
+             tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (DR_REF (dra)));
+             unsigned HOST_WIDE_INT vf;
+             if (vectype
+                 && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&vf)
+                 && (unsigned HOST_WIDE_INT)(init_b - init_a) > vf * tree_to_uhwi (sza))
+               break;
              /* Do not place the same access in the interleaving chain twice.  */
              if (init_b == init_prev)
                {
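
To make the effect of the hack easier to see outside of GCC, here is a minimal standalone sketch of the same test. It is not GCC code: split_groups and dump_offsets are hypothetical helpers, and elem_size and vf stand in for tree_to_uhwi (sza) and the constant TYPE_VECTOR_SUBPARTS. The sketch assumes the offsets are already sorted and share one base pointer, and that breaking out of the chain makes the current access the start of the next group.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a GCC function.  Partitions byte offsets (sorted
// ascending, relative to one base pointer) into candidate interleaving
// groups.  A new group starts whenever an offset lies more than
// vf * elem_size bytes past the first offset of the current group,
// mirroring the (init_b - init_a) > vf * sza test in the hack.
std::vector<std::vector<std::uint64_t>>
split_groups (const std::vector<std::uint64_t> &offsets,
              std::uint64_t elem_size, std::uint64_t vf)
{
  std::vector<std::vector<std::uint64_t>> groups;
  for (std::uint64_t off : offsets)
    {
      if (!groups.empty ())
        assert (groups.back ().back () <= off);  // input must be sorted
      if (groups.empty () || off - groups.back ().front () > vf * elem_size)
        groups.emplace_back ();
      groups.back ().push_back (off);
    }
  return groups;
}

// Offsets taken from the dump in comment #7: eight doubles at 0..56 bytes,
// then eight more at 2048..2104 bytes.
std::vector<std::uint64_t>
dump_offsets ()
{
  const std::uint64_t bases[] = {0, 2048};
  std::vector<std::uint64_t> offs;
  for (std::uint64_t base : bases)
    for (std::uint64_t i = 0; i < 8; ++i)
      offs.push_back (base + i * 8);
  return offs;
}
```

Calling split_groups (dump_offsets (), 8, 8), i.e. 8-byte doubles and an assumed vector of 8 elements, yields two groups of 8 accesses starting at offsets 0 and 2048 instead of one group of size 264. A smaller vf splits the groups further, which is one reason a better heuristic is probably needed.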