https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122028
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-09-22
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Unfortunately we rely on quite early lowering of load permutations to implement
interleaving (or load/store-lane), so delaying this decision is difficult.
There is also a cut-off in data ref analysis:
              /* For datarefs with big gap, it's better to split them into
                 different groups.
                 .i.e a[0], a[1], a[2], .. a[7], a[100], a[101],..., a[107]  */
              if ((unsigned HOST_WIDE_INT)(init_b - init_prev)
                  > MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT)
                break;
and a fallback in get_load_store_type:
      /* If this is single-element interleaving with an element
         distance that leaves unused vector loads around fall back
         to elementwise access if possible - we otherwise least
         create very sub-optimal code in that case (and
         blow up memory, see PR65518).  */
      if (loop_vinfo
          && single_element_p
          && (*memory_access_type == VMAT_CONTIGUOUS
              || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
          && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
        {
          if (SLP_TREE_LANES (slp_node) == 1)
            {
              *memory_access_type = VMAT_ELEMENTWISE;
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "single-element interleaving not supported "
                                 "for not adjacent vector loads, using "
                                 "elementwise access\n");
            }
          else
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "single-element interleaving not supported "
                                 "for not adjacent vector loads\n");
              return false;
But what you say is basically that we use an unnecessarily high VF here.
So instead of running into the above, one way would be to set max_vf based
on the constant niter and then reject single-element interleaving because
of its high required VF.
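
To make the suggestion concrete, here is a rough standalone sketch of the
idea, not vectorizer code. Both function names are hypothetical, and it
assumes, as a simplification, that single-element interleaving requires a
VF of at least the number of lanes in the chosen vector type:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: clamp the maximum vectorization factor to a
   known constant iteration count.  A VF above the iteration count
   could never complete a single full vector iteration.  */
static unsigned
cap_max_vf_by_niter (unsigned max_vf, unsigned const_niter)
{
  assert (const_niter > 0);
  return const_niter < max_vf ? const_niter : max_vf;
}

/* Hypothetical check, under the simplifying assumption that
   single-element interleaving needs VF >= NUNITS (the lane count of
   the vector type): accept it only if that VF fits under MAX_VF.  */
static bool
single_element_interleaving_ok_p (unsigned nunits, unsigned max_vf)
{
  return nunits <= max_vf;
}
```

With a constant niter of 4 and an initial max_vf of 64, the cap becomes 4,
so a 16-lane vector type would be rejected for single-element interleaving
while a 4-lane type would still be accepted.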