https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573
Richard Biener <rguenth at gcc dot gnu.org> changed:
What            |Removed                     |Added
----------------------------------------------------------------------------
Keywords        |                            |missed-optimization
Status          |UNCONFIRMED                 |NEW
Last reconfirmed|                            |2019-08-28
Blocks          |                            |53947
Summary         |Vectorization failure for a |Vectorization failure for a
                |loop to do multiply-add     |loop to do multiply-add
                |                            |because SLP loads
                |                            |unnecessarily require
                |                            |permutation
Ever confirmed  |0                           |1
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Huh.
(compute_affine_dependence
stmt_a: MEM[(char *)ptr_dst_178 + 6B] = _189;
stmt_b: MEM[(char *)ptr_dst_178 + 7B] = _211;
(analyze_overlapping_iterations
(chrec_a = {6B, +, _279}_1)
(chrec_b = {7B, +, _279}_1)
(analyze_siv_subscript
siv test failed: unimplemented)
(overlap_iterations_a = not known)
(overlap_iterations_b = not known))
) -> dependence analysis failed
Ah, I guess that is because the step _279 is unknown.
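To illustrate why an unknown step matters for the two access functions above, here is a minimal sketch. It is not GCC's SIV test; the helper name accesses_overlap and the brute-force search are made up for illustration. It checks whether {6, +, stride} and {7, +, stride} ever hit the same byte offset:

```c
#include <stddef.h>

/* Hypothetical helper, not part of GCC: returns 1 if the access
   sequences {6, +, stride} and {7, +, stride} touch a common byte
   offset within the first n iterations of each, 0 otherwise.  */
static int accesses_overlap(long stride, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            /* Offset of stmt_a in iteration i vs. stmt_b in iteration j.  */
            if (6 + i * stride == 7 + j * stride)
                return 1;
    return 0;
}
```

With a step of 1 the two stores collide across iterations (iteration 1 of stmt_a and iteration 0 of stmt_b both write offset 7), while with a step of 8 they never do. Without knowing the step at compile time, neither case can be ruled out.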
Anyway, the issue for SLP is
t.c:10:5: note: Load permutation 0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8
t.c:10:5: missed: unsupported vect permute { 1 2 3 4 5 6 7 8 17 18 19 20 21 22 23 24 }
t.c:10:5: missed: Build SLP failed: unsupported load permutation *ptr_dst_178 = _57;
where SLP fails to see the opportunity to use a smaller load at an offset,
probably because the group size is 9 (ptr_src[0] to ptr_src[WIDTH+1]).
And not using SLP is indeed not profitable here. With an ISA that supports
the permutation you'll see the loop vectorized (just tried -msse4.1),
though the loads are not handled optimally.
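As a rough sketch of the missed opportunity (not the vectorizer's code; the function names and the byte element type are assumptions made for illustration), the reported load permutation over a 9-element group yields the same 16 lanes as two contiguous 8-element loads at offsets 0 and 1:

```c
#include <string.h>

enum { GROUP_SIZE = 9, LANES = 16 };

/* The load permutation from the dump: lane -> index into the group.  */
static const unsigned char load_perm[LANES] = {
    0, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8
};

/* Hypothetical: gather each lane through the permutation.  */
static void load_via_permutation(const unsigned char *group,
                                 unsigned char *out)
{
    for (int lane = 0; lane < LANES; lane++)
        out[lane] = group[load_perm[lane]];
}

/* Hypothetical: two smaller contiguous loads, the second offset by one.  */
static void load_via_offset_loads(const unsigned char *group,
                                  unsigned char *out)
{
    memcpy(out, group, 8);          /* lanes 0..7  <- group[0..7] */
    memcpy(out + 8, group + 1, 8);  /* lanes 8..15 <- group[1..8] */
}

/* Returns 1 if both strategies produce identical lanes.  */
static int loads_agree(void)
{
    unsigned char group[GROUP_SIZE] = { 10, 11, 12, 13, 14, 15, 16, 17, 18 };
    unsigned char a[LANES], b[LANES];

    load_via_permutation(group, a);
    load_via_offset_loads(group, b);
    return memcmp(a, b, LANES) == 0;
}
```

Both halves of the permutation are contiguous runs, so no shuffle is needed in principle; only the start offset of the second load differs.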
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations