https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-08-28
             Blocks|                            |53947
            Summary|Vectorization failure for a |Vectorization failure for a
                   |loop to do multiply-add     |loop to do multiply-add
                   |                            |because SLP loads
                   |                            |unnecessarily require
                   |                            |permutation
     Ever confirmed|0                           |1

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Huh.

(compute_affine_dependence
  stmt_a: MEM[(char *)ptr_dst_178 + 6B] = _189;
  stmt_b: MEM[(char *)ptr_dst_178 + 7B] = _211;
(analyze_overlapping_iterations
  (chrec_a = {6B, +, _279}_1)
  (chrec_b = {7B, +, _279}_1)
(analyze_siv_subscript
siv test failed: unimplemented)
  (overlap_iterations_a = not known)
  (overlap_iterations_b = not known))
) -> dependence analysis failed

Ah, I guess that is because _279 is unknown.

Anyway, the issue for SLP is

t.c:10:5: note:   Load permutation 0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8
t.c:10:5: missed:   unsupported vect permute { 1 2 3 4 5 6 7 8 17 18 19 20 21 22 23 24 }
t.c:10:5: missed:   Build SLP failed: unsupported load permutation *ptr_dst_178 = _57;

where SLP fails to see the opportunity to use a smaller load at an offset,
probably because the group size is 9 (ptr_src[0] to ptr_src[WIDTH+1]).  And
not using SLP is indeed not profitable here.

With an ISA supporting the permutation you'll see the loop vectorized (I just
tried -msse4.1), though the loads are not handled in an optimal way.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
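On the dependence failure in comment #4: the two stores walk the chrecs {6B, +, _279} and {7B, +, _279}. In iteration k their offsets are 6 + k*s and 7 + k*s, where s stands in for the run-time step _279. The brute-force check below is only an illustration. The function name and the iteration bound n are hypothetical, and this is not how GCC's SIV test works internally. It shows why the answer depends entirely on s.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical brute-force model of the question the SIV test gave up on:
   does the byte written by stmt_a in iteration k1 (offset 6 + k1*s)
   ever equal the byte written by stmt_b in iteration k2 (offset 7 + k2*s),
   for 0 <= k1, k2 < n?  Here s plays the role of the unknown step _279.  */
static bool chains_overlap(long s, long n)
{
    assert(n >= 0);
    for (long k1 = 0; k1 < n; k1++)
        for (long k2 = 0; k2 < n; k2++)
            if (6 + k1 * s == 7 + k2 * s)
                return true;
    return false;
}
```

Algebraically, 6 + k1*s = 7 + k2*s means (k1 - k2)*s = 1. The two stores can therefore only collide when s is 1 or -1 and the loop runs at least twice. For every other step they are independent, but that cannot be proven at compile time while _279 is unknown.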
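On the rejected mask { 1 2 3 4 5 6 7 8 17 18 19 20 21 22 23 24 }: the model below assumes the two-input permute semantics that GCC documents for `__builtin_shuffle`. Under those semantics, mask entries index into the concatenation of both inputs and are taken modulo twice the lane count. The 16-lane width is inferred from the 16-entry mask, and the function name is hypothetical. With this model, the mask selects lanes 1..8 of the first input followed by lanes 1..8 of the second input.

```c
#include <assert.h>
#include <stddef.h>

#define NLANES 16  /* assumption: 16 one-byte lanes, matching the 16-entry mask */

/* Scalar model of a two-input vector permute: mask values 0..NLANES-1 pick
   a lane of the first input, NLANES..2*NLANES-1 a lane of the second.
   Values are reduced modulo 2*NLANES first.  */
static void vec_perm_model(unsigned char out[NLANES],
                           const unsigned char a[NLANES],
                           const unsigned char b[NLANES],
                           const unsigned char mask[NLANES])
{
    for (size_t i = 0; i < NLANES; i++) {
        unsigned m = (unsigned)mask[i] % (2u * NLANES);
        out[i] = m < NLANES ? a[m] : b[m - NLANES];
    }
}
```

In this model, each half of the result is the corresponding input shifted by one lane. Fed with a[i] = i and b[i] = 100 + i, the mask yields 1..8 followed by 101..108.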
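The "smaller load at an offset" suggestion in comment #4 can be sketched in C. The load permutation 0..7 / 1..8 would arise from a row kernel in which each output byte combines src[x] and src[x + 1]. The original t.c is not shown here, so the kernel shape, the weights w0/w1, LANES, and both function names are hypothetical reconstructions. The sketch is written in scalar C only to show that two unpermuted 8-element loads, one at offset 0 and one at offset 1, produce the same lanes as loading the 9-element group and permuting it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LANES 8  /* hypothetical: eight stores per row (offsets 0..7) */

/* Hypothetical reference kernel: output x reads src[x] and src[x + 1],
   so one row touches src[0] .. src[LANES], a group of LANES + 1 = 9.  */
static void row_scalar(unsigned char *dst, const unsigned char *src,
                       unsigned char w0, unsigned char w1)
{
    for (size_t x = 0; x < LANES; x++)
        dst[x] = (unsigned char)(src[x] * w0 + src[x + 1] * w1);
}

/* Same computation, with operands gathered by two contiguous loads
   instead of permutations {0..7} and {1..8} of one 9-element group.  */
static void row_two_loads(unsigned char *dst, const unsigned char *src,
                          unsigned char w0, unsigned char w1)
{
    unsigned char lo[LANES], hi[LANES];
    memcpy(lo, src, LANES);      /* lanes src[0] .. src[7] */
    memcpy(hi, src + 1, LANES);  /* lanes src[1] .. src[8] */
    for (size_t x = 0; x < LANES; x++)
        dst[x] = (unsigned char)(lo[x] * w0 + hi[x] * w1);
}
```

Both functions write identical bytes for any input. In vector form, the second shape needs no load permutation at all, so it would not depend on the target supporting the permute that was rejected above.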