[Bug tree-optimization/91573] Vectorization failure for a loop to do multiply-add because SLP loads unnecessarily require permutation

rguenth at gcc dot gnu.org Wed, 28 Aug 2019 01:36:42 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91573


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-08-28
             Blocks|                            |53947
            Summary|Vectorization failure for a |Vectorization failure for a
                   |loop to do multiply-add     |loop to do multiply-add
                   |                            |because SLP loads
                   |                            |unnecessarily require
                   |                            |permutation
     Ever confirmed|0                           |1

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Huh.

(compute_affine_dependence
  stmt_a: MEM[(char *)ptr_dst_178 + 6B] = _189;
  stmt_b: MEM[(char *)ptr_dst_178 + 7B] = _211;
(analyze_overlapping_iterations
  (chrec_a = {6B, +, _279}_1)
  (chrec_b = {7B, +, _279}_1)
(analyze_siv_subscript
  siv test failed: unimplemented)
  (overlap_iterations_a = not known)
  (overlap_iterations_b = not known))
) -> dependence analysis failed

Ah, I guess that is because the step _279 is not a compile-time constant.
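
To spell out that guess: the two accesses have the form 6 + i*step and
7 + j*step, so they touch the same byte exactly when (i - j)*step == 1.
Whether that has a solution depends on the value of the step, which is
why a symbolic step leaves the answer open. The sketch below is only an
illustration under that reading. may_overlap is a hypothetical helper,
not a GCC function, and it ignores loop bounds entirely.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: reports whether the accesses
   base_a + i*step and base_b + j*step can hit the same address for
   some integers i and j.  Iteration bounds are ignored.  */
bool
may_overlap (long base_a, long base_b, long step)
{
  long diff = base_b - base_a;

  /* With a zero step every iteration hits the same address, so the
     accesses overlap only if the bases are equal.  */
  if (step == 0)
    return diff == 0;

  /* (i - j) * step == diff has an integer solution exactly when
     step divides diff.  */
  return diff % step == 0;
}
```

With bases 6 and 7 this returns true for a step of 1 and false for a
step of 8, so nothing can be concluded without knowing the step.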

Anyway, the issue for SLP is

t.c:10:5: note:   Load permutation 0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8
t.c:10:5: missed:   unsupported vect permute { 1 2 3 4 5 6 7 8 17 18 19 20 21 22 23 24 }
t.c:10:5: missed:   Build SLP failed: unsupported load permutation *ptr_dst_178 = _57;

where SLP fails to see the opportunity to use a smaller load at an offset,
probably because the group size is 9 (ptr_src[0] to ptr_src[WIDTH+1]).
Vectorizing without SLP is indeed not profitable here.  With an ISA that
supports the permutation the loop does get vectorized (I just tried
-msse4.1), though the loads are not handled optimally.
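
As a rough illustration of the point about the loads, here is a sketch
in C.  The loop shape, the coefficients and the names LANES, filter_rows,
load_permuted and load_offset are assumptions made for the example.  This
is not the testcase from the PR.  It only shows that the load permutation
0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 over a group of 9 bytes selects the same
data as two contiguous 8-byte loads at offsets 0 and 1.

```c
#include <string.h>

enum { LANES = 8 };  /* assumed number of outputs per row */

/* Assumed loop shape: each output combines ptr_src[x] and ptr_src[x + 1],
   so one row reads a group of LANES + 1 = 9 source bytes.  */
void
filter_rows (unsigned char *ptr_dst, const unsigned char *ptr_src,
             int stride, int height)
{
  for (int y = 0; y < height; y++)
    {
      for (int x = 0; x < LANES; x++)
        ptr_dst[x] = (unsigned char) ((ptr_src[x] * 3
                                       + ptr_src[x + 1] * 5 + 4) >> 3);
      ptr_dst += stride;
      ptr_src += stride;
    }
}

/* The loads expressed as a permutation of the 9-element group.  */
void
load_permuted (unsigned char *out, const unsigned char *group)
{
  static const int perm[2 * LANES]
    = { 0, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8 };

  for (int i = 0; i < 2 * LANES; i++)
    out[i] = group[perm[i]];
}

/* The same data fetched as two contiguous loads, the second one
   offset by one element.  No permutation is needed.  */
void
load_offset (unsigned char *out, const unsigned char *group)
{
  memcpy (out, group, LANES);
  memcpy (out + LANES, group + 1, LANES);
}
```

Both load functions fill out with identical bytes for any 9-byte group,
which is the equivalence SLP would have to recognize to avoid the
unsupported permute.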


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
