[Bug tree-optimization/99414] s235 benchmark of TSVC is vectorized better by icc than gcc (loop interchange)

rguenth at gcc dot gnu.org via Gcc-bugs Mon, 08 Mar 2021 00:39:02 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-03-08
     Ever confirmed|0                           |1
          Component|middle-end                  |tree-optimization
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
linterchange says:

Consider loop interchange for loop_nest<1 - 3>
Access Strides for DRs:
  a[i_33]:              <0,     4,      0>
  b[i_33]:              <0,     4,      0>
  c[i_33]:              <0,     4,      0>
  a[i_33]:              <0,     4,      0>
  aa[_6][i_33]:         <0,     4,      1024>
  bb[j_34][i_33]:       <0,     4,      1024>
  aa[j_34][i_33]:       <0,     4,      1024>

Loop(3) carried vars:
  Induction:  j_34 = {1, 1}_3
  Induction:  ivtmp_53 = {255, 4294967295}_3

Loop(2) carried vars:
  Induction:  i_33 = {0, 1}_2
  Induction:  ivtmp_51 = {256, 4294967295}_2

and then doesn't do anything.
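
For readers unfamiliar with the dump: each stride triple appears to give the number of bytes a data reference advances per iteration of loops 1, 2 and 3 of the nest, in that order. The sketch below reproduces the <0, 4, 1024> entry for aa[j][i] under two assumptions that are not stated in this report: a 4-byte float, and 256x256 arrays (the name LEN_2D is borrowed from TSVC and is hypothetical here).

```c
#include <assert.h>
#include <stddef.h>

#define LEN_2D 256   /* assumed array extent; matches the 1024-byte stride */

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

/* Hypothetical stand-in for the benchmark's 2-D array. */
static float aa[LEN_2D][LEN_2D];

/* Bytes advanced when i (loop 2 in the dump) increases by one. */
static ptrdiff_t aa_stride_i(void)
{
    return (char *)&aa[1][1] - (char *)&aa[1][0];
}

/* Bytes advanced when j (loop 3, the innermost loop) increases by one. */
static ptrdiff_t aa_stride_j(void)
{
    return (char *)&aa[2][0] - (char *)&aa[1][0];
}
```

Under those assumptions the innermost loop walks aa and bb with a 1024-byte stride while the middle loop has the unit (4-byte) stride, which is why making i the innermost loop would be attractive for vectorization.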

I suppose the best thing to do here is to first distribute the loop nest,
but our cost modeling fuses the two obvious candidates:

Fuse partitions because they have shared memory refs:
  Part 1: 0, 1, 2, 3, 4, 5, 6, 7, 19, 20, 21
  Part 2: 0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21

so this is a case that asks for better cost modeling there.
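
To make the suggestion concrete, here is a minimal hand-written sketch of the transformation being discussed. It assumes the usual TSVC formulation of s235, which is not quoted in this report but is consistent with the data references in the dump above. The array names come from the dump; LEN_2D, the init values and the helper function names are hypothetical.

```c
#include <assert.h>   /* for callers that assert on the result */
#include <string.h>

#define LEN_2D 256    /* assumed extent, consistent with the 1024-byte stride */

static float a[LEN_2D], b[LEN_2D], c[LEN_2D];
static float aa[LEN_2D][LEN_2D], bb[LEN_2D][LEN_2D];
static float ref_a[LEN_2D], ref_aa[LEN_2D][LEN_2D];

/* Small integer values so every result is exactly representable. */
static void init(void)
{
    for (int i = 0; i < LEN_2D; i++) {
        a[i] = (float)(i % 3);
        b[i] = 1.0f;
        c[i] = (float)(i % 2);
        for (int j = 0; j < LEN_2D; j++) {
            aa[j][i] = (float)(i % 5);
            bb[j][i] = (float)((i + j) % 2);
        }
    }
}

/* Assumed original shape: an imperfect nest whose inner j loop
   strides through aa and bb by a whole row per iteration. */
static void s235_original(void)
{
    for (int i = 0; i < LEN_2D; i++) {
        a[i] += b[i] * c[i];
        for (int j = 1; j < LEN_2D; j++)
            aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
    }
}

/* Distributed, then interchanged: the a[] update becomes its own loop,
   and the remaining perfect nest runs with i innermost (unit stride).
   The dependence of aa[j][i] on aa[j-1][i] is still carried by j,
   which is now the outer loop, so the order of dependent updates holds. */
static void s235_distributed_interchanged(void)
{
    for (int i = 0; i < LEN_2D; i++)
        a[i] += b[i] * c[i];
    for (int j = 1; j < LEN_2D; j++)
        for (int i = 0; i < LEN_2D; i++)
            aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
}

/* Returns 1 if both variants leave a[] and aa[][] identical. */
static int s235_variants_agree(void)
{
    init();
    s235_original();
    memcpy(ref_a, a, sizeof a);
    memcpy(ref_aa, aa, sizeof aa);
    init();
    s235_distributed_interchanged();
    return memcmp(ref_a, a, sizeof a) == 0
        && memcmp(ref_aa, aa, sizeof aa) == 0;
}
```

A caller could check the two forms with assert(s235_variants_agree());. This only illustrates the source-level shape that distribution followed by interchange would produce; it says nothing about how the passes or their cost model would need to change to get there.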


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
