[Bug tree-optimization/115120] New: Bad interaction between ivcanon and early break vectorization

acoplan at gcc dot gnu.org via Gcc-bugs Thu, 16 May 2024 09:03:18 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120


            Bug ID: 115120
           Summary: Bad interaction between ivcanon and early break
                    vectorization
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

Consider the following testcase on aarch64:

int arr[1024];
int *f()
{
    int i;
    for (i = 0; i < 1024; i++)
      if (arr[i] == 42)
        break;
    return arr + i;
}

Compiled with -O3, we get the following vector loop body:

.L2:
        cmp     x2, x1
        beq     .L9
.L6:
        ldr     q31, [x1]
        add     x1, x1, 16
        mov     v27.16b, v29.16b
        mov     v28.16b, v30.16b
        cmeq    v31.4s, v31.4s, v26.4s
        add     v29.4s, v29.4s, v24.4s
        add     v30.4s, v30.4s, v25.4s
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x3, d31
        cbz     x3, .L2

It's somewhat surprising that there are two vector adds.  Looking at the
optimized dump:

<bb 3> [local count: 1063004408]:
  # vect_vec_iv_.6_28 = PHI <_29(10), { 0, 1, 2, 3 }(2)>
  # vect_vec_iv_.7_33 = PHI <_34(10), { 1024, 1023, 1022, 1021 }(2)>
  # ivtmp.18_19 = PHI <ivtmp.18_20(10), ivtmp.18_26(2)>
  _34 = vect_vec_iv_.7_33 + { 4294967292, 4294967292, 4294967292, 4294967292 };
  _29 = vect_vec_iv_.6_28 + { 4, 4, 4, 4 };
  _25 = (void *) ivtmp.18_19;
  vect__1.10_39 = MEM <vector(4) int> [(int *)_25];
  mask_patt_9.11_41 = vect__1.10_39 == { 42, 42, 42, 42 };
  if (mask_patt_9.11_41 != { 0, 0, 0, 0 })
    goto <bb 4>; [5.50%]
  else
    goto <bb 10>; [94.50%]
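
For anyone reading the dump above: the constant 4294967292 is -4 in 32-bit
unsigned arithmetic, so the second PHI is a vector IV stepping down from
{ 1024, 1023, 1022, 1021 } by 4 per iteration, next to the one stepping up
from { 0, 1, 2, 3 }.  A rough lane-by-lane model in plain C (the names LANES,
step_ivs and lanes_sum_to are made up for illustration and are not GCC
internals) shows that the two are redundant:

```c
#include <stdint.h>

#define LANES 4

/* Illustrative model only: advance both vector IVs by one vector
   iteration, one lane at a time.  */
static void step_ivs(uint32_t up[LANES], uint32_t down[LANES])
{
    for (int l = 0; l < LANES; l++) {
        up[l] += 4u;             /* models _29 = vect_vec_iv_.6_28 + { 4, ... } */
        down[l] += 4294967292u;  /* models _34 = vect_vec_iv_.7_33 + { -4, ... } */
    }
}

/* Return 1 if up[l] + down[l] == total in every lane, else 0.  */
static int lanes_sum_to(const uint32_t up[LANES], const uint32_t down[LANES],
                        uint32_t total)
{
    for (int l = 0; l < LANES; l++)
        if ((uint32_t)(up[l] + down[l]) != total)
            return 0;
    return 1;
}
```

In this model every lane satisfies up + down == 1024 before and after each
step, so either IV could be derived from the other.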

We can see that there are two IV updates that got vectorized.  It turns out
that one of these comes from the ivcanon pass.  If I add
-fno-tree-loop-ivcanon we instead get the following vector loop body:

.L2:
        cmp     x2, x1
        beq     .L9
.L6:
        ldr     q31, [x1]
        add     x1, x1, 16
        mov     v29.16b, v30.16b
        add     v30.4s, v30.4s, v27.4s
        cmeq    v31.4s, v31.4s, v28.4s
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x3, d31
        cbz     x3, .L2

which is much cleaner.  Looking at the tree dumps, the ivcanon pass makes the
following transformation:

--- cddce2.tree 2024-05-16 13:49:10.426703350 +0000
+++ ivcanon.tree        2024-05-16 13:49:17.678874925 +0000
@@ -4,6 +4,8 @@
   int i;
   int _1;
   int * _8;
+  unsigned int ivtmp_11;
+  unsigned int ivtmp_12;
   long unsigned int _13;
   long unsigned int _15;
   long unsigned int prephitmp_16;
@@ -12,6 +14,7 @@

   <bb 3> [local count: 1063004408]:
   # i_10 = PHI <i_7(7), 0(2)>
+  # ivtmp_12 = PHI <ivtmp_11(7), 1024(2)>
   _1 = arr[i_10];
   if (_1 == 42)
     goto <bb 5>; [5.50%]
@@ -20,7 +23,8 @@

   <bb 4> [local count: 1004539166]:
   i_7 = i_10 + 1;
-  if (i_7 != 1024)
+  ivtmp_11 = ivtmp_12 - 1;
+  if (ivtmp_11 != 0)
     goto <bb 7>; [98.93%]
   else
     goto <bb 8>; [1.07%]

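i.e. it introduces a backwards-counting IV (ivtmp_12, starting at 1024 and
decremented to ivtmp_11 each iteration) and uses it for the loop exit test in
place of i_7 != 1024.  In the general case without vectorization, ivopts then
seems to clean this up and ensures we only have a single IV.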
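
At the source level, the effect of the diff above can be approximated as
follows.  This is only a hand-written sketch of the GIMPLE, not compiler
output; the function name f_after_ivcanon and the variable name ivtmp are
chosen here for illustration:

```c
int arr[1024];

/* Original loop from the testcase.  */
int *f(void)
{
    int i;
    for (i = 0; i < 1024; i++)
        if (arr[i] == 42)
            break;
    return arr + i;
}

/* Approximation of the loop after ivcanon: the early break is unchanged,
   but the counted exit now tests a second IV that runs from 1024 down to 0
   instead of comparing i against 1024.  */
int *f_after_ivcanon(void)
{
    int i = 0;
    unsigned int ivtmp = 1024;
    for (;;) {
        if (arr[i] == 42)   /* early break, as in bb 3 */
            break;
        i = i + 1;          /* i_7 = i_10 + 1 */
        ivtmp = ivtmp - 1;  /* ivtmp_11 = ivtmp_12 - 1 */
        if (ivtmp == 0)     /* replaces: if (i_7 != 1024) */
            break;
    }
    return arr + i;
}
```

Both functions return the same pointer for any contents of arr; the point is
only that i and ivtmp carry the same information, so both being live after
vectorization is wasted work.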

In the vectorized case it seems this problem only shows up with early break
vectorization. Looking at a simple reduction, such as:

int a[1024];
int g()
{
    int sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += a[i];
    return sum;
}

although we still have the backwards-counting IV in ifcvt:

  <bb 3> [local count: 1063004408]:
  # sum_9 = PHI <sum_5(5), 0(2)>
  # i_11 = PHI <i_6(5), 0(2)>
  # ivtmp_8 = PHI <ivtmp_7(5), 1024(2)>
  _1 = a[i_11];
  sum_5 = _1 + sum_9;
  i_6 = i_11 + 1;
  ivtmp_7 = ivtmp_8 - 1;
  if (ivtmp_7 != 0)
    goto <bb 5>; [98.99%]
  else
    goto <bb 4>; [1.01%]

we end up with only scalar IVs after vectorization, and the backwards scalar IV
ends up getting deleted by dce6:

Deleting : ivtmp_7 = ivtmp_8 - 1;

I'm not sure what the right solution is, but we should avoid having duplicated
IVs with early break vectorization.
