https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67288
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with my r204516 change aka
https://gcc.gnu.org/ml/gcc-patches/2013-11/msg00768.html
The changes in the *.optimized dump look reasonable, fewer IVs:

   <bb 3>:
+  # RANGE [16, 4294967280]
+  _18 = _22 * 16;
+  # RANGE [0, 4294967295]
+  _9 = _18 + _6;
+  _3 = (void *) _9;

   <bb 4>:
   # PT = nonlocal
   # ALIGN = 16, MISALIGN = 0
   # addr_23 = PHI <addr_15(5), addr_7(3)>
-  # RANGE [0, 4294967295] NONZERO 0x0000000000fffffff
-  # i_24 = PHI <i_14(5), 0(3)>
   __asm__ __volatile__("dcbf 0, %0" :  : "r" addr_23 : "memory");
-  # RANGE [0, 4294967295] NONZERO 0x0000000000fffffff
-  i_14 = i_24 + 1;
   # PT = nonlocal
   # ALIGN = 16, MISALIGN = 0
   addr_15 = addr_23 + 16;
-  if (i_14 != _22)
+  if (addr_15 != _3)

The * 16 is present in GIMPLE, but the right shift by 4 / division by 16 is
only added later by the doloop pass, so this doesn't seem to be something
that can be optimized in GIMPLE.
On the trunk, we have in *.optimized:

  # RANGE [0, 268435455] NONZERO 268435455
  _20 = size_12 >> 4;
  if (_20 != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 3> [local count: 105119325]:
  # RANGE [16, 4294967280]
  _6 = _20 * 16;
  _4 = _1 + _6;
  _3 = (void *) _4;

  <bb 4> [local count: 955630224]:
  # PT = nonlocal null
  # ALIGN = 16, MISALIGN = 0
  # addr_21 = PHI <addr_10(3), addr_15(4)>
  __asm__ __volatile__("dcbf 0, %0" :  : "r" addr_21 : "memory");
  # PT = nonlocal null
  # ALIGN = 16, MISALIGN = 0
  addr_15 = addr_21 + 16;
  if (_3 != addr_15)
    goto <bb 4>; [89.00%]
  else
    goto <bb 5>; [11.00%]

To optimize this on RTL we'd need to have accurate value range info;
otherwise optimizing (unsigned) ((x << 4) + (cst << 4)) >> 4 into x + cst is
not valid.
Before *.combine we have:

(insn 16 15 17 3 (set (reg:SI 135)
        (ashift:SI (reg:SI 128 [ _20 ])
            (const_int 4 [0x4]))) 269 {ashlsi3}
     (expr_list:REG_DEAD (reg:SI 128 [ _20 ])
        (nil)))
(insn 17 16 42 3 (set (reg/f:SI 122 [ _3 ])
        (plus:SI (reg:SI 135)
            (reg/v/f:SI 127 [ addr ]))) 72 {*addsi3}
     (expr_list:REG_DEAD (reg:SI 135)
        (nil)))
(insn 42 17 43 3 (set (reg:SI 138)
        (minus:SI (reg/f:SI 122 [ _3 ])
            (reg/v/f:SI 127 [ addr ]))) -1
     (expr_list:REG_DEAD (reg/f:SI 122 [ _3 ])
        (nil)))
(insn 43 42 44 3 (set (reg:SI 139)
        (plus:SI (reg:SI 138)
            (const_int -16 [0xfffffffffffffff0]))) -1
     (expr_list:REG_DEAD (reg:SI 138)
        (nil)))
(insn 44 43 45 3 (set (reg:SI 140)
        (lshiftrt:SI (reg:SI 139)
            (const_int 4 [0x4]))) -1
     (expr_list:REG_DEAD (reg:SI 139)
        (nil)))

and only combine turns that into:

(insn 42 17 43 3 (set (reg:SI 138)
        (ashift:SI (reg:SI 128 [ _20 ])
            (const_int 4 [0x4]))) 269 {ashlsi3}
     (expr_list:REG_DEAD (reg:SI 128 [ _20 ])
        (nil)))
(insn 43 42 44 3 (set (reg:SI 139)
        (plus:SI (reg:SI 138)
            (const_int -16 [0xfffffffffffffff0]))) 72 {*addsi3}
     (expr_list:REG_DEAD (reg:SI 138)
        (nil)))
(insn 44 43 45 3 (set (reg:SI 140)
        (lshiftrt:SI (reg:SI 139)
            (const_int 4 [0x4]))) 279 {lshrsi3}
     (expr_list:REG_DEAD (reg:SI 139)
        (nil)))
(insn 45 44 21 3 (set (reg:SI 137)
        (plus:SI (reg:SI 140)
            (const_int 1 [0x1]))) 72 {*addsi3}
     (expr_list:REG_DEAD (reg:SI 140)
        (nil)))

While reg:SI 128 is set only once, and thus in theory we could use the
corresponding GIMPLE SSA_NAME's value range in RTL, the range of _20 is
[0, 268435455] and thus only lets us figure out that << 4 will not shift any
bits out of it (i.e. that (r128 << 4) >> 4 is equal to r128).
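To make that concrete, here is a minimal standalone C sketch (not part of
this PR's testcase); 0x10000000 is chosen to lie just outside _20's known
range [0, 268435455], and for it rewriting ((x << 4) + (cst << 4)) >> 4 as
x + cst gives a different answer because the shifts discard x's upper four
bits:

#include <stdio.h>

int main (void)
{
  unsigned int x = 0x10000000;  /* just outside [0, 268435455] */
  unsigned int cst = 1;
  unsigned int folded = ((x << 4) + (cst << 4)) >> 4;  /* x's high bits lost */
  unsigned int direct = x + cst;
  printf ("folded=%#x direct=%#x\n", folded, direct);  /* 0x1 vs 0x10000001 */
  return 0;
}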
We need to know that it can't be zero as well, which on GIMPLE is present in
the value range of _6, [16, 4294967280], but unfortunately that info is lost
during TER; pseudo 135 doesn't really have REG_EXPR set.  Maybe that would be
fixable by some expander work.  Then the question is whether we actually can
use GIMPLE VRP info during RTL optimizations, and whether all RTL
optimizations that would invalidate it reset REG_EXPR or could in some other
way signal that the VRP info can't be trusted.  The combiner first optimizes:

Trying 17 -> 42:
   17: r122:SI=r135:SI+r127:SI
      REG_DEAD r135:SI
   42: r138:SI=r122:SI-r127:SI
      REG_DEAD r122:SI
Successfully matched this instruction:
(set (reg:SI 138)
    (reg:SI 135))

and then:

Trying 16 -> 42:
   16: r135:SI=r128:SI<<0x4
      REG_DEAD r128:SI
   42: r138:SI=r135:SI
      REG_DEAD r135:SI
Successfully matched this instruction:
(set (reg:SI 138)
    (ashift:SI (reg:SI 128 [ _20 ])
        (const_int 4 [0x4])))

so if we had VRP info on _6 aka (reg:SI 135 [ _6 ]), we'd need to signal that
the same range holds for r138, and then have some way to query it and
somewhere to perform the optimization.
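For reference, a standalone C sketch (again not from the PR) of what such an
optimization would have to prove: insns 42-45 above compute
(((x << 4) - 16) >> 4) + 1, which collapses back to x whenever x << 4
neither is zero nor loses bits, i.e. exactly what _6's GIMPLE range
[16, 4294967280] guarantees:

#include <stdio.h>

/* The computation combine leaves behind in insns 42-45.  */
static unsigned int doloop_count (unsigned int x)
{
  return (((x << 4) - 16) >> 4) + 1;
}

int main (void)
{
  printf ("%#x\n", doloop_count (5));  /* 0x5: equals x, since 5 << 4
                                          is in [16, 4294967280] */
  printf ("%#x\n", doloop_count (0));  /* 0x10000000: the subtraction
                                          wraps, so the result is not 0 */
  return 0;
}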