Hello!

I would like to bring a strange optimization problem to the attention of the RTL experts. The problem is outlined in PR rtl-optimization/33353. The core of it is that the passes that follow the RTL fwprop1 pass simply don't process REG_EQUAL notes that mark a constant result.

For the testcase in PR 33353, the following sequence can be found in the _.137r.fwprop1 dump, just before the loop:

--cut here--
(insn 11 10 12 3 t.c:6 (set (reg:V4SI 64 [ vect_cst_.15 ])
       (mem/u/c/i:V4SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [3 S16 A128])) 960 {*movv4si_internal} (expr_list:REG_EQUAL (const_vector:V4SI [
               (const_int 0 [0x0])
               (const_int 1 [0x1])
               (const_int 2 [0x2])
               (const_int 3 [0x3])
           ])
       (nil)))

(insn 12 11 13 3 t.c:6 (parallel [
           (set (reg/f:DI 63 [ vect_ptabs.25 ])
               (plus:DI (reg/f:DI 20 frame)
                   (const_int -32 [0xffffffffffffffe0])))
           (clobber (reg:CC 17 flags))
       ]) 230 {*adddi_1_rex64} (nil))

(insn 13 12 14 3 t.c:13 (set (reg:V4SI 65)
       (mem/u/c/i:V4SI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [3 S16 A128])) 960 {*movv4si_internal} (expr_list:REG_EQUAL (const_vector:V4SI [
               (const_int 2 [0x2])
               (const_int 2 [0x2])
               (const_int 2 [0x2])
               (const_int 2 [0x2])
           ])
       (nil)))

(insn 14 13 15 3 t.c:13 (set (reg:V4SI 66)
       (mult:V4SI (reg:V4SI 64 [ vect_cst_.15 ])
           (reg:V4SI 65))) 1137 {*sse2_mulv4si3} (expr_list:REG_EQUAL (const_vector:V4SI [
               (const_int 0 [0x0])
               (const_int 2 [0x2])
               (const_int 4 [0x4])
               (const_int 6 [0x6])
           ])
       (nil)))

(insn 15 14 16 3 t.c:13 (set (mem:V4SI (reg/f:DI 63 [ vect_ptabs.25 ]) [3 S16 A128])
       (reg:V4SI 66)) 960 {*movv4si_internal} (nil))

(insn 16 15 17 3 t.c:13 (set (reg:V4SI 67)
       (mem/u/c/i:V4SI (symbol_ref/u:DI ("*.LC2") [flags 0x2]) [3 S16 A128])) 960 {*movv4si_internal} (expr_list:REG_EQUAL (const_vector:V4SI [
               (const_int 4 [0x4])
               (const_int 4 [0x4])
               (const_int 4 [0x4])
               (const_int 4 [0x4])
           ])
       (nil)))

(insn 17 16 19 3 t.c:13 (set (reg:V4SI 68)
       (plus:V4SI (reg:V4SI 64 [ vect_cst_.15 ])
           (reg:V4SI 67))) 1115 {*addv4si3} (expr_list:REG_EQUAL (const_vector:V4SI [
               (const_int 4 [0x4])
               (const_int 5 [0x5])
               (const_int 6 [0x6])
               (const_int 7 [0x7])
           ])
       (nil)))

(insn 19 17 20 3 t.c:13 (set (reg:V4SI 70)
       (mult:V4SI (reg:V4SI 68)
           (reg:V4SI 65))) 1137 {*sse2_mulv4si3} (expr_list:REG_EQUAL (const_vector:V4SI [
               (const_int 8 [0x8])
               (const_int 10 [0xa])
               (const_int 12 [0xc])
               (const_int 14 [0xe])
           ])
       (nil)))

(insn 20 19 22 3 t.c:13 (set (mem:V4SI (plus:DI (reg/f:DI 63 [ vect_ptabs.25 ])
               (const_int 16 [0x10])) [3 S16 A128])
       (reg:V4SI 70)) 960 {*movv4si_internal} (nil))
--cut here--

As can be seen from the above sequence, every relevant insn gets a REG_EQUAL note attached, as its result is indeed constant. (insn 15) and (insn 20) then store the results to the stack before the call to function "g".
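
As a quick sanity check of the notes, the element-wise arithmetic can be replayed outside the compiler. The following is only a small Python model of the V4SI operations in the dump (the helper names vadd and vmul are made up for illustration and are not GCC code); it assumes 32-bit wraparound per element:

```python
# Element-wise model of the V4SI operations in the dump above.
# vadd/vmul are hypothetical helpers, not anything from GCC.
MASK = 0xFFFFFFFF  # each SI element is 32 bits wide


def vadd(a, b):
    """Element-wise add with 32-bit wraparound (models plus:V4SI)."""
    return [(x + y) & MASK for x, y in zip(a, b)]


def vmul(a, b):
    """Element-wise multiply with 32-bit wraparound (models mult:V4SI)."""
    return [(x * y) & MASK for x, y in zip(a, b)]


LC0 = [0, 1, 2, 3]  # REG_EQUAL note on insn 11 (reg 64)
LC1 = [2, 2, 2, 2]  # REG_EQUAL note on insn 13 (reg 65)
LC2 = [4, 4, 4, 4]  # REG_EQUAL note on insn 16 (reg 67)

reg66 = vmul(LC0, LC1)    # insn 14
reg68 = vadd(LC0, LC2)    # insn 17
reg70 = vmul(reg68, LC1)  # insn 19

print(reg66, reg68, reg70)
# prints: [0, 2, 4, 6] [4, 5, 6, 7] [8, 10, 12, 14]
```

The three results agree with the REG_EQUAL notes on (insn 14), (insn 17) and (insn 19).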

However, it looks as if GCC doesn't know what to do with these constants. One of the following passes (which one?) should add the calculated constants to the constant pool and replace the computations that feed (insn 15) and (insn 20) with direct loads of the constants into (reg 66) and (reg 70). Fortunately, the loop optimization pass detects that these insns are loop invariant and moves them out of the loop, resulting in:

.LCFI2:
       movdqa  .LC0(%rip), %xmm1
       leaq    16(%rsp), %rbp
       movdqa  .LC1(%rip), %xmm0
       paddd   .LC2(%rip), %xmm1
       pmulld  %xmm1, %xmm0
       movdqa  %xmm0, (%rsp)
.L2:

The question that would shed some light on this issue is: which pass should handle REG_EQUAL notes and substitute the calculation with a load from the constant pool?

Uros.
