Hello!
I would like to bring a strange optimization problem to the attention of
the RTL experts. The problem is outlined in PR rtl-optimization/33353; the
core of it is that the passes that follow the RTL fwprop1 pass simply
don't process REG_EQUAL notes that mark a constant result.
For the testcase in PR 33353, the following sequence can be found in
the _.137r.fwprop1 dump, just before the loop:
--cut here--
(insn 11 10 12 3 t.c:6 (set (reg:V4SI 64 [ vect_cst_.15 ])
(mem/u/c/i:V4SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [3 S16 A128])) 960 {*movv4si_internal} (expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 0 [0x0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
])
(nil)))
(insn 12 11 13 3 t.c:6 (parallel [
(set (reg/f:DI 63 [ vect_ptabs.25 ])
(plus:DI (reg/f:DI 20 frame)
(const_int -32 [0xffffffffffffffe0])))
(clobber (reg:CC 17 flags))
]) 230 {*adddi_1_rex64} (nil))
(insn 13 12 14 3 t.c:13 (set (reg:V4SI 65)
(mem/u/c/i:V4SI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [3 S16 A128])) 960 {*movv4si_internal} (expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 2 [0x2])
(const_int 2 [0x2])
(const_int 2 [0x2])
(const_int 2 [0x2])
])
(nil)))
(insn 14 13 15 3 t.c:13 (set (reg:V4SI 66)
(mult:V4SI (reg:V4SI 64 [ vect_cst_.15 ])
(reg:V4SI 65))) 1137 {*sse2_mulv4si3} (expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 0 [0x0])
(const_int 2 [0x2])
(const_int 4 [0x4])
(const_int 6 [0x6])
])
(nil)))
(insn 15 14 16 3 t.c:13 (set (mem:V4SI (reg/f:DI 63 [ vect_ptabs.25 ]) [3 S16 A128])
(reg:V4SI 66)) 960 {*movv4si_internal} (nil))
(insn 16 15 17 3 t.c:13 (set (reg:V4SI 67)
(mem/u/c/i:V4SI (symbol_ref/u:DI ("*.LC2") [flags 0x2]) [3 S16 A128])) 960 {*movv4si_internal} (expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 4 [0x4])
(const_int 4 [0x4])
(const_int 4 [0x4])
(const_int 4 [0x4])
])
(nil)))
(insn 17 16 19 3 t.c:13 (set (reg:V4SI 68)
(plus:V4SI (reg:V4SI 64 [ vect_cst_.15 ])
(reg:V4SI 67))) 1115 {*addv4si3} (expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
])
(nil)))
(insn 19 17 20 3 t.c:13 (set (reg:V4SI 70)
(mult:V4SI (reg:V4SI 68)
(reg:V4SI 65))) 1137 {*sse2_mulv4si3} (expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 8 [0x8])
(const_int 10 [0xa])
(const_int 12 [0xc])
(const_int 14 [0xe])
])
(nil)))
(insn 20 19 22 3 t.c:13 (set (mem:V4SI (plus:DI (reg/f:DI 63 [ vect_ptabs.25 ])
(const_int 16 [0x10])) [3 S16 A128])
(reg:V4SI 70)) 960 {*movv4si_internal} (nil))
--cut here--
As can be seen from the above sequence, every relevant insn gets a
REG_EQUAL note attached, as the result is indeed constant. (insn 15) and
(insn 20) then store the result to the stack before the call to function "g".
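
To double-check that the REG_EQUAL notes really describe the computed
values, the arithmetic can be replayed lane by lane. The following is
only a sketch in plain C: the v4si struct, the helper names and the
fold_insn* functions are made up for illustration, and only the constants
are taken from the dump above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Plain-C stand-in for a V4SI value; hypothetical, no SSE involved. */
typedef struct { int32_t e[LANES]; } v4si;

/* Element-wise add, mirroring (plus:V4SI ...) in insn 17. */
v4si v4si_add(v4si a, v4si b)
{
    v4si r;
    for (int i = 0; i < LANES; i++)
        r.e[i] = a.e[i] + b.e[i];
    return r;
}

/* Element-wise multiply, mirroring (mult:V4SI ...) in insns 14 and 19. */
v4si v4si_mul(v4si a, v4si b)
{
    v4si r;
    for (int i = 0; i < LANES; i++)
        r.e[i] = a.e[i] * b.e[i];
    return r;
}

/* Constants loaded by insns 11, 13 and 16 (regs 64, 65 and 67). */
static const v4si LC0 = {{0, 1, 2, 3}};
static const v4si LC1 = {{2, 2, 2, 2}};
static const v4si LC2 = {{4, 4, 4, 4}};

v4si fold_insn14(void) { return v4si_mul(LC0, LC1); }           /* reg 66 */
v4si fold_insn17(void) { return v4si_add(LC0, LC2); }           /* reg 68 */
v4si fold_insn19(void) { return v4si_mul(fold_insn17(), LC1); } /* reg 70 */

/* Assert that each folded value equals the note printed in the dump. */
void check_notes(void)
{
    const v4si note14 = {{0, 2, 4, 6}};
    const v4si note17 = {{4, 5, 6, 7}};
    const v4si note19 = {{8, 10, 12, 14}};
    v4si r66 = fold_insn14();
    v4si r68 = fold_insn17();
    v4si r70 = fold_insn19();

    assert(memcmp(&r66, &note14, sizeof r66) == 0);
    assert(memcmp(&r68, &note17, sizeof r68) == 0);
    assert(memcmp(&r70, &note19, sizeof r70) == 0);
}
```

Calling check_notes() from a small main() should pass silently, which
agrees with the notes attached to insns 14, 17 and 19.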
However, it looks like GCC doesn't know what to do with these constants.
One of the following passes (which?) should add the calculated constants
to the constant pool and arrange for (reg 66) and (reg 70), which
(insn 15) and (insn 20) store, to be loaded directly from it, without
calculating them.
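
To make the wished-for transformation concrete, here is a toy model of
it. This is not GCC code: struct toy_insn, enum toy_op and
toy_fold_const_notes are hypothetical names invented for this sketch, and
the real RTL structures and pass interfaces look quite different. The
idea is simply that an arithmetic insn carrying a constant note is turned
into a plain load of that constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical, simplified stand-ins for RTL; not GCC data structures. */
enum toy_op { TOY_LOAD_CONST, TOY_ADD, TOY_MUL, TOY_STORE };

struct toy_insn {
    int uid;              /* insn number as printed in the dump */
    enum toy_op op;       /* what the insn does */
    int dest;             /* destination pseudo, or -1 for a store */
    int has_const_note;   /* nonzero if a constant REG_EQUAL note exists */
    int32_t note[LANES];  /* the constant carried by that note */
};

/* Turn every arithmetic insn that carries a constant note into a load
   of that constant.  Returns the number of insns rewritten.  */
size_t toy_fold_const_notes(struct toy_insn *insns, size_t n)
{
    size_t changed = 0;

    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct toy_insn *in = &insns[i];

        if (in->has_const_note
            && (in->op == TOY_ADD || in->op == TOY_MUL)) {
            /* The note already holds the value, so load it instead. */
            in->op = TOY_LOAD_CONST;
            changed++;
        }
    }
    return changed;
}
```

Applied to a model of the sequence above, this would rewrite insns 14, 17
and 19. In this toy model, the loads into (reg 64), (reg 65), (reg 67)
and (reg 68) then have no remaining users and could be removed by a
later dead code elimination step.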
Fortunately, the loop optimization pass detects that these insns are loop
invariant and moves them out of the loop, resulting in:
.LCFI2:
movdqa .LC0(%rip), %xmm1
leaq 16(%rsp), %rbp
movdqa .LC1(%rip), %xmm0
paddd .LC2(%rip), %xmm1
pmulld %xmm1, %xmm0
movdqa %xmm0, (%rsp)
.L2:
The question that would shed some light on this issue is: which pass
should handle REG_EQUAL notes and substitute the calculation with a load
from the constant pool?
Uros.