https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121773
Bug ID: 121773
Summary: Combine over-simplifies a subreg write
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rearnsha at gcc dot gnu.org
CC: segher at gcc dot gnu.org
Target Milestone: ---
Target: arm
With this testcase, compiled with -march=armv7-a+simd -mfpu=auto -marm
-mfloat-abi=hard
#include <arm_neon.h>
uint64x1_t foo() {
  uint64x2_t v36 = vdupq_n_u64(0x2020000012345678);
  uint64x1_t v48 = vget_low_u64(v36);
  uint64x1_t v50 = vadd_u64(v48, v48);
  return vpadal_u32(v50, vdup_n_u32(0));
}
foo is miscompiled to:
        vldr.64 d16, .L2        @ int
        vmov.i32        d17, #0 @ v2si
        vpadal.u32      d16, d17
        vmov    r0, r1, d16     @ int
        bx      lr
.L2:
        .word   0
        .word   1077936128
We get, prior to combine:
(insn 21 20 7 2 (set (reg:DI 101 [ _5 ])
        (const_int 0 [0])) "/home/rearnsha/gnusrc/gcc/master/gcc/config/arm/arm_neon.h":607:14 -1
     (nil))
(insn 7 21 8 2 (parallel [
            (set (reg:CC_C 80 cc)
                (compare:CC_C (plus:SI (reg:SI 104 [ _6 ])
                        (reg:SI 104 [ _6 ]))
                    (reg:SI 104 [ _6 ])))
            (set (subreg:SI (reg:DI 101 [ _5 ]) 0)
                (plus:SI (reg:SI 104 [ _6 ])
                    (reg:SI 104 [ _6 ])))
        ]) "/home/rearnsha/gnusrc/gcc/master/gcc/config/arm/arm_neon.h":607:14 17 {addsi3_compare_op1}
     (expr_list:REG_DEAD (reg:SI 104 [ _6 ])
        (nil)))
(insn 8 7 9 2 (set (subreg:SI (reg:DI 101 [ _5 ]) 4)
        (plus:SI (plus:SI (reg:SI 105 [ _6+4 ])
                (reg:SI 105 [ _6+4 ]))
            (ltu:SI (reg:CC_C 80 cc)
                (const_int 0 [0])))) "/home/rearnsha/gnusrc/gcc/master/gcc/config/arm/arm_neon.h":607:14 21 {addsi3_carryin}
That is:
insn 21 clears R101
insn 7 writes the low part of R101 with an addition that carries out any
overflow bit
insn 8 writes the top part of R101 with an addition with carry-in.
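As a rough illustration of what these three insns are meant to compute, here is a minimal sketch in portable C. The function name expected_r101 and the way the carry is modelled are assumptions made for this example only; they are not taken from GCC or from the arm backend.

```c
#include <stdint.h>

/* Hypothetical model of insns 21, 7 and 8: a 64-bit addition split
   into two 32-bit halves linked by a carry.  r104 and r105 stand for
   the low and high words of the input.  */
uint64_t expected_r101(uint32_t r104, uint32_t r105)
{
    uint32_t lo, hi, carry;

    /* insn 21: clear the whole of r101.  */
    lo = 0;
    hi = 0;

    /* insn 7: low word = r104 + r104; the carry flag records
       unsigned overflow (sum wrapped below an operand).  */
    lo = r104 + r104;
    carry = lo < r104;

    /* insn 8: high word = r105 + r105 + carry-in.  */
    hi = r105 + r105 + carry;

    return ((uint64_t)hi << 32) | lo;
}
```

For the constant in the testcase, expected_r101(0x12345678, 0x20200000) gives 0x404000002468acf0, so the low word of the result should be 0x2468acf0 rather than 0.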
In this specific test R104 and R105 are known constants. It appears that
combine tries to merge insns 21 and 8 with:
Trying 21 -> 8:
21: r101:DI=0
8: r101:DI#4=0x40400000
Successfully matched this instruction:
(set (reg:DI 101 [ _5 ])
(const_int 4629700416936869888 [0x4040000000000000]))
i.e. writing the whole of r101 with the top part of the addition.
Somehow combine ignores that this will overwrite the intervening write of the
low part (insn 7), which subsequently becomes dead code and is eliminated.
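The effect of the bad combination can be sketched in plain C as follows. This is only a model of the ordering problem, assuming the combined insn sits where insn 8 was; miscompiled_r101 is a hypothetical name introduced for the example.

```c
#include <stdint.h>

/* Hypothetical model of the insn stream after combine has merged
   insns 21 and 8 into a single full-width DImode set.  */
uint64_t miscompiled_r101(uint32_t r104)
{
    uint64_t r101;

    /* insn 7 is still in place and writes the low word.  */
    r101 = (uint64_t)(uint32_t)(r104 + r104);

    /* Combined 21+8: a full-width write that clobbers the low word
       just stored, which makes insn 7 a dead store.  */
    r101 = UINT64_C(0x4040000000000000);

    return r101;
}
```

The value returned has a low word of 0 and a high word of 0x40400000 (1077936128), which matches the two .word entries at .L2 in the generated assembly.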