On 4/5/23 03:16, Jakub Jelinek wrote:
Hi!
The following testcase is miscompiled on riscv since the addition
of *mvconst_internal define_insn_and_split.
I believe the bug is in DSE. We have:
(insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
(const_int -64 [0xffffffffffffffc0])) [2 S4 A128])
(reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 166)
(nil)))
(insn 39 36 40 2 (set (reg:SI 171)
(zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
(const_int -64 [0xffffffffffffffc0])) [0 S2 A128])))
"pr109040.c":9:11 111 {*zero_extendhisi2}
(nil))
and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
even different modes like in the above case, and so it optimizes it into:
(insn 47 35 39 2 (set (reg:HI 175)
(subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
(expr_list:REG_DEAD (reg:SI 166)
(nil)))
(insn 39 47 40 2 (set (reg:SI 171)
(zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111
{*zero_extendhisi2}
(expr_list:REG_DEAD (reg:HI 175)
(nil)))
Pseudo 166 is result of AND with 0x8084c constant (forced into a register).
Right. But do we agree that the two above are equivalent? If they are
then changing DSE just papers over the combine issue downstream.
Combine attempts to combine the AND with the insn 47 above created by DSE,
and turns it because of WORD_REGISTER_OPERATIONS and its assumption that all
the subword operations are actually done on word mode into:
(set (subreg:SI (reg:HI 175) 0)
(and:SI (reg:SI 167 [ m ])
(reg:SI 168)))
and later on the ZERO_EXTEND is thrown away.
And isn't that where the bug really is. There's a patch from pan2.li in
this exact space. That patch still doesn't look right to me, but it
seems to be attacking this problem closer to the right place.
The following patch changes DSE to instead emit
(insn 47 35 39 2 (set (reg:SI 175)
(reg:SI 166)) "pr109040.c":10:11 180 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 166)
(nil)))
(insn 39 47 40 2 (set (reg:SI 171)
(zero_extend:SI (subreg:HI (reg:SI 175) 0))) "pr109040.c":10:11 111
{*zero_extendhisi2}
(expr_list:REG_DEAD (reg:SI 175)
(nil)))
i.e. in the new insn copy whole reg rather than subword part of it where
we don't really know anything about the upper bits.
With this change, combine manages to do the right thing, optimies the
(unsigned) (unsigned short) (reg & 0x8084c) into
reg & 0x84c.
This may still be desirable but it feels like papering over the problem
to me.
Bootstrapped/regtested on x86_64-linux and i686-linux (admittedly not
WORD_REGISTER_OPERATIONS targets) and tested using a cross to riscv
on the testcase. I have unfortunately no way to bootstrap this on
risc*-linux, could somebody do that? Ok for trunk if that passes/
2023-04-05 Jakub Jelinek <ja...@redhat.com>
PR target/109040
* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
a new REG rather than the SUBREG.
* gcc.c-torture/execute/pr109040.c: New test.
No problem with the patch itself. I just want to make sure we're fixing
the problem in the right place.
jeff