Hi! The following testcase is miscompiled on riscv since the addition of *mvconst_internal define_insn_and_split. I believe the bug is in DSE. We have: (insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame) (const_int -64 [0xffffffffffffffc0])) [2 S4 A128]) (reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 166) (nil))) (insn 39 36 40 2 (set (reg:SI 171) (zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame) (const_int -64 [0xffffffffffffffc0])) [0 S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2} (nil)) and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles even different modes like in the above case, and so it optimizes it into: (insn 47 35 39 2 (set (reg:HI 175) (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal} (expr_list:REG_DEAD (reg:SI 166) (nil))) (insn 39 47 40 2 (set (reg:SI 171) (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2} (expr_list:REG_DEAD (reg:HI 175) (nil))) Pseudo 166 is result of AND with 0x8084c constant (forced into a register). Combine attempts to combine the AND with the insn 47 above created by DSE, and turns it because of WORD_REGISTER_OPERATIONS and its assumption that all the subword operations are actually done on word mode into: (set (subreg:SI (reg:HI 175) 0) (and:SI (reg:SI 167 [ m ]) (reg:SI 168))) and later on the ZERO_EXTEND is thrown away.
The following patch changes DSE to instead emit (insn 47 35 39 2 (set (reg:SI 175) (reg:SI 166)) "pr109040.c":10:11 180 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 166) (nil))) (insn 39 47 40 2 (set (reg:SI 171) (zero_extend:SI (subreg:HI (reg:SI 175) 0))) "pr109040.c":10:11 111 {*zero_extendhisi2} (expr_list:REG_DEAD (reg:SI 175) (nil))) i.e. in the new insn copy whole reg rather than subword part of it where we don't really know anything about the upper bits. With this change, combine manages to do the right thing, optimies the (unsigned) (unsigned short) (reg & 0x8084c) into reg & 0x84c. Bootstrapped/regtested on x86_64-linux and i686-linux (admittedly not WORD_REGISTER_OPERATIONS targets) and tested using a cross to riscv on the testcase. I have unfortunately no way to bootstrap this on risc*-linux, could somebody do that? Ok for trunk if that passes/ 2023-04-05 Jakub Jelinek <ja...@redhat.com> PR target/109040 * dse.cc (replace_read): If read_reg is a SUBREG of a word mode REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into a new REG rather than the SUBREG. * gcc.c-torture/execute/pr109040.c: New test. --- gcc/dse.cc.jj 2023-01-02 09:32:50.369880943 +0100 +++ gcc/dse.cc 2023-04-04 22:17:22.906347794 +0200 @@ -2012,7 +2012,19 @@ replace_read (store_info *store_info, in } /* Force the value into a new register so that it won't be clobbered between the store and the load. */ - read_reg = copy_to_mode_reg (read_mode, read_reg); + if (WORD_REGISTER_OPERATIONS + && GET_CODE (read_reg) == SUBREG + && REG_P (SUBREG_REG (read_reg)) + && GET_MODE (SUBREG_REG (read_reg)) == word_mode) + { + /* For WORD_REGISTER_OPERATIONS with subreg of word_mode register + force SUBREG_REG into a new register rather than the SUBREG. */ + rtx r = copy_to_mode_reg (word_mode, SUBREG_REG (read_reg)); + read_reg = shallow_copy_rtx (read_reg); + SUBREG_REG (read_reg) = r; + } + else + read_reg = copy_to_mode_reg (read_mode, read_reg); insns = get_insns (); end_sequence (); --- gcc/testsuite/gcc.c-torture/execute/pr109040.c.jj 2023-04-04 16:57:31.739658564 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr109040.c 2023-04-04 16:57:15.238900638 +0200 @@ -0,0 +1,23 @@ +/* PR target/109040 */ + +typedef unsigned short __attribute__((__vector_size__ (32))) V; + +unsigned short a, b, c, d; + +void +foo (V m, unsigned short *ret) +{ + V v = 6 > ((V) { 2124, 8 } & m); + unsigned short uc = v[0] + a + b + c + d; + *ret = uc; +} + +int +main () +{ + unsigned short x; + foo ((V) { 0, 15 }, &x); + if (x != (unsigned short) ~0) + __builtin_abort (); + return 0; +} Jakub