> OK. And I think this shows the basic approach we want to use if there > are other builtins that accept sub-word modes. ie, get the operands > into X mode (by extending them as appropriate), then do as much work in > X mode as possible, then truncate the result if needed.
> Thanks for your patience on this. Thanks Jeff for comments and suggestions, I will have a try if we can do some combine-like optimization for the SImode asm in RV64. Pan -----Original Message----- From: Jeff Law <jeffreya...@gmail.com> Sent: Sunday, August 18, 2024 2:17 AM To: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp....@gmail.com Subject: Re: [PATCH v4] RISC-V: Make sure high bits of usadd operands is clean for non-Xmode [PR116278] On 8/16/24 9:43 PM, pan2...@intel.com wrote: > From: Pan Li <pan2...@intel.com> > > For QI/HImode of .SAT_ADD, the operands may be sign-extended and the > high bits of Xmode may be all 1 which is not expected. For example as > below code. > > signed char b[1]; > unsigned short c; > signed char *d = b; > int main() { > b[0] = -40; > c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + > 9; > __builtin_printf("%d\n", c); > } > > After expanding we have: > > ;; _6 = .SAT_ADD (_3, 9); > (insn 8 7 9 (set (reg:DI 143) > (high:DI (symbol_ref:DI ("d") [flags 0x86] <var_decl d>))) > (nil)) > (insn 9 8 10 (set (reg/f:DI 142) > (mem/f/c:DI (lo_sum:DI (reg:DI 143) > (symbol_ref:DI ("d") [flags 0x86] <var_decl d>)) [1 d+0 S8 > A64])) > (nil)) > (insn 10 9 11 (set (reg:HI 144 [ _3 ]) > (sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) > "test.c":7:10 -1 > (nil)) > > The convert from signed char to unsigned short will have sign_extend rtl > as above. And finally become the lb insn as below: > > lb a1,0(a5) // a1 is -40, aka 0xffffffffffffffd8 > lui a0,0x1a > addi a5,a1,9 > slli a5,a5,0x30 > srli a5,a5,0x30 // a5 is 65505 > sltu a1,a5,a1 // compare 65505 and 0xffffffffffffffd8 => TRUE > > The sltu try to compare 65505 and 0xffffffffffffffd8 here, but we > actually want to compare 65505 and 65496 (0xffd8). Thus we need to > clean up the high bits to ensure this. > > The below test suites are passed for this patch: > * The rv64gcv fully regression test. > > PR target/116278 > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new > func impl to zero extend rtx. > (riscv_expand_usadd): Leverage above func to cleanup operands > and sum. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/pr116278-run-1.c: New test. > * gcc.target/riscv/pr116278-run-2.c: New test. > > PR 116278 > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new > func impl to zero extend rtx. > (riscv_expand_usadd): Leverage above func to cleanup operands 0 > and remove the special handing for SImode in RV64. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat_u_add-11.c: Adjust asm check body. > * gcc.target/riscv/sat_u_add-15.c: Ditto. > * gcc.target/riscv/sat_u_add-19.c: Ditto. > * gcc.target/riscv/sat_u_add-23.c: Ditto. > * gcc.target/riscv/sat_u_add-3.c: Ditto. > * gcc.target/riscv/sat_u_add-7.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-11.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-15.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-3.c: Ditto. > * gcc.target/riscv/sat_u_add_imm-7.c: Ditto. > * gcc.target/riscv/pr116278-run-1.c: New test. > * gcc.target/riscv/pr116278-run-2.c: New test. OK. And I think this shows the basic approach we want to use if there are other builtins that accept sub-word modes. ie, get the operands into X mode (by extending them as appropriate), then do as much work in X mode as possible, then truncate the result if needed. Thanks for your patience on this. Jeff