This is the first special case of clearing bits from Shreya's work. We can clear an arbitrary number of high bits by shifting left by the number of bits to clear, then logically shifting right by the same amount to put everything back in place. Similarly, we can clear an arbitrary number of low bits with a logical right shift followed by a left shift. Naturally this only applies when the constant synthesis budget is 2+ insns.
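For anyone who wants to convince themselves of the identity, here is a minimal C sketch, assuming a 64-bit word. The helper names are made up for illustration and are not part of GCC or the patch; the shift counts are restricted to 0..63 because shifting a 64-bit value by 64 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear the top COUNT bits of X with a left shift
   followed by a logical right shift.  COUNT must be in [0, 63].  */
uint64_t
clear_high_bits (uint64_t x, unsigned count)
{
  return (x << count) >> count;
}

/* Hypothetical helper: clear the low COUNT bits of X with a logical right
   shift followed by a left shift.  COUNT must be in [0, 63].  */
uint64_t
clear_low_bits (uint64_t x, unsigned count)
{
  return (x >> count) << count;
}

/* Check that each shift pair matches the AND with the equivalent mask
   for every valid shift count.  */
void
check_shift_pairs (uint64_t x)
{
  for (unsigned count = 0; count < 64; count++)
    {
      /* Mask with the top COUNT bits clear, i.e. 2^(64 - COUNT) - 1.  */
      assert (clear_high_bits (x, count) == (x & (UINT64_MAX >> count)));
      /* Mask with the low COUNT bits clear.  */
      assert (clear_low_bits (x, count) == (x & (UINT64_MAX << count)));
    }
}
```

Calling check_shift_pairs on a handful of values exercises both forms. The high-bit case corresponds to masks of the form 2^n - 1, and the low-bit case to masks whose complement has that form, which is what the two conditions in the patch test for.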
Even with mvconst_internal still enabled this does consistently show various small code generation improvements.
I have seen one notable regression. The two-shift form to wipe out high bits isn't handled well by ext-dce. Essentially it looks like we don't recognize the sequence as wiping upper bits; instead it makes those bits live, and as a result we're unable to remove a prior zero extension. I've opened a bug for this issue.
The other case I've seen is CSE related. If we had a number of masking operations with the same mask, we might have previously CSE'd the constant. In that scenario each instance of masking would be a single AND using the CSE'd register holding the constant, whereas with this patch it'll be a pair of shifts. But on a good uarch design the pair of shifts would be fused into a single op. Given this is relatively rare and on the margins from a performance standpoint I'm not going to worry about it.
This has spun in my tester for riscv32-elf and riscv64-elf. A bootstrap and regression test is in flight and due in an hour or so. I'm waiting on the upstream pre-commit tester and the bootstrap test before moving forward.
jeff
gcc/
	* config/riscv/riscv.cc (synthesize_and): When profitable, use two
	shift combinations to clear high or low bits rather than synthesizing
	the constant.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1a88e96d8c6..5842a08fcb0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -14524,6 +14524,43 @@ synthesize_and (rtx operands[3])
         }
     }
 
+  /* The number of instructions to synthesize the constant is a good
+     estimate of the budget.  That does not account for out of order
+     execution and fusion in the constant synthesis; those would naturally
+     decrease the budget.  It also does not account for the AND at
+     the end of the sequence which would increase the budget.  */
+  int budget = riscv_const_insns (operands[2], true);
+  rtx input = NULL_RTX;
+  rtx output = NULL_RTX;
+
+  /* Left shift + right shift to clear high bits.  */
+  if (budget >= 2 && p2m1_shift_operand (operands[2], word_mode))
+    {
+      int count = (GET_MODE_BITSIZE (GET_MODE (operands[1])).to_constant ()
+		   - exact_log2 (INTVAL (operands[2]) + 1));
+      rtx x = gen_rtx_ASHIFT (word_mode, operands[1], GEN_INT (count));
+      output = gen_reg_rtx (word_mode);
+      emit_insn (gen_rtx_SET (output, x));
+      input = output;
+      x = gen_rtx_LSHIFTRT (word_mode, input, GEN_INT (count));
+      emit_insn (gen_rtx_SET (operands[0], x));
+      return true;
+    }
+
+  /* Clears a bunch of low bits with only high bits set.  */
+  unsigned HOST_WIDE_INT t = ~INTVAL (operands[2]);
+  if (budget >= 2 && exact_log2 (t + 1) >= 0)
+    {
+      int count = ctz_hwi (INTVAL (operands[2]));
+      rtx x = gen_rtx_LSHIFTRT (word_mode, operands[1], GEN_INT (count));
+      output = gen_reg_rtx (word_mode);
+      emit_insn (gen_rtx_SET (output, x));
+      input = output;
+      x = gen_rtx_ASHIFT (word_mode, input, GEN_INT (count));
+      emit_insn (gen_rtx_SET (operands[0], x));
+      return true;
+    }
+
   /* If the remaining budget has gone to less than zero, it
      forces the value into a register and performs the AND
      operation.  It returns TRUE to the caller so the caller