This is the first special case of clearing bits from Shreya's work. We can
clear an arbitrary number of high bits by shifting left by the number of
bits to clear, then logically shifting right to put everything in place.
Similarly we can clear an arbitrary number of low bits with a right
logical shift followed by a left shift. Naturally this only applies
when the constant synthesis budget is 2+ insns.
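For anyone following along, here is a minimal C sketch of the two shift pairs described above. It is only a source-level illustration of the identity, not the RTL the patch emits; the function names are made up for this example and a fixed 64-bit width (as on rv64) is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers only; these names do not appear in the patch.  */

/* Clear the top N bits: shift left by N, then logical shift right by N.
   Same result as x & ((1 << (64 - N)) - 1).  Requires 0 < N < 64.  */
static uint64_t
clear_high_bits (uint64_t x, unsigned n)
{
  assert (n > 0 && n < 64);
  return (x << n) >> n;
}

/* Clear the low N bits: logical shift right by N, then shift left by N.
   Same result as x & ~((1 << N) - 1).  Requires 0 < N < 64.  */
static uint64_t
clear_low_bits (uint64_t x, unsigned n)
{
  assert (n > 0 && n < 64);
  return (x >> n) << n;
}
```

Using an unsigned type matters here: it guarantees the right shift is logical, which is what lets the shifted-in zeros do the masking.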
Even with mvconst_internal still enabled this does consistently show
various small code generation improvements.
I have seen one notable regression. The two-shift form to wipe out high
bits isn't handled well by ext-dce. Essentially it looks like we don't
recognize the sequence as wiping upper bits; instead it makes bits live,
and as a result we're unable to remove a prior zero extension. I've
opened a bug for this issue.
The other case I've seen is CSE related. If we had a number of masking
operations with the same mask, we might have previously CSE'd the
constant. In that scenario each instance of masking would be a single
AND using the CSE'd register holding the constant, whereas with this
patch it'll be a pair of shifts. But on a good uarch design the pair of
shifts would be fused into a single op. Given this is relatively rare
and on the margins from a performance standpoint I'm not going to worry
about it.
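A rough source-level sketch of that CSE scenario, for illustration only. The mask value, the function names and the instruction counts in the comments are assumptions made for this example, not measurements from the patch; in particular it assumes the mask costs two instructions to materialize.

```c
#include <stdint.h>

/* Hypothetical mask: 2^47 - 1, i.e. the top 17 bits cleared.  */
#define MASK 0x00007fffffffffffULL

/* Three masking operations sharing one constant.  If the constant is
   CSE'd into a register, the cost is roughly the constant synthesis
   (assumed 2 insns) plus one AND per use.  */
void
mask3_and (uint64_t *a, uint64_t *b, uint64_t *c)
{
  *a &= MASK;
  *b &= MASK;
  *c &= MASK;
}

/* The same operation written as shift pairs: no constant to synthesize,
   but two shifts per use (before any fusion in the core).  */
void
mask3_shifts (uint64_t *a, uint64_t *b, uint64_t *c)
{
  *a = (*a << 17) >> 17;
  *b = (*b << 17) >> 17;
  *c = (*c << 17) >> 17;
}
```

Under those assumptions the AND form is about 2 + 3 = 5 instructions and the shift form about 6, which is why this only shows up when the same mask is reused several times.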
This has spun in my tester for riscv32-elf and riscv64-elf. Bootstrap
and regression tests are in flight and due in an hour or so. Waiting on
the upstream pre-commit tester and the bootstrap test before moving forward.
jeff
gcc/
* config/riscv/riscv.cc (synthesize_and): When profitable, use two
shift combinations to clear high or low bits rather than synthesizing
the constant.
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1a88e96d8c6..5842a08fcb0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -14524,6 +14524,43 @@ synthesize_and (rtx operands[3])
}
}
+ /* The number of instructions to synthesize the constant is a good
+ estimate of the budget. That does not account for out of order
+ execution and fusion in the constant synthesis; those would naturally
+ decrease the budget. It also does not account for the AND at
+ the end of the sequence which would increase the budget. */
+ int budget = riscv_const_insns (operands[2], true);
+ rtx input = NULL_RTX;
+ rtx output = NULL_RTX;
+
+ /* Left shift + right shift to clear high bits. */
+ if (budget >= 2 && p2m1_shift_operand (operands[2], word_mode))
+ {
+ int count = (GET_MODE_BITSIZE (GET_MODE (operands[1])).to_constant ()
+ - exact_log2 (INTVAL (operands[2]) + 1));
+ rtx x = gen_rtx_ASHIFT (word_mode, operands[1], GEN_INT (count));
+ output = gen_reg_rtx (word_mode);
+ emit_insn (gen_rtx_SET (output, x));
+ input = output;
+ x = gen_rtx_LSHIFTRT (word_mode, input, GEN_INT (count));
+ emit_insn (gen_rtx_SET (operands[0], x));
+ return true;
+ }
+
+ /* Clears a bunch of low bits with only high bits set. */
+ unsigned HOST_WIDE_INT t = ~INTVAL (operands[2]);
+ if (budget >= 2 && exact_log2 (t + 1) >= 0)
+ {
+ int count = ctz_hwi (INTVAL (operands[2]));
+ rtx x = gen_rtx_LSHIFTRT (word_mode, operands[1], GEN_INT (count));
+ output = gen_reg_rtx (word_mode);
+ emit_insn (gen_rtx_SET (output, x));
+ input = output;
+ x = gen_rtx_ASHIFT (word_mode, input, GEN_INT (count));
+ emit_insn (gen_rtx_SET (operands[0], x));
+ return true;
+ }
+
/* If the remaining budget has gone to less than zero, it
forces the value into a register and performs the AND
operation. It returns TRUE to the caller so the caller