Hi all,
Following the same rationale as patch 1/3, this patch changes the AArch64
backend to keep the CTZ expression as a single RTX until after reload, when it
is split into an RBIT and a CLZ instruction.
This enables CTZ-specific optimisations in the pre-reload RTL optimisers.
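For reference, the post-reload split relies on the identity ctz(x) == clz(rbit(x)). The C below is only an illustrative software model of that identity, not code from the patch: the helper names (rbit32, clz32, ctz32_reference, ctz32_via_rbit_clz, check_ctz_identity) are made up for this sketch, and it assumes the AArch64 behaviour that CLZ of zero returns the operand width (32 here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of RBIT on a 32-bit value: reverse the bit order.
   (Hypothetical helper, not part of GCC.)  */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Software model of CLZ: count leading zero bits.
   Assumption: a zero input yields 32, as the AArch64 instruction does.  */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && (x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Reference CTZ: count trailing zero bits directly, 32 for a zero input.  */
static unsigned ctz32_reference(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && (x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* CTZ computed the way the split sequence does it: RBIT, then CLZ.  */
static unsigned ctz32_via_rbit_clz(uint32_t x)
{
    return clz32(rbit32(x));
}

/* Compare both CTZ computations on a few sample values.  */
static void check_ctz_identity(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 8u, 0x80000000u, 0xfffffff0u, 0x12340000u
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(ctz32_via_rbit_clz(samples[i]) == ctz32_reference(samples[i]));
}
```

Calling check_ctz_identity () from a small test program exercises the identity on those samples. Before reload the optimisers see the single (ctz ...) RTX rather than the two-step sequence modelled here.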
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
2016-05-26 Kyrylo Tkachov <[email protected]>
PR middle-end/37780
* config/aarch64/aarch64.md (ctz<mode>2): Convert to
define_insn_and_split.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a9e811e9f70f650fb9292b6d9a96ef4b2dbbaec6..7b3e2cd13bdcc05defda1e3ff74bf003443fe70f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3790,16 +3790,23 @@ (define_insn "rbit<mode>2"
[(set_attr "type" "rbit")]
)
-(define_expand "ctz<mode>2"
- [(match_operand:GPI 0 "register_operand")
- (match_operand:GPI 1 "register_operand")]
+;; Split after reload into RBIT + CLZ. Since RBIT is represented as an UNSPEC
+;; it is unlikely to fold with any other operation, so keep this as a CTZ
+;; expression and split after reload to enable scheduling them apart if
+;; needed.
+
+(define_insn_and_split "ctz<mode>2"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (ctz:GPI (match_operand:GPI 1 "register_operand" "r")))]
""
- {
- emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
- emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
- DONE;
- }
-)
+ "#"
+ "reload_completed"
+ [(const_int 0)]
+ "
+ emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
+ emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+ DONE;
+")
(define_insn "*and<mode>_compare0"
[(set (reg:CC_NZ CC_REGNUM)