Karl Meakin <karl.mea...@arm.com> writes: > Add rules for lowering `cbranch<mode>4` to CBB/CBH/CB when CMPBR > extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch<mode>4): emit CMPBR > instructions if possible. > (cbranch<SHORT:mode>4): new expand rule. > (aarch64_cb<GPI:mode>): likewise. > (aarch64_cb<SHORT:mode>): likewise. > * config/aarch64/iterators.md (cmpbr_suffix): new mode attr. > * config/aarch64/predicates.md (const_0_to_63_operand): new > predicate. > (aarch64_cb_immediate): likewise. > (aarch64_cb_operand): likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/cmpbr.c: update tests.
In addition to Kyrill's comments (which I agree with): > @@ -720,18 +720,41 @@ (define_constants > ;; Conditional jumps > ;; ------------------------------------------------------------------- > > -(define_expand "cbranch<mode>4" > +(define_expand "cbranch<GPI:mode>4" > [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" > [(match_operand:GPI 1 "register_operand") > (match_operand:GPI 2 "aarch64_plus_operand")]) > (label_ref (match_operand 3)) > (pc)))] > "" > - " > - operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1], > - operands[2]); > - operands[2] = const0_rtx; > - " > + { > + if (TARGET_CMPBR && aarch64_cb_operand (operands[2], <MODE>mode)) > + { > + emit_jump_insn (gen_aarch64_cb<mode> (operands[0], operands[1], > + operands[2], operands[3])); > + DONE; > + } There is an implicit choice here to use a separate CMP + Bcc if the immediate is out of range, rather than force out-of-range immediates into a temporary register. That can be the right choice for immediates in the range of CMP, but whether it is or not depends on global information that we don't have. If the immediate is needed for multiple branches, it would be better (sizewise) to load the immediate into a temporary register and use it for each branch, provided that there's a call-clobbered register free and that the branches are in the 1KiB range. In other situations, what the patch does is best. But perhaps it would be worth forcing values that are outside the range of CMP into a register and using the new form, rather than emitting an immediate move, a CMP, and a branch. Either way, I think it's worth a comment saying what we do with out-of-range immediates. > + else > + { > + operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), > + operands[1], operands[2]); > + operands[2] = const0_rtx; > + } > + } > +) > + > @@ -758,6 +781,58 @@ (define_expand "cbranchcc4" > "" > ) > > +;; Emit a `CB<cond> (register)` or `CB<cond> (immediate)` instruction. > +(define_insn "aarch64_cb<GPI:mode>" > + [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" > + [(match_operand:GPI 1 "register_operand") > + (match_operand:GPI 2 "aarch64_cb_operand")]) > + (label_ref (match_operand 3)) > + (pc)))] > + "TARGET_CMPBR" > + "cb%m0\\t%<w>1, %<w>2, %l3"; > + [(set_attr "type" "branch") > + (set (attr "length") > + (if_then_else (and (ge (minus (match_dup 3) (pc)) > + (const_int BRANCH_LEN_N_1Kib)) > + (lt (minus (match_dup 3) (pc)) > + (const_int BRANCH_LEN_P_1Kib))) > + (const_int 4) > + (const_int 8))) > + (set (attr "far_branch") > + (if_then_else (and (ge (minus (match_dup 3) (pc)) > + (const_int BRANCH_LEN_N_1Kib)) > + (lt (minus (match_dup 3) (pc)) > + (const_int BRANCH_LEN_P_1Kib))) > + (const_string "no") > + (const_string "yes")))] > +) > + > +;; Emit a `CBB<cond> (register)` or `CBH<cond> (register)` instruction. > +(define_insn "aarch64_cb<SHORT:mode>" > + [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" > + [(match_operand:SHORT 1 "register_operand") > + (match_operand:SHORT 2 > "aarch64_cb_short_operand")]) > + (label_ref (match_operand 3)) > + (pc)))] > + "TARGET_CMPBR" > + "cb<SHORT:cmpbr_suffix>%m0\\t%<w>1, %<w>2, %l3"; > + [(set_attr "type" "branch") > + (set (attr "length") > + (if_then_else (and (ge (minus (match_dup 3) (pc)) > + (const_int BRANCH_LEN_N_1Kib)) > + (lt (minus (match_dup 3) (pc)) > + (const_int BRANCH_LEN_P_1Kib))) > + (const_int 4) > + (const_int 8))) > + (set (attr "far_branch") > + (if_then_else (and (ge (minus (match_dup 3) (pc)) > + (const_int BRANCH_LEN_N_1Kib)) > + (lt (minus (match_dup 3) (pc)) > + (const_int BRANCH_LEN_P_1Kib))) > + (const_string "no") > + (const_string "yes")))] > +) > + The patch defines cmpbr_suffix to handle :GPI as well as :SHORT. It looks like the main remaining difference between these two patterns is the predicate (and as Kyrill says, the constraint) on operand 2. It would be possible to handle that difference using mode attributes too, such as: "aarch64_<cb_operand>" "r<cb_imm>" with cb_operand defined to cb_short_operand for :SHORT and cb_operand for :GPI. Similarly <cb_imm> would map to Z for :SHORT and a new constraint for :GPI. I don't know that that's better, just though I'd mention it in case. A slight wrinkle is that the CB<cc> immediate instruction requires CBLT rather than CBLE, etc. IIRC, GCC canonicalises in the opposite direction, preferring LEU over LTU, etc. So I think we might need a custom version of aarch64_comparison_operator that checks whether the immediate is in the range [0, 63] for the "native" cmoparisons and an appropriate variant for the "non-native" comparisons (LE, GE, LEU, GEU). The output asm section would then need to adjust the instruction accordingly before printing it out. All of which suggests it would be better to keep the SHORT and GPI patterns separate after all :) > diff --git a/gcc/config/aarch64/predicates.md > b/gcc/config/aarch64/predicates.md > index 1ab1c696c62..0fc4dabba2a 100644 > --- a/gcc/config/aarch64/predicates.md > +++ b/gcc/config/aarch64/predicates.md > @@ -50,6 +50,10 @@ (define_predicate "const_0_to_7_operand" > (and (match_code "const_int") > (match_test "IN_RANGE (INTVAL (op), 0, 7)"))) > > +(define_predicate "const_1_to_63_operand" > + (and (match_code "const_int") > + (match_test "IN_RANGE (INTVAL (op), 0, 63)"))) > + The name doesn't seem to match the definition: the lower bound is 0 rather than 1. > (define_predicate "const_0_to_4_step_4_operand" > (and (match_code "const_int") > (match_test "IN_RANGE (INTVAL (op), 0, 4)") > @@ -82,6 +86,8 @@ (define_predicate "subreg_lowpart_operator" > (and (match_code "subreg") > (match_test "subreg_lowpart_p (op)")))) > > + > + Sorry for the nit, but: excess whitespace. Overall, the series looks really good, thanks! Richard