> On 11/18/2014 11:48 AM, Yangfei (Felix) wrote: > > +(define_expand "doloop_end" > > + [(use (match_operand 0 "" "")) ; loop pseudo > > + (use (match_operand 1 "" ""))] ; label > > + "" > > +{ > > + /* Currently SMS relies on the do-loop pattern to recognize loops > > + where (1) the control part consists of all insns defining and/or > > + using a certain 'count' register and (2) the loop count can be > > + adjusted by modifying this register prior to the loop. > > + ??? The possible introduction of a new block to initialize the > > + new IV can potentially affect branch optimizations. */ > > + if (optimize > 0 && flag_modulo_sched) > > You'd be better off moving this condition into the expansion predicate (which > is > currently ""). > > This short-circuits a lot of unnecessary work. See pass_rtl_doloop::gate. > > > r~
Yeah, that's a good idea. See my updated patch :-) Index: gcc/ChangeLog =================================================================== --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-13 Felix Yang <felix.y...@huawei.com> + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. + 2014-11-11 Andrew Pinski <apin...@cavium.com> Bug target/61997 Index: gcc/config/aarch64/aarch64.md =================================================================== --- gcc/config/aarch64/aarch64.md (revision 217394) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -4087,6 +4087,43 @@ [(set_attr "type" "mrs")]) +;; Define the subtract-one-and-jump insns so loop.c +;; knows what to generate. +(define_expand "doloop_end" + [(use (match_operand 0 "" "")) ; loop pseudo + (use (match_operand 1 "" ""))] ; label + "optimize > 0 && flag_modulo_sched" +{ + rtx s0; + rtx bcomp; + rtx loc_ref; + rtx cc_reg; + rtx insn; + rtx cmp; + + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + + if (GET_MODE (operands[0]) != DImode) + FAIL; + + s0 = operands [0]; + insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1))); + + cmp = XVECEXP (PATTERN (insn), 0, 0); + cc_reg = SET_DEST (cmp); + bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx); + loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]); + emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, + gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp, + loc_ref, pc_rtx))); + DONE; +}) + ;; AdvSIMD Stuff (include "aarch64-simd.md") Index: gcc/config/aarch64/aarch64.c =================================================================== --- gcc/config/aarch64/aarch64.c (revision 217394) +++ gcc/config/aarch64/aarch64.c (working copy) @@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ aarch64_use_by_pieces_infrastructure_p +#undef TARGET_CAN_USE_DOLOOP_P +#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-aarch64.h"
aarch64-doloop-v5.diff
Description: aarch64-doloop-v5.diff