On 08/15/2018 10:49 PM, Dimitar Dimitrov wrote: > ChangeLog: > > 2018-07-27 Dimitar Dimitrov <dimi...@dinux.eu> > > * configure: Regenerate. > * configure.ac: Add PRU target. > > gcc/ChangeLog: > > 2018-07-27 Dimitar Dimitrov <dimi...@dinux.eu> > > * common/config/pru/pru-common.c: New file. > * config.gcc: Add PRU target. > * config/pru/alu-zext.md: New file. > * config/pru/constraints.md: New file. > * config/pru/predicates.md: New file. > * config/pru/pru-opts.h: New file. > * config/pru/pru-passes.c: New file. > * config/pru/pru-pragma.c: New file. > * config/pru/pru-protos.h: New file. > * config/pru/pru.c: New file. > * config/pru/pru.h: New file. > * config/pru/pru.md: New file. > * config/pru/pru.opt: New file. > * config/pru/t-pru: New file. > * doc/extend.texi: Document PRU pragmas. > * doc/invoke.texi: Document PRU-specific options. > * doc/md.texi: Document PRU asm constraints.
> > Signed-off-by: Dimitar Dimitrov <dimi...@dinux.eu> > > diff --git a/gcc/config/pru/alu-zext.md b/gcc/config/pru/alu-zext.md > new file mode 100644 > index 00000000000..2112b08d3f4 > --- /dev/null > +++ b/gcc/config/pru/alu-zext.md [ ... ] I wonder if the zero-extensions in alu-zext.md are really needed. Can we define WORD_REGISTER_OPERATIONS for PRU? Or do you really want to expose sub-word operations? I simply don't know enough about the hardware to provide guidance here. But it might allow for some simplification of the port. You wouldn't want to define sub-word patterns such as addqi/addhi anymore either. > diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c > new file mode 100644 > index 00000000000..de72d22278e > --- /dev/null > +++ b/gcc/config/pru/pru.c > @@ -0,0 +1,3001 @@ > +/* Emit efficient RTL equivalent of ADD3 with the given const_int for > + frame-related registers. > + op0 - Destination register. > + op1 - First addendum operand (a register). > + addendum - Second addendum operand (a constant). > + kind - Note kind. REG_NOTE_MAX if no note must be added. > + reg_note_rtx - Reg note RTX. NULL if it should be computed > automatically. > + */ > +static rtx > +pru_add3_frame_adjust (rtx op0, rtx op1, int addendum, > + const enum reg_note kind, rtx reg_note_rtx) > +{ > + rtx insn; > + > + rtx op0_adjust = gen_rtx_SET (op0, plus_constant (Pmode, op1, addendum)); > + > + if (UBYTE_INT (addendum) || UBYTE_INT (-addendum)) > + insn = emit_insn (op0_adjust); > + else > + { > + /* Help the compiler to cope with an arbitrary integer constant. > + Reload has finished so we can't expect the compiler to > + auto-allocate a temporary register. But we know that call-saved > + registers are not live yet, so we utilize them. */ Note I think this can run afoul of IPA-RA. THough it looks like you're exposing this in RTL, so you're probably safe. > +/* Emit function epilogue. */ > +void > +pru_expand_epilogue (bool sibcall_p) > +{ > + rtx cfa_adj; > + int total_frame_size; > + int sp_adjust, save_offset; > + int regno_start; > + > + if (!sibcall_p && pru_can_use_return_insn ()) > + { > + emit_jump_insn (gen_return ()); > + return; > + } > + > + emit_insn (gen_blockage ()); > + > + total_frame_size = pru_compute_frame_layout (); > + if (frame_pointer_needed) > + { > + /* Recover the stack pointer. */ > + cfa_adj = plus_constant (Pmode, stack_pointer_rtx, > + (total_frame_size > + - cfun->machine->save_regs_offset)); > + pru_add3_frame_adjust (stack_pointer_rtx, > + hard_frame_pointer_rtx, > + -cfun->machine->fp_save_offset, > + REG_CFA_DEF_CFA, > + cfa_adj); > + > + save_offset = 0; > + sp_adjust = total_frame_size - cfun->machine->save_regs_offset; > + } > + else if (!UBYTE_INT (total_frame_size)) > + { > + pru_add_to_sp (cfun->machine->save_regs_offset, > + REG_CFA_ADJUST_CFA); > + save_offset = 0; > + sp_adjust = total_frame_size - cfun->machine->save_regs_offset; > + } > + else > + { > + save_offset = cfun->machine->save_regs_offset; > + sp_adjust = total_frame_size; > + } > + > + regno_start = 0; > + do { > + regno_start = xbbo_next_reg_cluster (regno_start, &save_offset, false); > + } while (regno_start >= 0); > + > + if (sp_adjust) > + pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA); > + > + if (!sibcall_p) > + emit_jump_insn (gen_simple_return ()); Note you probably need a blockage before the restore of the stack pointer to avoid some really nasty problems with the scheduler reordering things such that the current frame gets deallocated before register restores happen. It usually doesn't cause a problem, but if an interrupt occurs between the de-allocation and register restore, then all bets are off. > > diff --git a/gcc/config/pru/pru.h b/gcc/config/pru/pru.h > new file mode 100644 > index 00000000000..de1b9100209 > --- /dev/null > +++ b/gcc/config/pru/pru.h [ ... ] > + > +#ifdef IN_LIBGCC2 > +/* This is to get correct SI and DI modes in libgcc2.c (32 and 64 bits). */ > +#define UNITS_PER_WORD 4 > +#else > +/* Width of a word, in units (bytes). */ > +#define UNITS_PER_WORD 1 > +#endif This looks odd. Can you explain a bit more what you're doing here? I'm guessing you took it from AVR which is an 8 bit port. I'm not sure it's the right thing to do for AVR either :-) > diff --git a/gcc/config/pru/pru.md b/gcc/config/pru/pru.md > new file mode 100644 > index 00000000000..4be1958e6c9 > --- /dev/null > +++ b/gcc/config/pru/pru.md > @@ -0,0 +1,956 @@ > +;; Machine Description for TI PRU. > +;; Copyright (C) 2014-2018 Free Software Foundation, Inc. > +;; Contributed by Dimitar Dimitrov <dimi...@dinux.eu> > +;; Based on the NIOS2 GCC port. > +;; > +;; This file is part of GCC. > +;; > +;; GCC is free software; you can redistribute it and/or modify > +;; it under the terms of the GNU General Public License as published by > +;; the Free Software Foundation; either version 3, or (at your option) > +;; any later version. > +;; > +;; GCC is distributed in the hope that it will be useful, > +;; but WITHOUT ANY WARRANTY; without even the implied warranty of > +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +;; GNU General Public License for more details. > +;; > +;; You should have received a copy of the GNU General Public License > +;; along with GCC; see the file COPYING3. If not see > +;; <http://www.gnu.org/licenses/>. > + > + > + > +;; Split loading of integer constants into a distinct pattern, in order to > +;; prevent CSE from considering "SET (MEM, CONST_INT)" as a valid insn > +;; selection. This fixes an abnormally long compile time exposed by > +;; gcc.dg/pr48141.c Note that in general both reload and LRA require that the movXX patterns be unified for any given mode. Reload and LRA can change the operands without re-recognizing the pattern. You might be getting away with not adhering to that requirement because you're just dealing with constant source operands (most of the problems I've run into through the years involve memory operands). But be aware you may need to twiddle this and fold those patterns back into your movXX patterns. > + > +; I cannot think of any reason for the core to pass a 64-bit symbolic > +; constants. Hence simplify the rule and handle only numeric constants. That's fine. THough you'll generally get better code generated if you tighten up the operand predicates rather than handling things in contraints. So it may be worth investigating if you can tighten up the predicate for your movXX patterns. And that advice applies in general. It's usually better to have a predicate that is tight and use constraints to drive instruction selection. A wide predicate and tight constraints means reload/LRA has to work harder and they're not particularly good at optimizing away redundancies they create. Additionally on a target which defines instruction scheduling, you generally get a more accurate first schedule if the predicates are tight. > + > +;; Sign extension patterns. We have to emulate them due to lack of > +;; signed operations in PRU's ALU. > + > +(define_insn "extend<EQS0:mode><EQD:mode>2" > + [(set (match_operand:EQD 0 "register_operand" "=r") > + (sign_extend:EQD (match_operand:EQS0 1 "register_operand" "r")))] > + "" > + { > + return pru_output_sign_extend (operands); > + } > + [(set_attr "type" "complex") > + (set_attr "length" "12")]) You might be better off expanding sign extension as a pair of shifts. Those can be optimized and combined with other nearby operations. ISTM that's something you could experiment with once the port is accepted. > +;; Arithmetic Operations > + > +(define_expand "add<mode>3" > + [(set (match_operand:QISI 0 "register_operand" "=r,r,r") > + (plus:QISI (match_operand:QISI 1 "register_operand" "%r,r,r") > + (match_operand:QISI 2 "nonmemory_operand" "r,I,M")))] > + "" > + "" > + [(set_attr "type" "alu")]) > + > +(define_insn "adddi3" > + [(set (match_operand:DI 0 "register_operand" "=&r,&r,&r") > + (plus:DI (match_operand:DI 1 "register_operand" "%r,r,r") > + (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,M")))] > + "" > + "@ > + add\\t%F0, %F1, %F2\;adc\\t%N0, %N1, %N2 > + add\\t%F0, %F1, %2\;adc\\t%N0, %N1, 0 > + sub\\t%F0, %F1, %n2\;suc\\t%N0, %N1, 0" > + [(set_attr "type" "alu,alu,alu") > + (set_attr "length" "8,8,8")]) Note that if you expose the C bit and define an add/sub with carry insn the compiler will handle this kind of stuff for you automatically. That's something you can experiment with after acceptance as well. Overall it looks pretty good. I think the blockage in the epilogue issue needs to be addressed. The UNITS_PER_WORD concern needs to be answered. The rest are things you might want to investigate, but aren't things that need to be changed to get approval. I think Joseph had some concerns about diagnostics emitted by the PRU target code. Whatever those where need to be addressed if they haven't already. Jeff