match_scratch causing pattern mismatch
In a GCC port to a 16-bit CPU that uses CC flags for branching, I'm experimenting with using a 32-bit subtract for compare instead of multiple 16-bit compares and branches.

My cbranch4 expander produces compare and conditional branch patterns...

    cmpmode = SELECT_CC_MODE( branchCode, op0, op1 );
    flags = gen_rtx_REG ( cmpmode, CC_REGNUM );
    compare = gen_rtx_COMPARE ( cmpmode, op0, op1 );
    emit_insn( gen_rtx_SET( VOIDmode, flags, compare ));

To implement compare using a subtract I need a HI mode scratch register, so I used a match_scratch:

    (define_insn "comparesi3"
      [
        (set (reg:CC CC_REGNUM)
             (compare:CC (match_operand:SI 0 "register_operand" "r,r")
                         (match_operand:SI 1 "rhs_operand" "r,i")))
        (clobber (match_scratch:HI 2 "=r,r"))
      ]
      ""

When I do this, the compare no longer matches and I get failures like this in the vregs pass...

    ../../../libgcc/unwind-dw2.c:1224:1: error: unrecognizable insn:
     }
     ^
    (insn 69 68 70 7 (set (reg:CC 16 flags)
            (compare:CC (reg:SI 44 [ D.5851 ])
                (reg:SI 169))) ../../../libgcc/unwind-dw2.c:972 -1
         (nil))

When I remove the match_scratch these errors disappear, but of course I then don't have the scratch register needed to implement the proper assembler instructions.

I'm aware that it's the combiner that understands clobbers etc. So, in the .md file I tried to add a dummy comparesi3 pattern that doesn't have the match_scratch, placed after the pattern containing the match_scratch. This sometimes works; however, on occasion the dummy pattern is selected by the combiner instead of the match_scratch pattern.

Any insight appreciated...

Cheers, Paul
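For readers unfamiliar with the technique being asked about, here is a minimal C sketch of a 32-bit compare built from two 16-bit subtracts. It assumes the CPU has a subtract and a subtract-with-borrow instruction, which the post does not actually state, and the flag names and `toy_` identifiers are hypothetical. It is only meant to show why a 16-bit scratch is needed: the differences have to go somewhere other than the operands being compared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag set; the real CPU's flags are not described in the post.  */
struct toy_flags
{
  bool borrow;    /* set when a < b, unsigned */
  bool zero;      /* set when a == b */
  bool negative;  /* top bit of the 32-bit difference */
};

/* Compare two 32-bit values using only 16-bit subtracts.  Both differences
   land in a 16-bit scratch so that neither input is modified.  */
static struct toy_flags
toy_compare32 (uint32_t a, uint32_t b)
{
  uint16_t a_lo = (uint16_t) a, a_hi = (uint16_t) (a >> 16);
  uint16_t b_lo = (uint16_t) b, b_hi = (uint16_t) (b >> 16);
  struct toy_flags f;
  uint16_t scratch;

  /* Low word: plain subtract; remember the borrow and whether it was zero.  */
  scratch = (uint16_t) (a_lo - b_lo);
  bool low_borrow = a_lo < b_lo;
  bool low_zero = scratch == 0;

  /* High word: subtract with borrow, computed wide to observe the borrow out.  */
  uint32_t wide = (uint32_t) a_hi - (uint32_t) b_hi - (uint32_t) low_borrow;
  scratch = (uint16_t) wide;

  f.borrow = (wide >> 16) != 0;
  f.zero = low_zero && scratch == 0;
  f.negative = (scratch & 0x8000u) != 0;
  return f;
}
```

A conditional branch would then test these flags once, rather than comparing and branching on each 16-bit half separately.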
Re: match_scratch causing pattern mismatch
Of course, the answer is to use

    emit_insn( gen_comparesi3( op0, op1 ));

which generates the required match_scratch, instead of ...

    cmpmode = SELECT_CC_MODE( branchCode, op0, op1 );
    flags = gen_rtx_REG ( cmpmode, CC_REGNUM );
    compare = gen_rtx_COMPARE ( cmpmode, op0, op1 );
    emit_insn( gen_rtx_SET( VOIDmode, flags, compare ));

Sorry for the bother...

On 31/07/15 08:39, Paul Shortis wrote:
> When I do this, the compare no longer matches and I get failures like this in the vregs pass...
>
>     ../../../libgcc/unwind-dw2.c:1224:1: error: unrecognizable insn:
Controlling instruction alternative selection
I'm working with a CPU that has a restricted set of registers that can do three-address maths, whereas ALL registers can do two-address maths.

If I define

    (define_insn "addsi3"
      [
        (set (match_operand:SI 0 "register_operand" "=r,r")
             (plus:SI (match_operand:SI 1 "register_operand" "0,0")
                      (match_operand:SI 2 "rhs_operand" "r,i")))

so that all adds are done using two-address instructions, then all is fine.

If, however, I change addsi3 to the following (where the constraint 'R' is the smaller set of three-address registers)

    (define_insn "addsi3"
      [
        (set (match_operand:SI 0 "register_operand" "=R,r,r")
             (plus:SI (match_operand:SI 1 "register_operand" "R,0,0")
                      (match_operand:SI 2 "rhs_operand" "R,r,i")))

to take advantage of the three-address instructions, then the three-address instructions are used successfully on many occasions. However, when register pressure on the 'R' class is high, the allocator never falls back to using the entire register set by employing the two-address instructions, resulting in ...

    error: unable to find a register to spill in class ‘GP_REGS’

Enabling lra and inspecting the rtl dump indicates that both alternatives (R and r) seem to be equally appealing to the allocator, so it chooses 'R' and fails.

The GCC internals document indicates that the '0' alternatives should be placed at the end of the alternatives list, so I'm guessing 'R' will always be chosen.

Using constraint disparaging (?R) eradicates the errors, but of course that causes the 'R' three-address alternative to never be used.

Suggestions?
Re: Controlling instruction alternative selection
Thanks Jim,

It took me a while to get back to this problem but, as you suggested, exposing only the three-address instructions prior to reload, in conjunction with implementing TARGET_CLASS_LIKELY_SPILLED_P, has produced good results. After taking these measures I didn't have to disparage any alternatives in the patterns.

Paul.

On 04/08/15 06:00, Jim Wilson wrote:
> On 07/30/2015 09:54 PM, Paul Shortis wrote:
>> Resulting in ... error: unable to find a register to spill in class ‘GP_REGS’
>> enabling lra and inspecting the rtl dump indicates that both alternatives (R and r) seem to be equally appealing to the allocater so it chooses 'R' and fails.
>
> The problem isn't in lra, it is in reload. You want lra to use the three address instruction, but you then want reload to use the two address alternative.
>
>> Using constraint disparaging (?R) eradicates the errors, but of course that causes the 'R' three address alternative to never be used.
>
> You want to disparage the three address alternative in reload, but not in lra. There is a special code for that, you can use ^ instead of ? to make that happen. That may or may not help though.
>
> There is also a hook TARGET_CLASS_LIKELY_SPILLED_P which might help. You should try defining this to return true for the 'R' class if it doesn't already.
>
> Jim
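As a rough illustration of the hook Jim mentions, the sketch below shows the shape such a function might take. The register class names and the `toy_` function name are hypothetical stand-ins (a real port gets `enum reg_class` from its own target header), and the thread does not show the implementation that was actually used.

```c
#include <stdbool.h>

/* Stand-in for the port's register class enum; a real port defines this in
   its target header.  THREE_ADDR_REGS is a hypothetical name for the class
   behind the 'R' constraint.  */
enum reg_class { NO_REGS, THREE_ADDR_REGS, GP_REGS, ALL_REGS, LIM_REG_CLASSES };
typedef enum reg_class reg_class_t;

/* Return true for classes so small that pseudos assigned to them are
   likely to be spilled.  */
static bool
toy_class_likely_spilled_p (reg_class_t rclass)
{
  return rclass == THREE_ADDR_REGS;
}

/* In a real port this would be registered in the target's .c file with:
     #undef  TARGET_CLASS_LIKELY_SPILLED_P
     #define TARGET_CLASS_LIKELY_SPILLED_P toy_class_likely_spilled_p  */
```

Returning true only for the small class leaves the full general register class untouched, which matches Jim's suggestion of flagging just the 'R' class.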
bug in lra causes incorrect register usage / compiler crash
I'm porting gcc to a 16-bit CPU with a two-address ISA like x86. When using LRA I was getting compiler crashes and results like this from the reload pass

    (insn 97 96 98 21 (parallel [
                (set (reg/f:HI 3 r3 [59])
                    (plus:HI (reg/f:HI 6 sp)
                        (const_int 4 [0x4])))
                (clobber (reg:CC 7 flags))
            ]) ../../../libgcc/unwind-dw2-fde.c:304 25 {addhi3}
         (expr_list:REG_EQUIV (plus:HI (reg/f:HI 3 r3)
                (const_int 4 [0x4]))
            (nil)))

which is invalid because, like x86, the CPU only has instructions of the form Rn op= operand (e.g. add r3,4).

The md pattern for this instruction is...

    (define_insn "hi3"
      [
        (set (match_operand:HI 0 "register_operand" "=r,r,r")
             (commOperator:HI (match_operand:HI 1 "register_operand" "0,0,0")
                              (match_operand:HI 2 "rhs_operand" "r,i,m")))

Looking back at the ira output I could see that the input to the LRA was

    (insn 97 96 98 18 (parallel [
                (set (reg/f:HI 59)
                    (plus:HI (reg/f:HI 3 r3)
                        (const_int 4 [0x4])))
                (clobber (reg:CC 7 flags))
            ]) ../../../libgcc/unwind-dw2-fde.c:304 25 {addhi3}
         (expr_list:REG_UNUSED (reg:CC 7 flags)
            (expr_list:REG_EQUIV (plus:HI (reg/f:HI 3 r3)
                    (const_int 4 [0x4]))

R3 is the frame pointer which is eliminated to sp by the LRA. Then r3 is allocated to virtual R59.

Tracing through LRA constraints processing I found the problem to be get_hard_regno (rtx x), which translates the virtual register 59 to R3 and then erroneously passes this to get_final_hard_regno(), which eliminates R3 to sp, causing constraints to be (incorrectly) satisfied.

At final register elimination this problem doesn't occur, as there is a check before elimination to confirm that a register to be eliminated is actually a hard register. However, the get_hard_regno() bug does cause incorrect constraint checking leading to invalid RTL generation, forming an impossible instruction.

The same circumstances exist when using the LRA for the i386. I'm guessing it hasn't raised its head because -fomit-frame-pointer is off by default for the i386 port.

I don't have write access to the repository so below is a patch for the change I used to address this problem.

Cheers, Paul.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index aac5087..81817f2 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -197,6 +197,7 @@ get_hard_regno (rtx x)
 {
   rtx reg;
   int offset, hard_regno;
+  bool is_virtual = false;
   reg = x;
   if (GET_CODE (x) == SUBREG)
@@ -204,14 +205,18 @@ get_hard_regno (rtx x)
   if (! REG_P (reg))
     return -1;
   if ((hard_regno = REGNO (reg)) >= FIRST_PSEUDO_REGISTER)
+    {
+      is_virtual = true;
     hard_regno = lra_get_regno_hard_regno (hard_regno);
+    }
   if (hard_regno < 0)
     return -1;
   offset = 0;
   if (GET_CODE (x) == SUBREG)
     offset += subreg_regno_offset (hard_regno, GET_MODE (reg),
                                    SUBREG_BYTE (x), GET_MODE (x));
-  return get_final_hard_regno (hard_regno, offset);
+  return is_virtual ? hard_regno + offset
+                    : get_final_hard_regno (hard_regno, offset);
 }
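The effect of the patch can be pictured with a small standalone model. This is not LRA code: the register numbers are taken from the dump above (r3 as frame pointer, r6 as sp, pseudo 59 assigned r3), every `toy_`/`TOY_` name is hypothetical, and the SUBREG offset handling is left out to keep the sketch short.

```c
#include <stdbool.h>

#define TOY_FIRST_PSEUDO_REGISTER 8  /* toy value: hard registers are 0..7 */
#define TOY_FP_REGNUM 3              /* frame pointer, r3 in the post */
#define TOY_SP_REGNUM 6              /* stack pointer, r6 (sp) in the post */

/* Toy allocation result: pseudo 59 was assigned hard register r3.  */
static int
toy_regno_hard_regno (int pseudo)
{
  return pseudo == 59 ? 3 : -1;
}

/* Toy elimination: a genuine frame pointer reference becomes sp.  */
static int
toy_final_hard_regno (int hard_regno)
{
  return hard_regno == TOY_FP_REGNUM ? TOY_SP_REGNUM : hard_regno;
}

/* Before the patch: elimination is applied even to an allocated pseudo.  */
static int
toy_get_hard_regno_unpatched (int regno)
{
  int hard_regno = regno;
  if (regno >= TOY_FIRST_PSEUDO_REGISTER)
    hard_regno = toy_regno_hard_regno (regno);
  if (hard_regno < 0)
    return -1;
  return toy_final_hard_regno (hard_regno);
}

/* After the patch: only registers that were hard to begin with are
   passed through elimination.  */
static int
toy_get_hard_regno_patched (int regno)
{
  bool is_virtual = false;
  int hard_regno = regno;
  if (regno >= TOY_FIRST_PSEUDO_REGISTER)
    {
      is_virtual = true;
      hard_regno = toy_regno_hard_regno (regno);
    }
  if (hard_regno < 0)
    return -1;
  return is_virtual ? hard_regno : toy_final_hard_regno (hard_regno);
}
```

In this model the unpatched lookup reports pseudo 59 as living in sp, which is how a two-address constraint can appear satisfied when it is not; the patched lookup reports r3.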
Re: bug in lra causes incorrect register usage / compiler crash
I've now confirmed this same issue occurs on a stock i386 build when -fomit-frame-pointer is specified with -O2 and a test case with reasonable register pressure.

On 28/04/14 07:47, Paul Shortis wrote:
> The same circumstances exist when using the LRA for the i386. I'm guessing it hasn't raised its head because -fomit-frame-pointer is off by default for the i386 port.
Compare Elimination problems
For a 16-bit CPU the cmpelim pass is changing

    (insn 33 84 85 6 (parallel [
                (set (reg:HI 1 r1)
                    (ashift:HI (reg:HI 1 r1)
                        (const_int 1 [0x1])))
                (clobber (reg:CC_NOOV 7 flags))
            ]) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:18 33 {ashlhi3}

    (insn 34 87 35 6 (set (reg:CC_NOOV 7 flags)
            (compare:CC_NOOV (reg:SI 0 r0)
                (const_int 0 [0]))) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:20 39 {*comparesi3_nov}

    (jump_insn 35 34 36 6 (set (pc)
            (if_then_else (ge (reg:CC_NOOV 7 flags)

to

    (insn 33 84 85 6 (parallel [
                (set (reg:HI 1 r1)
                    (ashift:HI (reg:HI 1 r1)
                        (const_int 1 [0x1])))
                (set (reg:CC_NOOV 7 flags)
                    (compare:CC_NOOV (ashift:HI (reg:HI 1 r1)
                            (const_int 1 [0x1]))
                        (const_int 0 [0])))
            ]) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:18 29 {ashlhi3_cc}

    (jump_insn 35 87 36 6 (set (pc)
            (if_then_else (ge (reg:CC_NOOV 7 flags)

(reg:HI r1) is a subreg of (reg:SI r0); however, cmpelim seems to be substituting the compare of (reg:HI r1 and 0) for the compare of (reg:SI r0 and 0)?

While I'm here, in i386.md some of the flag setting operations specify a mode and some don't. E.g.

    (define_expand "cmp_1"
      [(set (reg:CC FLAGS_REG)
            (compare:CC (match_operand:SWI48 0 "nonimmediate_operand")

    (define_insn "*add_3"
      [(set (reg FLAGS_REG)
            (compare

Can anyone explain the significance of this?

Thanks, Paul.
Re: Compare Elimination problems
Thanks Richard,

I found the bug. try_eliminate_compare follows register definitions between the flags use and the clobbering compare by register number only, i.e. the register width isn't considered.

    if (DF_REF_REGNO (def) == REGNO (in_a))
      break;

This caused R1:HI to follow R0:SI, causing the compare:HI (R0, to be seen as a valid source of flags.

The following change fixes the problem while preserving the proper behaviour.

       x = single_set (insn);
       if (x == NULL)
         return false;
    +  if( GET_MODE(x) != GET_MODE(in_a))
    +    return false;
       in_a = SET_SRC (x);

Cheers, Paul.

On 05/09/14 02:33, Richard Henderson wrote:
> On 09/03/2014 03:14 PM, Paul Shortis wrote:
>> (reg:HI r1) is a subreg of (reg:SI r0) however the cmpelim seems to be substituting the compare of (reg:HI r1 and 0) for the compare of (reg:SI r0 and 0) ?
>
> That would appear to be a bug, though I don't immediately see where. The relevant test would appear to be
>
>     if (rtx_equal_p (SET_DEST (x), in_a))
>
> which would not be true for (reg:SI) vs (reg:HI), whatever their regnos.
>
> Anyway, put your breakpoint in try_eliminate_compare to debug this.
>
>> While I'm here, in i386.md some of the flag setting operations specify a mode and some don't. Eg
>>
>>     (define_expand "cmp_1" [(set (reg:CC FLAGS_REG) (compare:CC (match_operand:SWI48 0 "nonimmediate_operand")
>
> That's a define_expand, not a define_insn.
>
>>     (define_insn "*add_3" [(set (reg FLAGS_REG) (compare
>
> The mode checks are done in the ix86_match_ccmode call inside the predicate.
>
> r~
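To make the "register width isn't considered" point concrete, here is a toy illustration. It is not the logic in compare-elim.c; it assumes, as the post says, that (reg:SI r0) spans the 16-bit hard registers r0 and r1, and the `toy_` names are hypothetical. It only shows that two references can share a hard register without denoting the same value.

```c
#include <stdbool.h>

/* Toy register reference: first hard register and the number of 16-bit
   hard registers spanned (1 for HImode, 2 for SImode on such a target).  */
struct toy_reg
{
  int regno;
  int nregs;
};

/* True when the two references share at least one hard register.  */
static bool
toy_regs_overlap (struct toy_reg a, struct toy_reg b)
{
  return a.regno < b.regno + b.nregs && b.regno < a.regno + a.nregs;
}

/* True only when both references name the same register at the same width.  */
static bool
toy_same_reg_and_width (struct toy_reg a, struct toy_reg b)
{
  return a.regno == b.regno && a.nregs == b.nregs;
}
```

With r0 as a two-register SImode value and r1 as a one-register HImode value, the overlap test succeeds while the stricter test fails, which is the distinction a width-aware check needs to make.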
Re: Compare Elimination problems
Correction, ...compare:CC_N... (R1:HI, 0)

On 05/09/14 09:31, Paul Shortis wrote:
> This caused R1:HI to follow R0:SI, causing the compare:HI (R0, to be seen as a valid source of flags.
limiting call clobbered registers for library functions
I've ported GCC to a small 16-bit CPU that has single-bit shifts, so I've handled variable / multi-bit shifts using a mix of inline shifts and calls to assembler support functions.

The calls to the asm library functions clobber only one (by const) or two (variable) registers, but of course calling these functions causes all of the standard call-clobbered registers to be considered clobbered, thus wasting lots of candidate registers for use in expressions surrounding these shifts and causing unnecessary register saves in the surrounding function prologue/epilogue.

I've scrutinized and cloned the actions of other ports that do the same; however, I'm unable to convince the various passes that only r1 and r2 can be clobbered by these library calls.

Is anyone able to point me in the proper direction for a solution to this problem?

Thanks, Paul.
Re: limiting call clobbered registers for library functions
On 02/02/15 18:55, Yury Gribov wrote:
> On 01/30/2015 11:16 AM, Matthew Fortune wrote:
>> Yury Gribov writes:
>>> On 01/29/2015 08:32 PM, Richard Henderson wrote:
>>>> On 01/29/2015 02:08 AM, Paul Shortis wrote:
>>>>> I've ported GCC to a small 16 bit CPU that has single bit shifts. So I've handled variable / multi-bit shifts using a mix of inline shifts and calls to assembler support functions. The calls to the asm library functions clobber only one (by const) or two (variable) registers but of course calling these functions causes all of the standard call clobbered registers to be considered clobbered, thus wasting lots of candidate registers for use in expressions surrounding these shifts and causing unnecessary register saves in the surrounding function prologue/epilogue. I've scrutinized and cloned the actions of other ports that do the same, however I'm unable to convince the various passes that only r1 and r2 can be clobbered by these library calls. Is anyone able to point me in the proper direction for a solution to this problem ?
>>>>
>>>> You wind up writing a pattern that contains a call, but isn't represented in rtl as a call.
>>>
>>> Could it be useful to provide a pragma for specifying function register usage? This would allow e.g. a library writer to write a hand-optimized assembly version and then inform the compiler of its binary interface. Currently a surrogate of this can be achieved by putting inline asm code in static inline functions in public library headers, but this has its own disadvantages (e.g. code bloat).
>>
>> This sounds like a good idea in principle. I seem to recall seeing something similar to this in other compiler frameworks that allow a number of special calling conventions to be defined and enable functions to be attributed to use one of them. I.e. not quite so general as specifying an arbitrary clobber list but some sensible pre-defined alternative conventions.
>
> FYI a colleague from kernel mentioned that they already achieve this by wrapping the actual call with inline asm, e.g.
>
>     static inline int foo(int x) {
>       asm(
>         ".global foo_core\n"
>         // foo_core accepts single parameter in %rax,
>         // returns result in %rax and
>         // clobbers %rbx
>         "call foo_core\n"
>         : "+a"(x)
>         :
>         : "rbx"
>       );
>       return x;
>     }
>
> We still can't mark inline asm with things like __attribute__((pure)), etc. though, so it's not an ideal solution.
>
> -Y

Thanks everyone.

I've finally settled on an extension of the solution offered by Richard (from the SH port).

I've had to ...

1. write an expander that expands ...
   a) short (bit count) constant shifts to an instruction pattern that emits asm for one or more inline shifts
   b) other constant shifts to another instruction pattern that emits asm for a library call and clobbers cc
   c) variable shifts to another instruction pattern that emits asm for a library call and clobbers cc plus r2

then, for compare elimination, two other instruction patterns corresponding to b) and c) that set CC from a compare instead of clobbering it.

I could have avoided the expander and used a single instruction pattern for a), b) and c) if I could have found a way to have alternative-dependent clobbers in an instruction pattern. I investigated attributes but couldn't see how I would be able to achieve what I needed. I also tried clobber (match_dup 2), but when one of the alternatives has a constant for operands[2] the clobber is accepted silently by the .md compiler but doesn't actually clobber the non-constant alternatives.

A mechanism to implement alternative-dependent clobbers would have allowed all this to be represented in a much more succinct and compact manner.

On another note... compare elimination didn't work for pattern c), and on inspection I found this snippet in compare-elim.c

    static bool
    arithmetic_flags_clobber_p (rtx insn)
    {
      ...
      ...
      if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) ==2)
        {

which of course rejects any parallel pattern with a main rtx and more than one clobber (i.e. (c) above). So I changed this function so that it accepts patterns where rtx's one and upwards are all clobbers, one being cc.

The resulting generated asm ...

    ld   r2,r4       ; r2 holds the shift count, it will be clobbered to calculate an index into the sequence of shift instructions
    call __ashl_v    ; variable ashl
    beq  .L4         ; branch on result
    ld   r1,r3

If anyone can see any fault in this change please call out.

Of course, the problem with using inline asm is that you have to arrange for them to be included in EVERY compile, and they don't allow compare elimination to be easily implemented.

Paul.
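The relaxed check described above ("rtx's one and upwards are all clobbers, one being cc") can be sketched on a toy representation. This is a model only: the actual modified function is not shown in the thread, the real function also inspects the SET itself, and all `toy_`/`TOY_` names are hypothetical. The CC register number 7 is taken from the RTL dumps earlier in the archive.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_SET, TOY_CLOBBER };

/* One element of a toy PARALLEL: its code and the hard register it sets
   or clobbers.  */
struct toy_elem
{
  enum toy_code code;
  int regno;
};

#define TOY_CC_REGNUM 7

/* Shape of the original test: exactly two elements, a SET followed by a
   clobber of the flags register.  */
static bool
toy_flags_clobber_strict (const struct toy_elem *vec, size_t len)
{
  return len == 2
         && vec[0].code == TOY_SET
         && vec[1].code == TOY_CLOBBER
         && vec[1].regno == TOY_CC_REGNUM;
}

/* Relaxed test: element 0 is a SET, every later element is a clobber,
   and at least one of those clobbers is the flags register.  */
static bool
toy_flags_clobber_relaxed (const struct toy_elem *vec, size_t len)
{
  bool saw_cc = false;

  if (len < 2 || vec[0].code != TOY_SET)
    return false;
  for (size_t i = 1; i < len; i++)
    {
      if (vec[i].code != TOY_CLOBBER)
        return false;
      if (vec[i].regno == TOY_CC_REGNUM)
        saw_cc = true;
    }
  return saw_cc;
}
```

A pattern shaped like c), with a SET plus clobbers of cc and r2, fails the strict test but passes the relaxed one, while a pattern that never clobbers cc is still rejected.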
Re: limiting call clobbered registers for library functions
On 03/02/15 09:14, Joern Rennecke wrote:
> On 2 February 2015 at 21:54, Paul Shortis wrote:
>> I could have avoided the expander and used a single instruction pattern for a)b)c) if I could have found a way to have alternative dependent clobbers in an instruction pattern. I investigated attributes but couldn't see how I would be able to achieve what I needed. Also tried clobber (match_dup 2) but when one of the alternatives has a constant for operands[2] the clobber is accepted silently by the .md compiler but doesn't actually clobber the non-constant alternatives.
>
> You can clobber one or more match_scratch with suitable constraint alternatives and let the register allocator / reload reserve the hard registers. Well, not really for cc, you'll just have to say you clobber it, and split away the unused clobber post-reload.
>
> That will not really give you less source code, but more uniform patterns prior to register allocation. That gives the rtl optimizers a better handle on the code. The effect is even more pronounced when you replace all hard register usage with pseudo register usage and appropriate constraints.

Thanks Joern,

That was one of the (many) things I had tried, but I couldn't get it to work & moved on. Your post caused me to try again, and it worked perfectly; I now need just a single pattern and no expander.

I do however have one remaining issue. The pattern I'm using is ...

    (define_insn "hi3"
      [(set (match_operand:WORD 0 "register_operand" "=r,C,C")
            (hiShiftOperator:WORD
              (match_operand:WORD 1 "register_operand" "0,0,0")
              (match_operand:WORD 2 "rhs_operand" "M,i,D")))
       (clobber (reg:CC_NOOV CC_REGNUM))
       (clobber (match_scratch:HI 3 "=X,X,D"))
      ]
      ""
      "..."
      [(set_attr "length" "2,8,8")]
    )

The r/M constraint pair is for shifts small enough to perform inline (M == const_int 1...3). Register constraints C and D are for the registers used for the library calls.

At first I used the first two registers normally used for function calls (r1 and r2); however, when building libgcc et al. I occasionally encountered the situation where a register in the R1_CLASS (constraint C) can't be allocated.

I changed these two registers to the last two GPRs in the allocation order and made them call-clobbered, and I can at least compile libgcc and newlib.

Could I be doing something better, or is this expected behaviour?

Paul.
Unable to match instruction pattern
I'm porting gcc to a 16-bit processor. Occasionally compiling source such as

    short v1;
    long global;
    global = (long)v1;

results in ...

    (insn 11 10 12 2 (set (subreg:HI (mem/c:SI (symbol_ref:HI ("global") ) [2 global+0 S4 A16]) 2)
            (const_int 0 [0])) t.c:15 -1
         (nil))
    (insn 12 11 13 2 (set (subreg:HI (mem/c:SI (symbol_ref:HI ("global") ) [2 global+0 S4 A16]) 0)
            (reg:HI 24 [ D.1348 ])) t.c:15 -1
         (nil))

The second of these instructions successfully matches a movhi pattern, which is found via the define_expand below

    ; the INTEGER iterator includes [QI HI SI]
    (define_expand "mov"
      [(set (match_operand:INTEGER 0 "nonimmediate_operand" "")
            (match_operand:INTEGER 1 "general_operand" ""))]
      ""
      {
        if( can_create_pseudo_p() )
          {
            if( !register_operand( operands[0], mode )
                && !register_operand( operands[1], mode ))
              {
                /* one of the operands must be in a register. */
                operands[1] = copy_to_mode_reg (mode, operands[1]);
              }
          }

However, the first instruction fails to match and be expanded so that the const_int is forced into a register. I defined an instruction which is an exact match

    (define_insn "*storesic"
      [
        (set (subreg:HI (mem:SI (match_operand:HI 0 "symbol_ref_operand" "")) 2)
             (match_operand:HI 1 "const_int_operand"))
      ]

This matches; however, I need a scratch register (R? below) to transform this into

    load  R?, const_int_operand
    store R?, address

and as soon as I add the match_scratch

    (clobber (match_scratch:HI 2 "=r"))

the pattern no longer matches, because expand calls recog() with pnum_clobbers = 0 (NULL).

In a similar vein to the way CC setting instructions are handled, I tried defining a second instruction containing the required match_scratch prior to the define_insn "*storesic" (see above) in the hope that it might match subsequent to the expand pass; however, it is never seen.

Can anyone offer any insight into what I'm doing wrong and how I might proceed?

Thanks, Paul.
Re: Unable to match instruction pattern
Thanks Richard,

That worked just as you suggested, and ... when I removed my zero_extendhisi2, the target-independent optabs code generated exactly the same sequence.

Interestingly, I noticed that changing from my define_insn "zero_extendhisi2" back to the define_expand "zero_extendhisi2" caused some optimisation opportunities to be missed where an expression calculated in 32 bits could equally well be performed at the natural 16-bit word width (because the result of the calculation was truncated to 16 bits) :-)

Cheers, Paul.

On 12/04/14 04:21, Richard Sandiford wrote:
> pshor...@dataworx.com.au writes:
>> Found it ... I had
>>
>>     (define_expand "zero_extendhisi2"
>>       [
>>         (set (subreg:HI (match_operand:SI 0 "general_operand" "")0)
>>              (match_operand:HI 1 "general_operand" ""))
>>         (set (subreg:HI (match_dup 0)2)
>>              (const_int 0))
>>       ]
>>       ""
>>       ""
>>     )
>
> FWIW, in general you shouldn't use subregs in expanders, but instead use something like:
>
>     (define_expand "zero_extendhisi2"
>       [(set (match_operand:SI 0 "general_operand" "")
>             (zero_extend:SI (match_operand:HI 1 "general_operand" "")))]
>       ""
>     {
>       emit_move_insn (gen_lowpart (HImode, operands[0]), operands[1]);
>       emit_move_insn (gen_highpart (HImode, operands[0]), const0_rtx);
>       DONE;
>     })
>
> This allows MEMs to be simplified to smaller MEMs, which is preferable to keeping them as SUBREGs. It also allows subregs of hard registers to be simplified to simple REGs and handles constants correctly.
>
> IMO SUBREGs of MEMs should go away. There's this strange convention that (subreg (mem ...)) is treated as a register_operand before reload, which is why:
>
>     if( !register_operand( operands[0], mode )
>         && !register_operand( operands[1], mode ))
>       {
>         /* one of the operands must be in a register. */
>         operands[1] = copy_to_mode_reg (mode, operands[1]);
>       }
>
> wouldn't trigger for a move between a constant and a subreg of a MEM.
>
> OTOH, you should be better off not defining the zero_extendhisi2 at all, since IIRC the target-independent optabs code would generate the same sequence for you.
>
> Thanks,
> Richard
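What the two emit_move_insn calls in Richard's expander amount to can be modelled in plain C. This is only an illustration of the semantics, with a 32-bit value represented as two 16-bit words; the `toy_` names are hypothetical and nothing here is GCC code.

```c
#include <stdint.h>

/* Toy model of a 32-bit (SImode) value held as two 16-bit (HImode) words.  */
struct toy_si
{
  uint16_t low;
  uint16_t high;
};

/* Zero-extend a 16-bit value to 32 bits as two word-sized moves.  */
static struct toy_si
toy_zero_extendhisi2 (uint16_t src)
{
  struct toy_si dst;
  dst.low = src;   /* corresponds to the move into the low part */
  dst.high = 0;    /* corresponds to the move of const0_rtx into the high part */
  return dst;
}

/* Reassemble the two words into one 32-bit number for checking.  */
static uint32_t
toy_si_value (struct toy_si v)
{
  return ((uint32_t) v.high << 16) | v.low;
}
```

For any 16-bit input the reassembled value equals the input with the upper 16 bits clear, which is the definition of zero extension.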
Re: LRA Stuck in a loop until aborting
Solved... kind of.

*ldsi is one of the patterns movsi is expanded to and, as the name suggests, it only handles register loads. I know that at some stages memory references will pass the register_operand predicate, so I changed the predicate for operand 0 and added an alternative to *ldsi that could store to memory, and the problem went away.

What is interesting is that when LRA is disabled this case was handled fine by the old reload pass, suggesting that the new LRA pass handles this situation differently, or not at all?

BTW, can someone tell me whether I should be top or bottom posting?

Cheers, Paul.

On 16/04/14 16:38, pshor...@dataworx.com.au wrote:
> I've got a small test case where the ira pass produces this ...
>
>     (insn 35 38 36 5 (set (reg/v:SI 29 [orig:17 _b ] [17])
>             (reg/v:SI 17 [ _b ])) 48 {*ldsi}
>          (expr_list:REG_DEAD (reg/v:SI 17 [ _b ])
>             (nil)))
>
> and the LRA processes it as follows ...
>
>     Spilling non-eliminable hard regs: 6
>     0 Non input pseudo reload: reject++
>     alt=0,overall=607,losers=1,rld_nregs=2
>     0 Non input pseudo reload: reject++
>     1 Spill pseudo into memory: reject+=3
>     alt=1,overall=616,losers=2,rld_nregs=2
>     0 Non input pseudo reload: reject++
>     alt=2: Bad operand -- refuse
>     Choosing alt 0 in insn 35: (0) =r (1) r {*ldsi}
>     Creating newreg=34 from oldreg=29, assigning class GENERAL_REGS to r34
>     35: r34:SI=r17:SI
>     REG_DEAD r17:SI
>     Inserting insn reload after:
>     45: r29:SI=r34:SI
>     0 Non input pseudo reload: reject++
>     1 Non pseudo reload: reject++
>     alt=0,overall=608,losers=1,rld_nregs=2
>     0 Non input pseudo reload: reject++
>     alt=1,overall=613,losers=2,rld_nregs=2
>     0 Non input pseudo reload: reject++
>     alt=2: Bad operand -- refuse
>     Choosing alt 0 in insn 45: (0) =r (1) r {*ldsi}
>     Creating newreg=35 from oldreg=29, assigning class GENERAL_REGS to r35
>     45: r35:SI=r34:SI
>     Inserting insn reload after:
>     46: r29:SI=r35:SI
>
> so it is stuck in a loop (continues on for 90 attempts then aborts), but I can't see what is causing it. The pattern (below) shouldn't require a reload, so I can't see why it would be doing this
>
>     (define_insn "*ldsi"
>       [(set (match_operand:SI 0 "register_operand" "=r,r,r")
>             (match_operand:SI 1 "general_operand" "r,m,i"))
>       ]
>       ""
>
> Can anyone shed any light on this behaviour?
>
> Thanks, Paul.