match_scratch causing pattern mismatch

2015-07-30 Thread Paul Shortis
In a GCC port to a 16 bit CPU that uses CC flags for branching, 
I'm experimenting with using a 32 bit subtract for compare 
instead of multiple 16 bit compares and branches.


My cbranch4 expander produces compare and conditional branch 
patterns...


  cmpmode = SELECT_CC_MODE( branchCode, op0, op1 );
  flags = gen_rtx_REG ( cmpmode, CC_REGNUM );

  compare = gen_rtx_COMPARE ( cmpmode, op0, op1 );
  emit_insn( gen_rtx_SET( VOIDmode, flags, compare ));

To implement compare using a subtract I need a HI mode scratch 
register, so I used a match_scratch


(define_insn "comparesi3"
  [ (set (reg:CC CC_REGNUM)
         (compare:CC (match_operand:SI 0 "register_operand" "r,r")
                     (match_operand:SI 1 "rhs_operand" "r,i")))
    (clobber (match_scratch:HI 2 "=r,r"))
  ]
  ""


When I do this, the compare no longer matches and I get failures 
like this in the vregs pass...


../../../libgcc/unwind-dw2.c:1224:1: error: unrecognizable insn:
 }
 ^
(insn 69 68 70 7 (set (reg:CC 16 flags)
(compare:CC (reg:SI 44 [ D.5851 ])
(reg:SI 169))) ../../../libgcc/unwind-dw2.c:972 -1
 (nil))

When I remove the match_scratch these errors disappear, but of 
course I then don't have the scratch register needed to implement 
the proper assembler instructions.
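
To make the idea concrete, here is a minimal C model of comparing two 32 bit values with a 16 bit subtract followed by a subtract-with-borrow, where both differences are discarded into a scratch. It is only a sketch under assumptions: the flag layout, the toy_ names, and the premise that the subtract overwrites its destination (hence the scratch) are illustrative and not taken from the port.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition-code layout; the real port's flags may differ.  */
struct toy_flags {
    bool carry;     /* borrow out of the high-word subtract */
    bool zero;      /* both 16 bit differences were zero */
    bool negative;  /* sign bit of the high-word difference */
    bool overflow;  /* signed overflow in the high-word subtract */
};

/* Compare a with b by subtracting 16 bits at a time.  The differences
   are written to a scratch and thrown away; only the flags survive.  */
struct toy_flags toy_compare32(uint32_t a, uint32_t b)
{
    uint16_t alo = (uint16_t)a, ahi = (uint16_t)(a >> 16);
    uint16_t blo = (uint16_t)b, bhi = (uint16_t)(b >> 16);
    struct toy_flags f;
    uint16_t scratch;
    bool borrow, low_zero;

    /* sub: low words.  */
    scratch = (uint16_t)(alo - blo);
    borrow = alo < blo;
    low_zero = scratch == 0;

    /* sbc: high words minus the borrow from the low words.  */
    scratch = (uint16_t)(ahi - bhi - borrow);
    f.carry = (uint32_t)ahi < (uint32_t)bhi + borrow;
    f.zero = low_zero && scratch == 0;
    f.negative = (scratch & 0x8000u) != 0;
    f.overflow = (((ahi ^ bhi) & (ahi ^ scratch)) & 0x8000u) != 0;
    return f;
}

/* Branch conditions derived from the flags.  */
bool toy_ltu(struct toy_flags f) { return f.carry; }
bool toy_lt(struct toy_flags f)  { return f.negative != f.overflow; }
bool toy_eq(struct toy_flags f)  { return f.zero; }
```

A single flags result then serves signed, unsigned and equality branches, which is what makes the subtract attractive compared with a chain of 16 bit compares and branches.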


I'm aware that it's the combiner that understands clobbers etc. 
So, in the .md file I tried to add a dummy comparesi3 pattern 
without the match_scratch, placed after the pattern containing 
the match_scratch. This sometimes works, however on occasion the 
combiner selects the dummy pattern instead of the match_scratch 
pattern.


Any insight appreciated...

Cheers, Paul



Re: match_scratch causing pattern mismatch

2015-07-30 Thread Paul Shortis

Of course, the answer is to

emit_insn( gen_comparesi3( op0, op1 ));

which generates the required match_scratch

instead of ...

  cmpmode = SELECT_CC_MODE( branchCode, op0, op1 );
  flags = gen_rtx_REG ( cmpmode, CC_REGNUM );

  compare = gen_rtx_COMPARE ( cmpmode, op0, op1 );
  emit_insn( gen_rtx_SET( VOIDmode, flags, compare ));

Sorry for the bother...



Controlling instruction alternative selection

2015-07-30 Thread Paul Shortis


I'm working with a CPU where only a restricted set of registers can do 
three address maths, whereas ALL registers can do two address maths.


If I define

(define_insn "addsi3"
  [ (set (match_operand:SI 0 "register_operand" "=r,r")
         (plus:SI (match_operand:SI 1 "register_operand" "0,0")
                  (match_operand:SI 2 "rhs_operand" "r,i")))


So that all adds are done using 2 address instructions then all is fine. 
If however I change addsi3 to


(where the constraint 'R' is the smaller set of three address registers)

(define_insn "addsi3"
  [ (set (match_operand:SI 0 "register_operand" "=R,r,r")
         (plus:SI (match_operand:SI 1 "register_operand" "R,0,0")
                  (match_operand:SI 2 "rhs_operand" "R,r,i")))



to take advantage of the three address instructions then the three 
address instructions are used successfully on many occasions.


However, when register pressure on the 'R' class is high the allocator 
never falls back to using the entire register set by employing the two 
address instructions.


Resulting in ...

error: unable to find a register to spill in class ‘GP_REGS’

Enabling LRA and inspecting the RTL dump indicates that both 
alternatives (R and r) seem to be equally appealing to the allocator, so 
it chooses 'R' and fails.


The GCC internals documentation indicates that the '0' alternatives 
should be placed at the end of the alternatives list, so I'm guessing 
'R' will always be chosen.


Using constraint disparaging (?R) eradicates the errors, but of course 
that causes the 'R' three address alternative to never be used.


Suggestions ?



Re: Controlling instruction alternative selection

2015-08-20 Thread Paul Shortis

Thanks Jim,

It took me a while to get back to this problem but as you 
suggested, exposing only the three address instructions prior to 
reload... in conjunction with implementing


TARGET_CLASS_LIKELY_SPILLED_P

has produced good results. After taking these measures I didn't 
have to disparage any alternatives in the patterns.
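
For reference, the shape of such a hook can be sketched in self-contained C. This is a toy model, not the port's code: the enum and the class name TOY_THREE_ADDR_REGS are hypothetical stand-ins for the port's enum reg_class and for whatever class backs the 'R' constraint. In a real port the function takes a reg_class_t and is registered by defining TARGET_CLASS_LIKELY_SPILLED_P.

```c
#include <stdbool.h>

/* Hypothetical register classes for this sketch; a real port declares
   its own enum reg_class.  */
enum toy_reg_class {
    TOY_NO_REGS,
    TOY_THREE_ADDR_REGS,   /* the small class behind constraint 'R' (assumed name) */
    TOY_GP_REGS,
    TOY_ALL_REGS
};

/* Model of a TARGET_CLASS_LIKELY_SPILLED_P implementation: report the
   small three address class as likely to be spilled so the allocator
   treats it as scarce.  */
bool toy_class_likely_spilled_p(enum toy_reg_class rclass)
{
    return rclass == TOY_THREE_ADDR_REGS;
}
```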


Paul.

On 04/08/15 06:00, Jim Wilson wrote:

On 07/30/2015 09:54 PM, Paul Shortis wrote:

Resulting in ...
error: unable to find a register to spill in class ‘GP_REGS’

enabling lra and inspecting the rtl dump indicates that both
alternatives (R and r) seem to be equally appealing to the allocater so
it chooses 'R' and fails.

The problem isn't in lra, it is in reload.  You want lra to use the
three address instruction, but you then want reload to use the two
address alternative.


Using constraint disparaging (?R) eradicates the errors, but of course
that causes the 'R' three address alternative to never be used.

You want to disparage the three address alternative in reload, but not
in lra.  There is a special code for that, you can use ^ instead of ? to
make that happen.  That may or may not help though.

There is also a hook TARGET_CLASS_LIKELY_SPILLED_P which might help.
You should try defining this to return true for the 'R' class if it
doesn't already.

Jim






bug in lra causes incorrect register usage / compiler crash

2014-04-27 Thread Paul Shortis

I'm porting gcc to a 16 bit cpu with a two address ISA like x86.

When using LRA I was getting compiler crashes and results like 
this from the reload pass


(insn 97 96 98 21 (parallel [
(set (reg/f:HI 3 r3 [59])
(plus:HI (reg/f:HI 6 sp)
(const_int 4 [0x4])))
(clobber (reg:CC 7 flags))
]) ../../../libgcc/unwind-dw2-fde.c:304 25 {addhi3}
 (expr_list:REG_EQUIV (plus:HI (reg/f:HI 3 r3)
(const_int 4 [0x4]))
(nil)))

which is invalid because, like x86, the CPU only has instructions 
of the form Rn op= operand (e.g. add r3,4).


The md pattern for this instruction is...

(define_insn "hi3"
  [ (set (match_operand:HI 0 "register_operand" "=r,r,r")
         (commOperator:HI (match_operand:HI 1 "register_operand" "0,0,0")
                          (match_operand:HI 2 "rhs_operand" "r,i,m")))


Looking back at the IRA output I could see that the input to the 
LRA was


(insn 97 96 98 18 (parallel [
(set (reg/f:HI 59)
(plus:HI (reg/f:HI 3 r3)
(const_int 4 [0x4])))
(clobber (reg:CC 7 flags))
]) ../../../libgcc/unwind-dw2-fde.c:304 25 {addhi3}
 (expr_list:REG_UNUSED (reg:CC 7 flags)
(expr_list:REG_EQUIV (plus:HI (reg/f:HI 3 r3)
(const_int 4 [0x4]))

R3 is the frame pointer which is eliminated to sp by the LRA. 
Then r3 is allocated to virtual R59.


Tracing through LRA constraints processing I found the problem to 
be get_hard_regno (rtx x) which translates the virtual register 
59 to R3 then erroneously passes this to get_final_hard_regno() 
which eliminates R3 to sp causing constraints to be (incorrectly) 
satisfied.


At final register elimination this problem doesn't occur, as 
there is a check before elimination to confirm that a register to 
be eliminated is actually a hard register. However the 
get_hard_regno() bug does cause incorrect constraint checking 
leading to invalid RTL generation, forming an impossible instruction.
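
The failure mode described above can be modelled in a few lines of self-contained C. This is a toy, not the lra-constraints.c code: the toy_ names, the register numbers (r3 as frame pointer, r6 as sp, pseudos from 16 up) and the assignment table are assumptions chosen to mirror the dumps in this message.

```c
#include <stdbool.h>

#define TOY_FIRST_PSEUDO 16
#define TOY_FP_REGNUM 3
#define TOY_SP_REGNUM 6
#define TOY_MAX_REGNO 64

/* Hard register assigned to each pseudo, or -1 if none.  Callers must
   pass register numbers below TOY_MAX_REGNO.  */
static int toy_assignment[TOY_MAX_REGNO];

void toy_reset(void)
{
    for (int i = 0; i < TOY_MAX_REGNO; i++)
        toy_assignment[i] = -1;
}

void toy_assign(int pseudo, int hard_regno)
{
    toy_assignment[pseudo] = hard_regno;
}

/* Frame pointer elimination: a reference to fp is rewritten to sp.  */
static int toy_eliminate(int hard_regno)
{
    return hard_regno == TOY_FP_REGNUM ? TOY_SP_REGNUM : hard_regno;
}

/* Behaviour before the patch: elimination is applied even when the
   hard register came from a pseudo's allocation.  */
int toy_get_hard_regno_old(int regno)
{
    int hard_regno = regno;
    if (regno >= TOY_FIRST_PSEUDO)
        hard_regno = toy_assignment[regno];
    if (hard_regno < 0)
        return -1;
    return toy_eliminate(hard_regno);
}

/* Behaviour after the patch: only registers that were hard registers
   in the insn itself are eliminated.  */
int toy_get_hard_regno_patched(int regno)
{
    bool is_virtual = false;
    int hard_regno = regno;
    if (regno >= TOY_FIRST_PSEUDO) {
        is_virtual = true;
        hard_regno = toy_assignment[regno];
    }
    if (hard_regno < 0)
        return -1;
    return is_virtual ? hard_regno : toy_eliminate(hard_regno);
}
```

With pseudo 59 assigned to r3, the old function reports sp for the destination, so the destination appears to match the (eliminated) source operand and the "0" constraint looks satisfied; the patched function reports r3.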


The same circumstances exist when using the LRA for the i386. I'm 
guessing it hasn't raised its head because -fomit-frame-pointer 
is off by default for the i386 port.


I don't have write access to the repository so below is a patch 
for the change I used to address this problem.


Cheers, Paul.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index aac5087..81817f2 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -197,6 +197,7 @@ get_hard_regno (rtx x)
 {
   rtx reg;
   int offset, hard_regno;
+  bool is_virtual = false;

   reg = x;
   if (GET_CODE (x) == SUBREG)
@@ -204,14 +205,18 @@ get_hard_regno (rtx x)
   if (! REG_P (reg))
 return -1;
   if ((hard_regno = REGNO (reg)) >= FIRST_PSEUDO_REGISTER)
+  {
+is_virtual = true;
 hard_regno = lra_get_regno_hard_regno (hard_regno);
+  }
   if (hard_regno < 0)
 return -1;
   offset = 0;
   if (GET_CODE (x) == SUBREG)
 offset += subreg_regno_offset (hard_regno, GET_MODE (reg),
   SUBREG_BYTE (x),  GET_MODE (x));
-  return get_final_hard_regno (hard_regno, offset);
+  return is_virtual ? hard_regno + offset
+: get_final_hard_regno (hard_regno, offset);
 }
















Re: bug in lra causes incorrect register usage / compiler crash

2014-04-29 Thread Paul Shortis
I've now confirmed this same issue occurs on a stock i386 build 
when -fomit-frame-pointer is specified with -O2 and a test case 
with reasonable register pressure.


Compare Elimination problems

2014-09-03 Thread Paul Shortis


For a 16 bit CPU the cmpelim pass is changing

(insn 33 84 85 6 (parallel [
            (set (reg:HI 1 r1)
                (ashift:HI (reg:HI 1 r1)
                    (const_int 1 [0x1])))
            (clobber (reg:CC_NOOV 7 flags))
        ]) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:18 33 {ashlhi3}

(insn 34 87 35 6 (set (reg:CC_NOOV 7 flags)
        (compare:CC_NOOV (reg:SI 0 r0)
            (const_int 0 [0]))) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:20 39 {*comparesi3_nov}

(jump_insn 35 34 36 6 (set (pc)
        (if_then_else (ge (reg:CC_NOOV 7 flags)

to

(insn 33 84 85 6 (parallel [
            (set (reg:HI 1 r1)
                (ashift:HI (reg:HI 1 r1)
                    (const_int 1 [0x1])))
            (set (reg:CC_NOOV 7 flags)
                (compare:CC_NOOV (ashift:HI (reg:HI 1 r1)
                        (const_int 1 [0x1]))
                    (const_int 0 [0])))
        ]) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:18 29 {ashlhi3_cc}

(jump_insn 35 87 36 6 (set (pc)
        (if_then_else (ge (reg:CC_NOOV 7 flags)


(reg:HI r1) is a subreg of (reg:SI r0), however cmpelim seems 
to be substituting the compare of (reg:HI r1) with 0 for the 
compare of (reg:SI r0) with 0 ?
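
The relationship between the two registers can be sketched in self-contained C. This is a toy model assuming 16 bit hard registers, so that an SI value in r0 also occupies r1; the toy_ names and the struct are hypothetical and not GCC data structures.

```c
#include <stdbool.h>

#define TOY_UNITS_PER_WORD 2   /* bytes per hard register (assumed) */

struct toy_reg {
    int regno;
    int size;   /* bytes: 2 for HI, 4 for SI */
};

/* Number of consecutive hard registers the value occupies.  */
int toy_nregs(struct toy_reg r)
{
    return (r.size + TOY_UNITS_PER_WORD - 1) / TOY_UNITS_PER_WORD;
}

/* True if the two values share at least one hard register.  */
bool toy_overlap(struct toy_reg a, struct toy_reg b)
{
    return a.regno < b.regno + toy_nregs(b)
        && b.regno < a.regno + toy_nregs(a);
}

/* True only if both refer to exactly the same register in the same
   width, which is what a compare substitution needs.  */
bool toy_same_value(struct toy_reg a, struct toy_reg b)
{
    return a.regno == b.regno && a.size == b.size;
}
```

In this model (reg:SI r0) and (reg:HI r1) overlap but are not the same value, so flags computed from the HI result say nothing reliable about the SI compare.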


While I'm here, in i386.md some of the flag setting operations 
specify a mode and some don't. E.g.


(define_expand "cmp_1"
  [(set (reg:CC FLAGS_REG)
(compare:CC (match_operand:SWI48 0 "nonimmediate_operand")


(define_insn "*add_3"
  [(set (reg FLAGS_REG)
(compare

Can anyone explain the significance of this ?

Thanks, Paul.







Re: Compare Elimination problems

2014-09-04 Thread Paul Shortis

Thanks Richard,

I found the bug. try_eliminate_compare follows register 
definitions between the flags use and the clobbering compare by 
register number only, i.e. the register width isn't considered.


if (DF_REF_REGNO (def) == REGNO (in_a))
  break;

This caused R1:HI to follow R0:SI, causing the compare:CC_NOOV 
(R1:HI, 0) to be seen as a valid source of flags.


The following change fixes the problem while preserving the 
proper behaviour.


   x = single_set (insn);
   if (x == NULL)
     return false;
+  if (GET_MODE (x) != GET_MODE (in_a))
+    return false;
   in_a = SET_SRC (x);

Cheers, Paul.

On 05/09/14 02:33, Richard Henderson wrote:

On 09/03/2014 03:14 PM, Paul Shortis wrote:

(insn 33 84 85 6 (parallel [
 (set (reg:HI 1 r1)
 (ashift:HI (reg:HI 1 r1)
 (const_int 1 [0x1])))
 (clobber (reg:CC_NOOV 7 flags))
 ]) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:18 33 {ashlhi3}

(insn 34 87 35 6 (set (reg:CC_NOOV 7 flags)
 (compare:CC_NOOV (reg:SI 0 r0)
 (const_int 0 [0])))
../gcc/testsuite/gcc.c-torture/execute/960311-3.c:20 39 {*comparesi3_nov}

(jump_insn 35 34 36 6 (set (pc)
 (if_then_else (ge (reg:CC_NOOV 7 flags)

to

(insn 33 84 85 6 (parallel [
 (set (reg:HI 1 r1)
 (ashift:HI (reg:HI 1 r1)
 (const_int 1 [0x1])))
 (set (reg:CC_NOOV 7 flags)
 (compare:CC_NOOV (ashift:HI (reg:HI 1 r1)
 (const_int 1 [0x1]))
 (const_int 0 [0])))
 ]) ../gcc/testsuite/gcc.c-torture/execute/960311-3.c:18 29 {ashlhi3_cc}

(jump_insn 35 87 36 6 (set (pc)
 (if_then_else (ge (reg:CC_NOOV 7 flags)


(reg:HI r1) is a subreg of (reg:SI r0) however the cmpelim seems to be
substituting the compare of (reg:HI r1 and 0) for the compare of (reg:SI r0 and
0) ?

That would appear to be a bug, though I don't immediately see where.
The relevant test would appear to be

   if (rtx_equal_p (SET_DEST (x), in_a))

which would not be true for (reg:SI) vs (reg:HI), whatever their regno's.

Anyway, put your breakpoint in try_eliminate_compare to debug this.


While I'm here, in i386.md some of the flag setting operations specify a mode
and some don't . Eg

(define_expand "cmp_1"
   [(set (reg:CC FLAGS_REG)
 (compare:CC (match_operand:SWI48 0 "nonimmediate_operand")

That's a define_expand, not a define_insn.


(define_insn "*add_3"
   [(set (reg FLAGS_REG)
 (compare

The mode checks are done in the ix86_match_ccmode call inside the predicate.


r~





Re: Compare Elimination problems

2014-09-04 Thread Paul Shortis

Correction,

...compare:CC_N... (R1:HI, 0)


limiting call clobbered registers for library functions

2015-01-29 Thread Paul Shortis
I've ported GCC to a small 16 bit CPU that has single bit shifts. 
So I've handled variable / multi-bit shifts using a mix of inline 
shifts and calls to assembler support functions.


The calls to the asm library functions clobber only one (by 
const) or two (variable) registers but of course calling these 
functions causes all of the standard call clobbered registers to 
be considered clobbered, thus wasting lots of candidate registers 
for use in expressions surrounding these shifts and causing 
unnecessary register saves in the surrounding function 
prologue/epilogue.


I've scrutinized and cloned the actions of other ports that do 
the same, however I'm unable to convince the various passes that 
only r1 and r2 can be clobbered by these library calls.


Is anyone able to point me in the proper direction for a solution 
to this problem ?


Thanks, Paul.





Re: limiting call clobbered registers for library functions

2015-02-02 Thread Paul Shortis

On 02/02/15 18:55, Yury Gribov wrote:

On 01/30/2015 11:16 AM, Matthew Fortune wrote:

Yury Gribov  writes:

On 01/29/2015 08:32 PM, Richard Henderson wrote:

On 01/29/2015 02:08 AM, Paul Shortis wrote:
I've ported GCC to a small 16 bit CPU that has single bit shifts. So
I've handled variable / multi-bit shifts using a mix of inline shifts
and calls to assembler support functions.

The calls to the asm library functions clobber only one (by const) or
two (variable) registers but of course calling these functions causes all
of the standard call clobbered registers to be considered clobbered,
thus wasting lots of candidate registers for use in expressions
surrounding these shifts and causing unnecessary register saves in
the surrounding function prologue/epilogue.

I've scrutinized and cloned the actions of other ports that do the
same, however I'm unable to convince the various passes that only r1
and r2 can be clobbered by these library calls.

Is anyone able to point me in the proper direction for a solution to
this problem ?


You wind up writing a pattern that contains a call, but isn't
represented in rtl as a call.


Could it be useful to provide a pragma for specifying function register
usage? This would allow e.g. a library writer to write a hand-optimized
assembly version and then inform the compiler of its binary interface.

Currently a surrogate of this can be achieved by putting inline asm code
in static inline functions in public library headers but this has its
own disadvantages (e.g. code bloat).


This sounds like a good idea in principle. I seem to recall seeing something
similar to this in other compiler frameworks that allow a number of special
calling conventions to be defined and enable functions to be attributed to use
one of them. I.e. not quite so general as specifying an arbitrary clobber list
but some sensible pre-defined alternative conventions.


FYI a colleague from kernel mentioned that they already achieve 
this by wrapping the actual call with inline asm e.g.


static inline int foo(int x) {
  asm(
    ".global foo_core\n"
    // foo_core accepts single parameter in %rax,
    // returns result in %rax and
    // clobbers %rbx
    "call foo_core\n"
    : "+a"(x)
    :
    : "rbx"
  );
  return x;
}

We still can't mark inline asm with things like 
__attribute__((pure)), etc. though so it's not an ideal solution.


-Y



Thanks everyone.

I've finally settled on an extension of the solution offered by 
Richard (from the SH port).


I've had to ...

1. write an expander that expands ...

   a) short (bit count) constant shifts to an instruction pattern 
      that emits asm for one or more inline shifts
   b) other constant shifts to another instruction pattern that 
      emits asm for a library call and clobbers cc
   c) variable shifts to another instruction pattern that emits 
      asm for a library call and clobbers cc plus r2


then, for compare elimination, two other instruction patterns 
corresponding to b) and c) that set CC from a compare instead of 
clobbering it.


I could have avoided the expander and used a single instruction 
pattern for a)b)c) if I could have found a way to have 
alternative dependent clobbers in an instruction pattern. I 
investigated attributes but couldn't see how I would be able to 
achieve what I needed. I also tried clobber (match_dup 2), but 
when one of the alternatives has a constant for operands[2] the 
clobber is accepted silently by the .md compiler but doesn't 
actually clobber the non-constant alternatives.


A mechanism to implement alternative dependent clobbers would 
have allowed all this to be represented in a much more succinct 
and compact manner.


On another note... compare elimination didn't work for pattern c) 
and on inspection I found this snippet in compare-elim.c


static bool
arithmetic_flags_clobber_p (rtx insn)
{
...
...
  if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) == 2)
    {

which of course rejects any parallel pattern with a main rtx and 
more than one clobber (i.e. (c) above). So I changed this 
function so that it accepts patterns where rtx's one and upwards 
are all clobbers, one being cc. The resulting generated asm ...


    ld      r2,r4       ; r2 holds the shift count, it will be clobbered
                        ; to calculate an index into the sequence of
                        ; shift instructions
    call    __ashl_v    ; variable ashl
    beq     .L4         ; branch on result
    ld      r1,r3

If anyone can see any fault in this change please call out
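
The relaxed test described above can be modelled in self-contained C. This is a sketch, not the compare-elim.c code: the toy_ types stand in for the elements of a PARALLEL, and the flags register number 7 is taken from the dumps of this port purely for illustration.

```c
#include <stdbool.h>

enum toy_code { TOY_SET, TOY_CLOBBER };

/* One element of a toy PARALLEL: its code and the register it sets
   or clobbers.  */
struct toy_elt {
    enum toy_code code;
    int regno;
};

#define TOY_CC_REGNUM 7   /* assumed flags register number */

/* Original shape of the test: exactly one SET plus one CLOBBER of cc.  */
bool toy_flags_clobber_strict(const struct toy_elt *vec, int len)
{
    return len == 2
        && vec[0].code == TOY_SET
        && vec[1].code == TOY_CLOBBER
        && vec[1].regno == TOY_CC_REGNUM;
}

/* Relaxed shape: element 0 is the SET, every later element is a
   CLOBBER, and at least one of them clobbers cc.  */
bool toy_flags_clobber_relaxed(const struct toy_elt *vec, int len)
{
    bool clobbers_cc = false;

    if (len < 2 || vec[0].code != TOY_SET)
        return false;
    for (int i = 1; i < len; i++) {
        if (vec[i].code != TOY_CLOBBER)
            return false;
        if (vec[i].regno == TOY_CC_REGNUM)
            clobbers_cc = true;
    }
    return clobbers_cc;
}
```

Pattern c), with a SET, a clobber of cc and a clobber of r2, fails the strict test and passes the relaxed one, while a PARALLEL containing a second SET is still rejected.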

Of course, the problem with using inline asm is that you have to 
arrange for them to be included in EVERY compile and they don't 
allow compare elimination to be easily implemented.


Paul.







Re: limiting call clobbered registers for library functions

2015-02-05 Thread Paul Shortis

On 03/02/15 09:14, Joern Rennecke wrote:

On 2 February 2015 at 21:54, Paul Shortis  wrote:

I could have avoided the expander and used a single instruction pattern for
a)b)c) if if could have found a way to have alternative dependent clobbers
in an instruction pattern. I investigated attributes but couldn't see how I
would be able to achieve what I needed. Also tried clobber (match_dup 2) but
when one of the alternatives has a constant for operands[2] the clobber is
accepted silently by the .md compiler but doesn't actually clobber the
non-constant alternatives.

You can clobber one or more match_scratch with suitable constraint alternatives
and let the register allocator / reload reserve the hard registers.
Well, not really for cc, you'll just have to say you clobber it, and
split away the unused clobber post-reload.
That will not really give you less source code, but more uniform patterns
prior to register allocation.  That gives the rtl optimizers a better
handle on the code.
The effect is even more pronounced when you replace all hard register usage with
pseudo register usage and appropriate constraints.


Thanks Joern,

That was one of the (many) things I had tried, but I couldn't get 
it to work & moved on. Your post caused me to try again, and it 
worked perfectly and I now need just a single pattern and no 
expander.


I do however have one remaining issue.

The pattern I'm using is ...

(define_insn "hi3"
  [(set (match_operand:WORD 0 "register_operand" "=r,C,C")
        (hiShiftOperator:WORD (match_operand:WORD 1 "register_operand" "0,0,0")
                              (match_operand:WORD 2 "rhs_operand" "M,i,D")))
   (clobber (reg:CC_NOOV CC_REGNUM))
   (clobber (match_scratch:HI 3 "=X,X,D"))
  ]
  ""
  "..."
  [(set_attr "length" "2,8,8")]
)

The r/M constraint pair is for shifts small enough to perform 
inline (M == const_int 1...3).


Register constraints C and D are for the registers used for the 
library calls.


At first I used the first two registers normally used for 
function calls (r1 and r2), however when building the libgcc et 
al I occasionally encounter the situation where a register in the 
R1_CLASS (constraint C) can't be allocated.


I changed these two registers to the two last GPRs in the 
allocation order and made them call clobbered and I can at least 
compile libgcc and newlib.


Could I be doing something better, or is this expected behaviour ?

Paul.


Unable to match instruction pattern

2014-04-08 Thread Paul Shortis
I'm porting gcc to a 16 bit processor. Occasionally compiling 
source such as


short v1;
long global;



global = (long)v1;

results in ...

(insn 11 10 12 2 (set (subreg:HI (mem/c:SI (symbol_ref:HI ("global")  ) [2 global+0 S4 A16]) 2)
        (const_int 0 [0])) t.c:15 -1
     (nil))
(insn 12 11 13 2 (set (subreg:HI (mem/c:SI (symbol_ref:HI ("global")  ) [2 global+0 S4 A16]) 0)
        (reg:HI 24 [ D.1348 ])) t.c:15 -1
     (nil))

The second of these instructions successfully matches a movhi 
pattern which is found via the define_expand below


;the INTEGER iterator includes [QI HI SI]
(define_expand "mov"
  [(set (match_operand:INTEGER 0 "nonimmediate_operand" "")
        (match_operand:INTEGER 1 "general_operand" ""))]
  ""
  {
    if( can_create_pseudo_p() )
    {
      if( !register_operand( operands[0], mode )
          && !register_operand( operands[1], mode ))
      {
        /* one of the operands must be in a register.  */
        operands[1] = copy_to_mode_reg (mode, operands[1]);
      }
    }


However the first instruction fails to match and be expanded so 
that the const_int is forced into a register. I defined an 
instruction which is an exact match


(define_insn "*storesic"
  [
    (set (subreg:HI (mem:SI (match_operand:HI 0 "symbol_ref_operand" "")) 2)
         (match_operand:HI 1 "const_int_operand"))
  ]

This matches, however I need a scratch register (R? below) to 
transform this into


load  R?, const_int_operand
store R?, address

and as soon as I add the match_scratch

(clobber(match_scratch:HI 2 "=r"))


the pattern no longer matches because expand calls recog() with 
pnum_clobbers = 0 (NULL)
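
The effect of the pnum_clobbers argument can be illustrated with a toy model in self-contained C. This is not the generated recog code: the toy_ names and the single hard-wired pattern are assumptions made for the sketch. It only captures the documented behaviour that a match needing extra clobbers is accepted solely when the caller passes a non-null pnum_clobbers.

```c
#include <stddef.h>

/* Toy insn body: a SET accompanied by n_clobbers CLOBBERs.  */
struct toy_insn {
    int n_clobbers;
};

/* The single toy pattern needs one clobber, like a define_insn with
   one match_scratch.  */
#define TOY_PATTERN_CLOBBERS 1
#define TOY_INSN_CODE 0

/* Returns the insn code on a match, or -1 if nothing matches.  */
int toy_recog(const struct toy_insn *insn, int *pnum_clobbers)
{
    if (pnum_clobbers != NULL)
        *pnum_clobbers = 0;

    /* Exact match: the insn already carries the clobber.  */
    if (insn->n_clobbers == TOY_PATTERN_CLOBBERS)
        return TOY_INSN_CODE;

    /* Match apart from missing clobbers: only allowed when the caller
       has offered to add them.  */
    if (insn->n_clobbers == 0 && pnum_clobbers != NULL) {
        *pnum_clobbers = TOY_PATTERN_CLOBBERS;
        return TOY_INSN_CODE;
    }
    return -1;
}
```

A bare SET therefore fails when the pointer is NULL, as at expand time, but is accepted by a caller that is prepared to add the clobber, and a body that already contains the clobber matches either way.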


In a similar vein to the way CC setting instructions are handled, I 
tried defining a second instruction containing the required 
match_scratch prior to the define_insn "*storesic" (see above) in 
the hope that it may match subsequent to the expand pass, however 
it is never seen.


Can anyone offer any insight to what I'm doing wrong and how I 
might proceed ?


Thanks, Paul.














Re: Unable to match instruction pattern

2014-04-14 Thread Paul Shortis

Thanks Richard,

That worked just as you suggested, and ... when I removed my 
zero_extendhisi2, the target-independent optabs code generated 
exactly the same sequence, as you recalled.


Interestingly, I noticed that changing from my define_insn 
"zero_extendhisi2" back to the define_expand "zero_extendhisi2" 
caused some optimisation opportunities to be missed where an 
expression calculated in 32 bits could equally well be performed 
at the natural 16 bit word width (because the result of the 
calculation was truncated to 16 bits) :-)


Cheers, Paul.

On 12/04/14 04:21, Richard Sandiford wrote:

pshor...@dataworx.com.au writes:

Found it ...

I had

(define_expand "zero_extendhisi2"
  [
  (set (subreg:HI (match_operand:SI 0 "general_operand" "")0)
  (match_operand:HI 1 "general_operand" ""))
  (set (subreg:HI (match_dup 0)2)
  (const_int 0))
  ]
""
""
)

FWIW, in general you shouldn't use subregs in expanders, but instead use
something like:

(define_expand "zero_extendhisi2"
   [(set (match_operand:SI 0 "general_operand" "")
 (zero_extend:SI (match_operand:HI 1 "general_operand" "")))]
   ""
{
   emit_move_insn (gen_lowpart (HImode, operands[0]), operands[1]);
   emit_move_insn (gen_highpart (HImode, operands[0]), const0_rtx);
   DONE;
})

This allows MEMs to be simplified to smaller MEMs, which is preferable
to keeping them as SUBREGs.  It also allows subregs of hard registers
to be simplified to simple REGs and handles constants correctly.

IMO SUBREGs of MEMs should go away.  There's this strange convention
that (subreg (mem ...)) is treated as a register_operand before reload,
which is why:

  if( !register_operand( operands[0], mode )
      && !register_operand( operands[1], mode ))
    {
      /* one of the operands must be in a register.  */
      operands[1] = copy_to_mode_reg (mode, operands[1]);
    }

wouldn't trigger for a move between a constant and a subreg of a MEM.

OTOH, you should be better off not defining the zero_extendhisi2 at all,
since IIRC the target-independent optabs code would generate the same
sequence for you.

Thanks,
Richard





Re: LRA Stuck in a loop until aborting

2014-04-16 Thread Paul Shortis

Solved... kind of.

*ldsi is one of the patterns movsi is expanded to and as the name 
suggests it only handles register loads. I know that at some 
stages memory references will pass the register_operand predicate 
so I changed the predicate for operand 0 and added an alternative 
to *ldsi that could store to memory and the problem went away.


What is interesting is that when LRA is disabled this case was 
handled fine by the old reload pass, suggesting that the new LRA 
pass handles this situation differently - or not at all ?


BTW. Can someone tell me whether I should be top or bottom posting ?

Cheers, Paul.

On 16/04/14 16:38, pshor...@dataworx.com.au wrote:


I've got a small test case where the ira pass produces this ...

(insn 35 38 36 5 (set (reg/v:SI 29 [orig:17 _b ] [17])
(reg/v:SI 17 [ _b ])) 48 {*ldsi}
 (expr_list:REG_DEAD (reg/v:SI 17 [ _b ])
(nil)))

and the LRA processes it as follows ...

   Spilling non-eliminable hard regs: 6
0 Non input pseudo reload: reject++
  alt=0,overall=607,losers=1,rld_nregs=2
0 Non input pseudo reload: reject++
1 Spill pseudo into memory: reject+=3
  alt=1,overall=616,losers=2,rld_nregs=2
0 Non input pseudo reload: reject++
alt=2: Bad operand -- refuse
 Choosing alt 0 in insn 35:  (0) =r  (1) r {*ldsi}
  Creating newreg=34 from oldreg=29, assigning class GENERAL_REGS to r34

   35: r34:SI=r17:SI
  REG_DEAD r17:SI
Inserting insn reload after:
   45: r29:SI=r34:SI

0 Non input pseudo reload: reject++
1 Non pseudo reload: reject++
  alt=0,overall=608,losers=1,rld_nregs=2
0 Non input pseudo reload: reject++
  alt=1,overall=613,losers=2,rld_nregs=2
0 Non input pseudo reload: reject++
alt=2: Bad operand -- refuse
 Choosing alt 0 in insn 45:  (0) =r  (1) r {*ldsi}
  Creating newreg=35 from oldreg=29, assigning class GENERAL_REGS to r35

   45: r35:SI=r34:SI
Inserting insn reload after:
   46: r29:SI=r35:SI

So it is stuck in a loop (it continues for 90 attempts, then 
aborts), but I can't see what is causing it. The pattern (below) 
shouldn't require a reload, so I can't see why LRA would be doing 
this.


(define_insn "*ldsi"
  [(set (match_operand:SI 0 "register_operand" "=r,r,r")
(match_operand:SI 1 "general_operand"   "r,m,i"))
  ]
  ""

Can anyone shed any light on this behaviour ?

Thanks, Paul.