Creating assembler comments from RTL

2005-03-09 Thread HutchinsonAndy
Is there a good way of creating assembler comments directly from RTL?

I want to be able to add debugging/explanation strings to the assembler listing 
(GAS). Unfortunately, I want to do this from the RTL prologue and epilogue (and 
thus avoid using TARGET_ASM_FUNCTION_EPILOGUE, where it would be easy!)

I know I could define a "pseudo" insn that finally creates assembler strings, 
BUT I believe that would interfere with RTL optimisations.

-- 
Andy Hutchinson





[AVR] RTL prologue/epilogue

2005-03-19 Thread HutchinsonAndy
Attached is a patch for 4.1.0 head that changes AVR to use RTL 
prologue/epilogue generation.

This should also fix the C++ bug caused by the existing code using the 
function name as a label. (Sorry, the PR escapes me!)


It generates the same code, with the following exceptions:

Small stack adjustments are made using "rcall ." (2 bytes) and "push r0" (1 
byte). This avoids difficult direct manipulation of the stack pointer in cases 
where that is less efficient. The choice is based on instruction size, but 
that's pretty close to speed.

All stack pointer loads are now handled by move_hi code. This removes 
duplicated tests and simplifies code.

Various unspecs have been added to generate the required abnormal RTL 
instructions, and for cases where gcc wants to remove things (usually to do 
with register push/pops for r0/r1). 

Asm comments are different, being limited by RTL. However, they convey the 
same information, if not more descriptively.

...and hopefully the code's right!

Comments on accuracy and style please.


-- 
Andy Hutchinson




patch.txt.bz2
Description: patch.txt.bz2


Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy


I am stumped and hope someone more skilled can give me some clues to 
solve this problem I have with gcc 4.3/4.4.


I have created RTL define/splits for the AVR port's logical operations 
(AND, IOR etc). These split larger mode operations into QImode 
operations. I also created similar splitters for zero_extend and some 
combined zero_extended shift operations.


All of these create QImode moves of subregs and all split before reload 
(i.e. unconditionally).


They all split as expected at the first opportunity and produce the 
expected RTL. Whoopee!


When code involves zero constants (created by zero_extend, or shift) I 
can see propagation of constant into following logical operations. (Os 
or O3 optimizations). Such as:


Rx = 0
OR Ry,Rx

becoming

OR Ry,0

The propagation is fine, BUT the resulting OR Ry,0 is a totally 
redundant operation and remains intact through all further passes into 
the final code - apparently not being removed by any optimisation after 
the split1 pass!


I created an RTL pattern to remove these (splitting OR Rx,0 into a NOP) 
and that removes them - but surely this workaround should not be needed. 
I am stumped by what could be causing the problem. Help!


There also seem to be cases where zero constants are not propagated 
into instructions. Yet the testcase only involves simple operands, no 
loops or conditionals or any other side effects that might be a reason 
to block this. Can anyone suggest some non-obvious reasons for this?


Andy








Re: Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy

Jeff,

thanks for the help - I desperately need ideas - this problem is driving 
me nuts


The RTL for IOR Rx,0 does use subregs  (since I use simplify_gen_subreg 
in splitter.)


Perhaps I should  generate new pseudo QI registers instead before 
reload?


Is there any particular function or pass that should be dealing with 
IOR rx,0 - that I could trace thru and figure out why it does not like 
it (or never gets there)?


Andy



-Original Message-
From: Jeff Law <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Tue, 19 Feb 2008 2:12 pm
Subject: Re: Redundant logical operations left after early splitting


[EMAIL PROTECTED] wrote: 
> I am stumped and hope someone more skilled can give me some clues to
> solve this problem I have with gcc 4.3/4.4.
>
> I have created RTL define/splits for AVR port logical operation
> (AND,IOR etc). These split larger mode operations into QImode
> operations. I also created similar splitters for zero_extend and some
> combined zero_extended shift operations.
>
> All of these create QImode moves of subregs and all split before
> reload (ie unconditional).
>
> They all split as expected at first opportunity and produce expected
> RTL. Whoopee!
>
> When code involves zero constants (created by zero_extend, or shift) I
> can see propagation of constant into following logical operations. (Os
> or O3 optimizations). Such as:
>
> Rx = 0
> OR Ry,Rx
>
> becoming
>
> OR Ry,0
>
> The propagation is fine, BUT the creation of OR Ry,0 is a totally
> redundant operation and remains intact thru all further passes into
> final code - apparently not being removed by any optimisations after
> split1 pass!
>
> I created RTL pattern to remove these (splitting OR Rx,0 into NOP) and
> that removes them - but surely this workaround should not be needed. I
> am stumped by what could be causing the problem. Help!
>
> There also seem to be cases where zero constants are not propagated
> into instructions. Yet testcase only involves simple operands, no loops
> or conditionals or any other side effects that might be reason to block
> this. Can anyone suggest some non-obvious reasons for this?

You'll need to look at the RTL dumps to determine why these redundant 
operations are not being removed or why some propagations aren't being 
performed. 
 
GCC certainly has code to do things like remove X IOR 0, so you just 
need to figure out why it isn't triggering. If your code had lots of 
SUBREGs, then that's definitely worth investigating -- many of GCC's 
optimizers aren't particularly adept at dealing with SUBREGs. 
 
jeff 





Re: Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy

Jeff,

thanks again for suggestions.

As I understand it (perhaps wrongly), actual splitting only occurs 
after the combine pass (in the split1 pass).


Combine does match one of my patterns (lshift:SI 
(zero_extend:QI), 8/16/24). The other splitters (zero_extend) are 
"matched" initially, since they are standard insns.


I don't think this matters if neither is split until split1 (or does 
combine perform a split that is hidden from the dump files?)


If your description is correct, you may have answered the question: 
combine is before split1, so no further optimisation is performed 
on any split, since simplify-rtx is never called.


I could use expand, but I know that will cause a world of hurt by 
missing optimisations at the "word" level. So new pseudos seem to be my 
only way forward (at the target level).


Which optimiser/pass would benefit from handling subregs better?

best regards

Andy




-Original Message-
From: Jeff Law <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Tue, 19 Feb 2008 2:47 pm
Subject: Re: Redundant logical operations left after early splitting


[EMAIL PROTECTED] wrote: 
 
> The RTL for IOR Rx,0 does use subregs (since I use simplify_gen_subreg
> in splitter.)
>
> Perhaps I should generate new pseudo QI registers instead before
> reload?

It's been a long time, but yes, you could look into creating new 
registers if you're early in the optimization pipeline. Your 
alternative is to extend the optimizers to handle subregs 
better. 
 
The latter is far more general and would probably help in numerous 
situations and is definitely worth a looksie to see if it can be 
done easily. 
 
> Is there any particular function or pass that should be dealing with
> IOR rx,0 - that I could trace thru and figure out why it does not like
> it (or never gets there)?

I would be looking in combine and simplify-rtx (which is called by 
combine). If your splitter triggers after combine, then I'm not 
immediately sure where to look -- I'm not offhand aware of a pass 
after combine which would call into simplify-rtx to perform this 
optimization. 
 
jeff 





Re: Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy



Dave and Jeff,

(sorry if you get more than one copy of this email,  it's playing up!)

Here are more details. I have included a testcase, the splitter patterns and
an RTL dump to show the problem in more detail.


The testcase is:


unsigned long f (unsigned char  *P)
{
  unsigned long C;
  C  = ((unsigned long)P[1] << 24)
     | ((unsigned long)P[2] << 16)
     | ((unsigned long)P[3] <<  8)
     | ((unsigned long)P[4] <<  0);
  return C;
}


which normally produces horrible code, with optimisation having no
significant impact.


To solve this, back end patterns for zero_extend and the lshift by
multiples of 8 were split into QImode moves.

The hope was that gcc would then collapse the QImode expression list
such as  0|0|0|x or 0|0|y|0 into simple moves.
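
To make the 0|0|0|x point concrete, here is a minimal sketch in plain C
(host code, not GCC internals). It assumes a 32-bit result, as unsigned long
is on AVR; the names lane, f_bytewise and f_reference are made up for
illustration. Each byte lane of the result is the OR of three constant zeros
and one source byte, so it should reduce to a single byte move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte lane i (0 = least significant) of the value built by the testcase.
   Each zero-extended, byte-shifted term supplies one source byte to one
   lane and constant zero to the other three lanes. */
static uint8_t lane(const uint8_t *p, int i)
{
    uint8_t t1 = (i == 3) ? p[1] : 0;   /* P[1] << 24 */
    uint8_t t2 = (i == 2) ? p[2] : 0;   /* P[2] << 16 */
    uint8_t t3 = (i == 1) ? p[3] : 0;   /* P[3] <<  8 */
    uint8_t t4 = (i == 0) ? p[4] : 0;   /* P[4] <<  0 */
    return (uint8_t)(t1 | t2 | t3 | t4); /* always of the form 0|0|0|x */
}

/* Rebuild the 32-bit value from its four byte lanes. */
uint32_t f_bytewise(const uint8_t *p)
{
    assert(p != NULL);
    uint32_t c = 0;
    for (int i = 0; i < 4; i++)
        c |= (uint32_t)lane(p, i) << (8 * i);
    return c;
}

/* The testcase written directly on a 32-bit type. */
uint32_t f_reference(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 8)  | ((uint32_t)p[4] << 0);
}
```

If every lane collapses as hoped, the whole function is four byte loads into
the four bytes of the result, with no OR instructions left.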

Here are splitter patterns (followed by more stuff):


;; xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x 
xx<---x


;; zero extend

(define_insn_and_split "zero_extendqi2"
  [(set (match_operand:HIDI 0 "nonimmediate_operand" "")
        (zero_extend:HIDI (match_operand:QI 1 "nonimmediate_operand" "")))]
  ""
  "#"
  ""
  [(const_int 0)]
  "
  int i;
  enum machine_mode dmode = GET_MODE (operands[0]);
  int dsize = GET_MODE_SIZE (dmode);
  enum machine_mode smode = GET_MODE (operands[1]);
  int ssize = GET_MODE_SIZE (smode);
  rtx dword = simplify_gen_subreg (smode, operands[0], dmode, 0);
  emit_move_insn (dword, operands[1]);
  for (i = ssize; i < dsize; i++)
    {
      rtx dbyte = simplify_gen_subreg (QImode, operands[0], dmode, i);
      emit_move_insn (dbyte, const0_rtx);
    }
  DONE;
  ")


;byte shift is just a series of moves

;check src OR dest is in register, so the move will be ok


(define_insn_and_split "ashl3_const2p"
  [(set (match_operand:HIDI 0 "nonimmediate_operand" "")
        (ashift:HIDI (match_operand:HIDI 1 "register_operand" "")
                     (match_operand:HIDI 2 "const_int_operand" "i")))]
  "((INTVAL (operands[2]) % 8) == 0)"
  "#"
  ""
  [(const_int 0)]
{
  int i;
  enum machine_mode mode;
  mode = GET_MODE(operands[0]);
  int size = GET_MODE_SIZE(mode);
  HOST_WIDE_INT x = INTVAL (operands[2]);
  rtx dbytes[8], sbytes[8];
  for (i = 0; i < size; i++)
    {
      dbytes[i] = simplify_gen_subreg (QImode, operands[0], mode, i);
      sbytes[i] = simplify_gen_subreg (QImode, operands[1], mode, i);
    }
  int shift = x / 8;
  if (shift > size) shift = size;
  for (i = shift; i < size; i++)
    {
      emit_move_insn (dbytes[i], sbytes[i - shift]);
    }
  for (i = 0; i < shift; i++)
    {
      emit_move_insn (dbytes[i], const0_rtx);
    }
}
)
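
The idea behind the byte-shift splitter can be sketched in plain C on a
little-endian byte array, which is how the QImode subreg offsets are used
above. This is only a model under that assumption, not GCC code, and
shift_left_bytes is a hypothetical name. One deliberate difference: the
sketch copies from the top byte downwards so that it stays correct when dst
and src are the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift a little-endian integer of `size` bytes left by `bits`, where
   `bits` is a multiple of 8, using nothing but byte moves. */
void shift_left_bytes(uint8_t *dst, const uint8_t *src, size_t size,
                      unsigned bits)
{
    assert(dst != NULL && src != NULL);
    assert(bits % 8 == 0);

    size_t shift = bits / 8;
    if (shift > size)
        shift = size;

    /* Move the surviving bytes up, highest first, so dst may alias src. */
    for (size_t i = size; i-- > shift; )
        dst[i] = src[i - shift];

    /* The vacated low bytes become constant zero. */
    for (size_t i = 0; i < shift; i++)
        dst[i] = 0;
}
```

For example, shifting the bytes 78 56 34 12 (0x12345678) left by 16 bits
gives 00 00 78 56 (0x56780000): two moves and two loads of zero.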


;


The result kinda works but we are left with OR x,0 (and some missed
opportunities to propagate zero constant forward into OR)


The splitters are matched up initially (zero_extend) or at combine -
just as expected.


All the subregs appear as expected in split1. Naturally this produces a
bunch of QI subregs many of which  contain zero. No real change happens
in RTL  until local register allocation (lreg dump file). There are no
redundant  IOR Rm,0 in dump files before lreg pass. The only note  is a
reg dead on the pointer argument when it gets moved to a pointer
register. (no reg equals or other dead notes until lreg pass)


In the lreg dump file I can see the propagation of many (but not all)
constant 0 forward into the IOR instructions (eg Rn = 0, Rm = Rm | Rn
=>  Rm = Rm|0).  These remain in the RTL and are output into the final code.
Loads of zero into registers which end up being unused are removed in
later passes.


I can remove IOR Rm,0 with a targeted splitter to create a NOP - which
is my last resort.


So here is lreg dump extract:



;; Function f (f)


starting the processing of deferred insns

ending the processing of deferred insns

df_analyze called

df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 ( 1)

df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 ( 1)




Pass 0


Register 42 costs: POINTER_X_REGS:0 POINTER_Y_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:8000 SIMPLE_LD_REGS:8000
LD_REGS:8000 NO_LD_REGS:8000 GENERAL_REGS:8000 ALL_REGS:1 MEM:2

Register 58 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000

Register 59 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000

Register 60 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000

Register 61 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0
LD_REGS:0 NO_LD_REGS:0 GENERAL_REGS:0 ALL_REGS:16000 MEM:16000

Register 62 costs: 

Re: Redundant logical operations left after early splitting

2008-02-19 Thread hutchinsonandy

RTL dumps for Combine pass and ASMCONS (last one before local-alloc)


COMBINE


;; Function f (f)



starting the processing of deferred insns

ending the processing of deferred insns

df_analyze called

insn_cost 2: 4

insn_cost 6: 16

insn_cost 7: 12

insn_cost 8: 16

insn_cost 9: 16

insn_cost 10: 16

insn_cost 11: 16

insn_cost 12: 16

insn_cost 13: 16

insn_cost 14: 16

insn_cost 15: 16

insn_cost 35: 4

insn_cost 36: 4

insn_cost 37: 4

insn_cost 38: 4

insn_cost 26: 0

Failed to match this instruction:

(parallel [

   (set (reg:SI 44)

   (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ])

   (const_int 1 [0x1])) [0 S1 A8])))

   (set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

   ])

Failed to match this instruction:

(parallel [

   (set (reg:SI 44)

   (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ])

   (const_int 1 [0x1])) [0 S1 A8])))

   (set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

   ])

Failed to match this instruction:

(set (reg:SI 45)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18])))

Failed to match this instruction:

(parallel [

   (set (reg:SI 45)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24
[ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18])))

   (set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

   ])

Failed to match this instruction:

(parallel [

   (set (reg:SI 45)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24
[ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18])))

   (set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

   ])

Successfully matched this instruction:

(set (reg/v/f:HI 42 [ P ])

   (reg:HI 24 r24 [ P ]))

Failed to match this instruction:

(set (reg:SI 45)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18])))

Failed to match this instruction:

(set (reg:SI 47)

   (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 2 [0x2])) [0 S1 A8]))

   (const_int 16 [0x10])))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 44)

   (const_int 24 [0x18]))

   (reg:SI 47)))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 46)

   (const_int 16 [0x10]))

   (reg:SI 45)))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42
[ P ])

   (const_int 1 [0x1])) [0 S1 A8]))

   (const_int 24 [0x18]))

   (reg:SI 47)))

Successfully matched this instruction:

(set (reg:SI 45)

   (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 1 [0x1])) [0 S1 A8])))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 45)

   (const_int 24 [0x18]))

   (reg:SI 47)))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42
[ P ])

   (const_int 2 [0x2])) [0 S1 A8]))

   (const_int 16 [0x10]))

   (reg:SI 45)))

Successfully matched this instruction:

(set (reg:SI 47)

   (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 2 [0x2])) [0 S1 A8])))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 47)

   (const_int 16 [0x10]))

   (reg:SI 45)))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 46)

   (const_int 16 [0x10]))

   (ashift:SI (reg:SI 44)

   (const_int 24 [0x18]

Successfully matched this instruction:

(set (reg:SI 47)

   (ashift:SI (reg:SI 44)

   (const_int 24 [0x18])))

Failed to match this instruction:

(set (reg:SI 48)

   (ior:SI (ashift:SI (reg:SI 46)

   (const_int 16 [0x10]))

   (reg:SI 47)))

Failed to match this instruction:

(set (reg:SI 50)

   (ior:SI (ior:SI (reg:SI 45)

   (reg:SI 47))

   (reg:SI 49)))

Failed to match this instruction:

(set (reg:SI 50)

   (ior:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ])

   (const_int 4 [0x4])) [0 S1 A8]))

   (reg:SI 48)))

Failed to match this instruction:

(set (reg:SI 50)

   (ior:SI (ior:SI (ashift:SI (reg:SI 44)

   (const_int 24 [0x18]))

   (reg:SI 47))

   (reg:SI 49)))

Successfully matched this instruction:

(set (reg:SI 48)

   (ashift:SI (reg:SI 44)

   (const_int 24 [0x18])))

Failed to match this instruction:

(set (reg:SI 50)

   (ior:SI (ior:SI (reg:SI 48)

   (reg:SI 47

Re: Redundant logical operations left after early splitting

2008-02-20 Thread hutchinsonandy
I still have to try an extra fwprop pass to confirm this solves the issue. 



But it does seem that the missing piece is effective constant 
propagation + simplification after splits and subregs - which is 
currently ineffective in local-alloc (or any later pass)


Adding an extra pass indeed seems a poor route - BUT this would suggest 
there is no need for any constant propagation in local-alloc, thus 
avoiding duplication of function.



As for the workaround, NOPs are nothing already. I just split the 
instruction into (const_int 0) and it's gone. The split instruction 
pattern explicitly matches on the constant. A peephole would do the same 
thing - but later. Of course I will also have to add all the other 
variants that can be created:


AND Rx,0

AND Rx,0xff etc

Leaving only the limited constant propagation issue.
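
The variants can be tabulated with a small plain-C sketch of the 8-bit
identities involved. This is an illustration only, not a GCC pattern, and
the enum and function names are hypothetical. Note that the cases are not
all deletions: some leave the register unchanged, while others turn the
operation into a load of a constant.

```c
#include <assert.h>
#include <stdint.h>

enum logic_op { OP_AND, OP_IOR, OP_XOR };

enum simplification {
    KEEP_INSN,      /* result depends on Rx and k: keep the operation */
    DELETE_INSN,    /* Rx is unchanged: the insn is a no-op           */
    LOAD_CONSTANT   /* result is a constant regardless of Rx          */
};

/* Classify "Rx = Rx <op> k" for an 8-bit register Rx and constant k. */
enum simplification classify_qi_logic(enum logic_op op, uint8_t k)
{
    assert(op == OP_AND || op == OP_IOR || op == OP_XOR);

    switch (op) {
    case OP_AND:
        if (k == 0xff) return DELETE_INSN;    /* x & 0xff == x    */
        if (k == 0x00) return LOAD_CONSTANT;  /* x & 0    == 0    */
        break;
    case OP_IOR:
        if (k == 0x00) return DELETE_INSN;    /* x | 0    == x    */
        if (k == 0xff) return LOAD_CONSTANT;  /* x | 0xff == 0xff */
        break;
    case OP_XOR:
        if (k == 0x00) return DELETE_INSN;    /* x ^ 0    == x    */
        break;
    }
    return KEEP_INSN;
}
```

A general simplifier would handle all of these in one place; a target
workaround needs one pattern per row.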

Andy



Re: Redundant logical operations left after early splitting

2008-02-20 Thread hutchinsonandy

Yes - the other approach is to lower at RTL expand.

Unfortunately there is no practical way to lower arithmetic and compare 
 operations. reload can add arithmetic which messes the "carry" 
handling attempts. So we end up with mixed higher level and lower level 
RTL.


This mixture then causes other missed optimisations - which historically 
have been very significant but grow less so with time.


The splitting after combine allows existing optimisations to be kept 
largely intact.


The ports that benefit would be any that split operations - after 
expand (eg combine onwards). How much - who knows!


Unfortunately there is no way a target can hook into pass management to 
make this more specific.


but I'm still thinking about it!

Andy





Re: Redundant logical operations left after early splitting

2008-02-20 Thread hutchinsonandy

Splitters on MovMM are problematic.

One problem is that it creates issues with base pointers. AVR base 
pointers are limited to 64 byte offsets - after that they are inline 
additions (and perhaps subtractions). So splitting such a move in a 
large mode is a world of hurt. A future endeavour may perhaps solve 
this - but for now, it's on the "too difficult" pile.


I don't think it is relevant here. The missing optimisation of the 
testcase is more closely related to the arithmetic than to the moves. 
The output move is already "split" by subreg lowering. After that there 
are not any pure large mode moves of significance left. Other (non 
move) splitters can create moves - but these are small and will not be 
split further.


Andy



-Original Message-
From: Dave Korn <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Wed, 20 Feb 2008 2:15 pm
Subject: RE: Redundant logical operations left after early splitting



On 20 February 2008 18:05, [EMAIL PROTECTED] wrote:



> But it does seem that the missing piece is effective constant
> propagation + simplification after splits and subregs - which is
> currently ineffective in local-alloc (or any later pass)

  Hmm, more questions:  do you use splitters or insns for your movMM patterns?
Do you implement all the movMM patterns?  Just checking if we've covered every
angle ...


   cheers,
 DaveK
--
Can't think of a witty .sigline today






Re: Redundant logical operations left after early splitting

2008-02-20 Thread hutchinsonandy
fwprop is a good suggestion. If that also simplifies substitutions, it 
may be the magic bullet to collapse all of the code. I will try later 
tonight.



-Original Message-
From: Paolo Bonzini <[EMAIL PROTECTED]>
To: Andy H <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; gcc@gcc.gnu.org
Sent: Wed, 20 Feb 2008 3:26 am
Subject: Re: Redundant logical operations left after early splitting


> Propagating REG_EQUIV notes across register-register moves would seem
> to be an obviously simple way to fix this. Thoughts?
>
> I am not sure local-alloc is the best place to address the overall
> problem, I doubt it is intended to provide such optimizations.
>
> An additional cse pass after split would seem a better way perhaps?

 
You could try adding another fwprop run after splitting. I wouldn't 
like a third instance of the pass, but it could help. 

 
Paolo 





Re: Redundant logical operations left after early splitting

2008-02-21 Thread hutchinsonandy

Paolo

I placed the extra fwprop pass before local-alloc, as this was just 
before the NOP got created and after splitting.


Putting fwprop after combine is no problem - but is too early - none of 
the patterns would be split at that time - preventing byte level 
propagations.


As register usage as well as code size is clearly relevant here, 
effective propagation before reload is the most desirable.


(PS I was impressed by fwprop code, I actually stand a chance of 
understanding some of it)


best regards




-Original Message-
From: Paolo Bonzini <[EMAIL PROTECTED]>
To: Andy H <[EMAIL PROTECTED]>; GCC Development 
Sent: Thu, 21 Feb 2008 1:06 am
Subject: Re: Redundant logical operations left after early splitting


> This would indicate that simplify-rtx inside fwprop is removing OR Rx,0
> but not picking up the additionally revealed forward propagation
> opportunities.
>
> This would seem to be an avoidable limitation.

 
Yes, can you send me your MD patch and a simple testcase? fwprop is 
supposed to be "cascading", and some bugs in cascading were already 
revealed by the AVR port. 

 
It might be even more worthwhile to try *moving* fwprop2 after combine, 
then. 

 
Paolo 





Re: Redundant logical operations left after early splitting

2008-02-21 Thread hutchinsonandy

If I understand correctly:

Propagation of "0" causes simplify-rtx to create a NOP from OR Rx,0.
This NOP (deletion?) creates another set of potential uses - as 
the prior RHS def now passes straight thru to a new set of uses - but 
we miss those new uses (which in the testcase are often 0).
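
The cascade can be modelled with a toy straight-line IR in plain C. This is
a sketch under loud assumptions, not fwprop itself: every register is
defined once, register numbers are below MAX_REGS, and all type and function
names are hypothetical. A single round only substitutes registers that were
already constant when the round started, so a constant that is revealed by
simplification is only seen by its uses in a later round.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op { OP_CONST, OP_COPY, OP_IOR };

struct insn {
    enum op op;
    int dst;        /* register defined here (defined exactly once)    */
    int a, b;       /* sources: a for COPY, a and b for IOR            */
    unsigned k;     /* value for OP_CONST                              */
};

#define MAX_REGS 16

/* One propagation round.  Only registers that are constant at the START
   of the round are substituted into their uses.  Returns the number of
   instructions rewritten. */
int propagate_round(struct insn *prog, size_t n)
{
    bool is_const[MAX_REGS] = { false };
    unsigned value[MAX_REGS] = { 0 };
    int changes = 0;

    /* Snapshot the constants known before any rewriting. */
    for (size_t i = 0; i < n; i++) {
        assert(prog[i].dst >= 0 && prog[i].dst < MAX_REGS);
        if (prog[i].op == OP_CONST) {
            is_const[prog[i].dst] = true;
            value[prog[i].dst] = prog[i].k;
        }
    }

    for (size_t i = 0; i < n; i++) {
        struct insn *in = &prog[i];
        if (in->op == OP_COPY && is_const[in->a]) {
            in->k = value[in->a];               /* copy of a constant   */
            in->op = OP_CONST;
            changes++;
        } else if (in->op == OP_IOR) {
            bool ca = is_const[in->a], cb = is_const[in->b];
            if (ca && cb) {
                in->k = value[in->a] | value[in->b];
                in->op = OP_CONST;              /* fold c1 | c2         */
                changes++;
            } else if (cb && value[in->b] == 0) {
                in->op = OP_COPY;               /* x | 0 == x           */
                changes++;
            } else if (ca && value[in->a] == 0) {
                in->a = in->b;                  /* 0 | x == x           */
                in->op = OP_COPY;
                changes++;
            }
        }
    }
    return changes;
}

/* Repeat rounds until nothing changes; returns the number of rounds
   that rewrote something. */
int propagate_to_fixpoint(struct insn *prog, size_t n)
{
    int rounds = 0;
    while (propagate_round(prog, n) > 0)
        rounds++;
    return rounds;
}
```

With r0 = 0, r1 = 0, r2 = r0 | r1, r3 = r2 | r0, the first round folds r2 to
a constant and turns r3 into a copy of r2, but r3 only becomes a constant in
the second round. A pass that stops after one round leaves that copy behind.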


I will try fwprop in a few different spots later and see what, if any, 
changes occur.


thanks again

Andy




-Original Message-
From: Paolo Bonzini <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Thu, 21 Feb 2008 11:04 am
Subject: Re: Redundant logical operations left after early splitting


> Putting fwprop after combine is no problem - but is too early - none
> of the patterns would be split at that time - preventing byte level
> propagations.

 
Yeah, I meant "after split" actually. 
 
Anyway, the problem is that if the RHS becomes a constant, fwprop does 
not propagate the LHS anymore. I can try to add some def->use 
propagation. 

 
> (PS I was impressed by fwprop code, I actually stand a chance of
> understanding some of it)
 
Thanks (since I and Steven Bosscher wrote it) -- but I guess that's 
what you can expect from optimization passes that can rely on a decent 
framework (simplify-rtx.c and df-*.c). 

 
Paolo 





Re: Redundant logical operations left after early splitting

2008-02-25 Thread hutchinsonandy

Jeff,

I take your point and will go back over my splitters, as I know of one 
instance in which a splitter can create such an issue.

(I should check for a constant operand after simplifying an operand.)

In the cases I looked at before, this was not the situation.

Each splitter or subreg lowering potentially creates (or rather 
reveals) another cascade opportunity - but the current pass arrangement 
provides no means to remove them (other than the trivial propagation in 
local-alloc). That seems fundamentally wrong.


So all targets that use split or have lowered subregs potentially have 
excess register usage and instructions (how much - I don't know).


I don't follow how taking another pseudo inside split helps - that 
still seems to be another cascade that local-alloc will not remove.


As a side issue, if the current fwprop bug - which often only 
propagates one operand - can be fixed, then I may be in better shape 
(since the early pass may then remove any cascades remaining before it 
hits the splitters).


I am looking at my original problem again and trying to find some level 
of earlier optimisation I can use (ie some form of earlier expansion or 
combination). If I can remove some of the cascaded levels before split, 
then things get much better!

But this is not easy.

Thank you for your advice.







-Original Message-
From: Jeff Law <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; gcc@gcc.gnu.org
Sent: Mon, 25 Feb 2008 4:34 am
Subject: Re: Redundant logical operations left after early splitting


[EMAIL PROTECTED] wrote: 

> If I understand correctly:
>
> Prop. of "0" causes simplfy-rtx to create NOP from OR Rx,0
> This NOP (deletion?) creates another set of potential uses - as now
> the prior RHS def now passes straight thru to a new set of uses - but
> we miss those new uses. (which in the testcase are often 0)
>
> I will try fwprop in a few different spots latter and see what, if any
> changes occur.

Classic cascading. 
 
I think some of this is an artifact of how your splitters are working. 
You end up creating code which looks like 
 
(set (reg1) (const_int 0)) 
(set (reg1) (ior (reg1) (other_reg))) 
 
What's important here is that reg1 is being set multiple times. You'd 
be better off if you can twiddle the splitters to avoid this behavior. 
If you need a new pseudo, then get one :-) 
 
Once you do that, local would propagate these things better. That 
still leaves the simplification & nop problem, but I'm pretty sure 
that can be trivially fixed within local without resorting to running 
another forwprop pass after splitting. 
 
Jeff 





Re: Combine repeats matching on insn pairs and will ICE on 3.

2008-03-10 Thread hutchinsonandy
The deed is done!

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35519

I added a patch but it needs more expertise than I have.



-Original Message-
From: Ian Lance Taylor <[EMAIL PROTECTED]>
To: Andy H <[EMAIL PROTECTED]>
Cc: GCC Development 
Sent: Mon, 10 Mar 2008 1:32 pm
Subject: Re: Combine repeats matching on insn pairs and will ICE on 3.



Andy H <[EMAIL PROTECTED]> writes:

> I have problem with data flow and combine that is causing ICE with
> experimental build. Despite all efforts to blame my own target
> changes,
> I have reached the conclusion that this is a gcc COMBINE bug, but seek
> your advice before filing a bug report.

Your analysis looks right to me.  Please do file a bug report.
Thanks.

Ian



Re: Problem with reloading in a new backend...

2008-04-10 Thread hutchinsonandy

I noticed


Stack register is missing from ALL_REGS.

Are registers 16bit? Is just one required for pointer?



Andy




Re: Pointers to add doulle-float capabilities for the avr target

2008-05-21 Thread hutchinsonandy

That looks correct.


Be aware that long long operands (64 bits) and doubles are not well 
supported on AVR.  So you may expose some problems with this.


We currently have long long disabled in testsuite testing to avoid 
noise.  At some point I hope to switch it back on to see where we are - 
but I have enough to deal with below 64 bits at present.



Andy



Where is setup for "goto" in nested function created?

2008-05-22 Thread hutchinsonandy
During the process of fixing setjmp for the AVR target, I needed to define 
targetm.builtin_setjmp_frame_value () to be used in
expand_builtin_setjmp_setup().

This sets the value of the frame pointer stored in the jump buffer.

I set this "value" to virtual_stack_vars_rtx+1 (==frame_pointer).
The receiver defined in the target later restores the frame pointer using 
virtual_stack_vars_rtx = value - 1


This produces correct code as expected and avoids run-time add/sub of 
offsets. (setjmp works!)


However, for a normal goto   used inside a nested function, a different 
part of gcc creates the code to store frame pointer  (not 
expand_builtin_setjmp_setup).   I can't find this code.


The issue I have is that this "goto_setup" code does NOT use 
targetm.builtin_setjmp_frame_value - but seems to use 
value=virtual_stack_vars_rtx, which is incompatible with my target 
receiver.
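
The mismatch is plain arithmetic, and can be modelled with ordinary integers
standing in for addresses. This is a sketch of the description above only,
not of GCC's code: the function names are hypothetical, and the goto setup
is modelled as saving virtual_stack_vars_rtx unmodified, which is what it
"seems to" do.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t addr_t;   /* stand-in for a 16-bit AVR address */

/* What the setjmp setup saves: virtual_stack_vars + 1. */
addr_t setjmp_setup_saved_value(addr_t virtual_stack_vars)
{
    return (addr_t)(virtual_stack_vars + 1);
}

/* What the nested-function goto setup appears to save: the plain value. */
addr_t goto_setup_saved_value(addr_t virtual_stack_vars)
{
    return virtual_stack_vars;
}

/* The target receiver always undoes the +1. */
addr_t receiver_restore(addr_t saved)
{
    assert(saved >= 1);
    return (addr_t)(saved - 1);
}
```

Going through the setjmp path, a frame at 0x0100 is saved as 0x0101 and
restored as 0x0100. Going through the goto path, it is saved as 0x0100 and
restored as 0x00ff, one byte off.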


So where is the goto setup code created? And is there a bug here?

regards
Andy



Re: Where is setup for "goto" in nested function created?

2008-05-22 Thread hutchinsonandy

I already have nonlocal_goto, and  nonlocal_goto_receiver.

These expect saved frame pointer to be virtual_stack_vars_rtx+1.

For setjmp the value is determined by 
targetm.builtin_setjmp_frame_value - which I defined and is correct.


But for goto the value is created by some other code - which appears to 
be wrong but I can't find!





-Original Message-
From: David Daney <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Thu, 22 May 2008 1:16 pm
Subject: Re: Where is setup for "goto" in nested function created?


[EMAIL PROTECTED] wrote:
> During the process of fixing setjmp for AVR target, I needed to define
> targetm.builtin_setjmp_frame_value () to be used in
> expand_builtin_setjmp_setup().
>
> This sets the value of the Frame pointer stored in jump buffer.
>
> I set this "value" to virtual_stack_vars_rtx+1 (==frame_pointer)
> Receiver defined in target later restores frame pointer using
> virtual_stack_vars_rtx = value - 1
>
> This produces correct code as expected and avoids run-time add/sub of
> offsets. (setjmp works!)
>
> However, for a normal goto used inside a nested function, a different
> part of gcc creates the code to store frame pointer (not
> expand_builtin_setjmp_setup). I can't find this code.
>
> The issue I have is that this "goto_setup" code does NOT use
> targetm.builtin_setjmp_frame_value - but seems to use
> value=virtual_stack_vars_rtx, which is incompatible with my target
> receiver.
>
> So where is the goto setup code created? And is there a bug here?
 
Perhaps you need to implement one or more of: save_stack_nonlocal, 
restore_stack_nonlocal, nonlocal_goto, and/or nonlocal_goto_receiver. 

 
David Daney 



Re: Where is setup for "goto" in nested function created?

2008-05-22 Thread hutchinsonandy

I think my last email crossed your reply - so apologies for restating:

expand_builtin_nonlocal_goto is fine. This performs the stack restore, 
extracts the frame pointer value and does the jump.


The receiver is fine - this jump destination restores the frame pointer.

The problem I have is with the frame pointer value that is saved by 
"setup" prior to all this.


For goto it does not use expand_builtin_setjmp_setup - (pathetically) I 
can't find what it is using.


Andy



-Original Message-
From: Ian Lance Taylor <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Thu, 22 May 2008 2:26 pm
Subject: Re: Where is setup for "goto" in nested function created?



[EMAIL PROTECTED] writes:


However, for a normal goto   used inside a nested function, a
different part of gcc creates the code to store frame pointer  (not
expand_builtin_setjmp_setup).   I can't find this code.


I think you are looking for expand_builtin_nonlocal_goto in
builtins.c.

Ian



Re: Help with reload and naked constant sum causing ICE

2008-05-28 Thread hutchinsonandy

Richard,

I appreciate the extra input.

I agree with what you say. The target should not be doing middle-end 
stuff.


The inc/dec and (Rxx) != (frame pointer) parts just reload using the 
pointer class,
which has one more register than the base pointers, but the extra register 
cannot take an offset.


However, I don't know what reload can - or cannot - do (but I'm learning 
that every day).
I do not fully know what was intended by all of the AVR L_R_A and how 
this might now be redundant, useful or wrong.


I already have a patch with the maintainer that takes out a chunk of this to 
fix another (spill) bug.


I have copied Anatoly for comment, and I promise to revisit this again 
after reviewing reload capabilities.



Andy


-Original Message-
From: Richard Sandiford <[EMAIL PROTECTED]>
To: Andy H <[EMAIL PROTECTED]>
Cc: GCC Development ; [EMAIL PROTECTED]
Sent: Wed, 28 May 2008 3:00 pm
Subject: Re: Help with reload and naked constant sum causing ICE



Andy H <[EMAIL PROTECTED]> writes:

If L_R_A does nothing with it,
the normal reload handling will first try:

  (const:HI (plus:HI (symbol_ref:HI ("chk_fail_buf") (const_int 2



This worked just as you described after I added a test of
reg_equiv_constant[] inside L_R_A.

So I guess that looks like  the fix for bug I posted.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34641

To summarize

LEGITIMIZE_RELOAD_ADDRESS should now always check reg_equiv_constant
before trying to do any push_reload of a register.


TBH, I still think AVR is doing far too much in L_R_A.  To quote
the current version:

#define LEGITIMIZE_RELOAD_ADDRESS(X, MODE, OPNUM, TYPE, IND_LEVELS, WIN) \
do { \
  if (1&&(GET_CODE (X) == POST_INC || GET_CODE (X) == PRE_DEC)) \
    { \
      push_reload (XEXP (X,0), XEXP (X,0), &XEXP (X,0), &XEXP (X,0), \
                   POINTER_REGS, GET_MODE (X),GET_MODE (X) , 0, 0, \
                   OPNUM, RELOAD_OTHER); \
      goto WIN; \
    } \

Why does AVR need different POST_INC and PRE_DEC handling from
other auto-inc targets?  This at least deserves a comment.

But really, you should just let reload handle this case.
Does reload currently lack some support you need?

  if (GET_CODE (X) == PLUS \
      && REG_P (XEXP (X, 0)) \
      && GET_CODE (XEXP (X, 1)) == CONST_INT \
      && INTVAL (XEXP (X, 1)) >= 1) \
    { \
      int fit = INTVAL (XEXP (X, 1)) <= (64 - GET_MODE_SIZE (MODE)); \
      if (fit) \
        { \
          if (reg_equiv_address[REGNO (XEXP (X, 0))] != 0) \
            { \
              int regno = REGNO (XEXP (X, 0)); \
              rtx mem = make_memloc (X, regno); \
              push_reload (XEXP (mem,0), NULL, &XEXP (mem,0), NULL, \
                           POINTER_REGS, Pmode, VOIDmode, 0, 0, \
                           1, ADDR_TYPE (TYPE)); \
              push_reload (mem, NULL_RTX, &XEXP (X, 0), NULL, \
                           BASE_POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                           OPNUM, TYPE); \
              goto WIN; \
            } \
          push_reload (XEXP (X, 0), NULL_RTX, &XEXP (X, 0), NULL, \
                       BASE_POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                       OPNUM, TYPE); \
          goto WIN; \
        } \
      else if (! (frame_pointer_needed && XEXP (X,0) == frame_pointer_rtx)) \
        { \
          push_reload (X, NULL_RTX, &X, NULL, \
                       POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                       OPNUM, TYPE); \
          goto WIN; \
        } \
    } \
} while(0)

These too don't make much immediate sense to me.  What are they trying
to do that reload wouldn't do otherwise?

Rather than duplicating more of reload's checks in this macro, I think
it would be better to strip it down as far as possible.

There's a bit of self-interest here.  It makes it very hard to work on
reload if ports try to duplicate so much of the logic in target-specific
files.

Richard



Re: Help with reload and naked constant sum causing ICE

2008-05-29 Thread hutchinsonandy

Again, thank you and Denis for your comments.

Here is what I deduce from the code and Denis' comments - I am sure he (and 
others) will correct me if wrong :-)



The main issue is that we have one pointer register that cannot take 
offset and two base pointers  with limited offset (0-63).
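
The 0-63 limit interacts with the access size, which is why the macro below tests the offset against 64 - GET_MODE_SIZE (MODE): every byte of a multi-byte access has to stay within the displacement range. A minimal sketch of that arithmetic follows; the function name and the constant are hypothetical and not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Largest displacement a base pointer can take, per the 0-63 range
   stated above.  The name is hypothetical.  */
enum { MAX_DISPLACEMENT = 63 };

/* Return true if every byte of a SIZE-byte access at base+OFFSET can be
   addressed with a displacement in 0..63.  The last byte sits at
   OFFSET + SIZE - 1, so this is the same test as OFFSET <= 64 - SIZE.  */
bool displacement_fits(long offset, int size)
{
    assert(size > 0);
    return offset >= 0 && offset + size - 1 <= MAX_DISPLACEMENT;
}
```

So a 1-byte access fits up to offset 63, a 2-byte access up to 62, and a 4-byte access up to 60.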


 Reload provides no means to pick the right pointer register. To do so 
would require a "preferred base pointer" mechanism.


The sole purpose of the AVR L_R_A is substituting the POINTER class where 
there are opportunities to do so.


BUT as L_R_A is in the middle of find_reloads_address, we have a problem 
in that some normal reload solutions have already been applied and some 
have not.


This may mean that we:
1) Never get here - so miss a POINTER opportunity.
2) Miss even better solutions that are applied later.
3) Have code vulnerable to reload changes.


I have added comment below to explain the parts.
I'll probably need to add some debug hooks to check what parts still 
contribute.


best regards

Andy



#define LEGITIMIZE_RELOAD_ADDRESS(X, MODE, OPNUM, TYPE, IND_LEVELS, WIN) \
do { \
  if (1&&(GET_CODE (X) == POST_INC || GET_CODE (X) == PRE_DEC)) \
    { \
      push_reload (XEXP (X,0), XEXP (X,0), &XEXP (X,0), &XEXP (X,0), \
                   POINTER_REGS, GET_MODE (X),GET_MODE (X) , 0, 0, \
                   OPNUM, RELOAD_OTHER); \
      goto WIN; \
    } \


The intent is to use POINTER class registers for INC/DEC so we have one 
more register available than BASE_POINTER


Has  INC/DEC been handled already making this part redundant?
Are there any other solutions that this code skips?



  if (GET_CODE (X) == PLUS \
      && REG_P (XEXP (X, 0)) \
      && GET_CODE (XEXP (X, 1)) == CONST_INT \
      && INTVAL (XEXP (X, 1)) >= 1) \
    { \
      int fit = INTVAL (XEXP (X, 1)) <= (64 - GET_MODE_SIZE (MODE)); \
      if (fit) \
        { \
          if (reg_equiv_address[REGNO (XEXP (X, 0))] != 0) \
            { \
              int regno = REGNO (XEXP (X, 0)); \
              rtx mem = make_memloc (X, regno); \
              push_reload (XEXP (mem,0), NULL, &XEXP (mem,0), NULL, \
                           POINTER_REGS, Pmode, VOIDmode, 0, 0, \
                           1, ADDR_TYPE (TYPE)); \
              push_reload (mem, NULL_RTX, &XEXP (X, 0), NULL, \
                           BASE_POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                           OPNUM, TYPE); \
              goto WIN; \
            } \

I'm struggling with this part. However, the POINTER class is 
advantageous for first reloading the address from memory
- where we would not want to waste a BASE POINTER and also create an overlap 
problem.


The implication of using this code is that without it we cannot catch 
the inner reload POINTER opportunity if we leave reload to decompose 
BASE+offset. I think this is correct :-(



          push_reload (XEXP (X, 0), NULL_RTX, &XEXP (X, 0), NULL, \
                       BASE_POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                       OPNUM, TYPE); \
          goto WIN; \

This part is duplicating reload and should not be here (I have a patch 
pending with the maintainer that removes this - for a different bug).


        } \
      else if (! (frame_pointer_needed && XEXP (X,0) == frame_pointer_rtx)) \
        { \
          push_reload (X, NULL_RTX, &X, NULL, \
                       POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                       OPNUM, TYPE); \
          goto WIN; \
        } \
    } \

This case is where offset is out of range - so we allow POINTER class 
on the basis that this is no worse than BASE POINTER. However, we leave 
frame_pointer to reload.





} while(0)
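
For reference, the branches of the macro above can be restated as a plain decision function. This is only a sketch of the control flow as quoted: the struct, the enum names and the function are hypothetical stand-ins for the RTL tests, and nothing here calls into reload.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the address forms the macro tests.  */
enum addr_form { FORM_POST_INC, FORM_PRE_DEC, FORM_REG_PLUS_CONST, FORM_OTHER };

/* Hypothetical labels for what each macro branch does.  */
enum lra_action {
    ACT_LEAVE_TO_RELOAD,         /* macro falls through, no goto WIN */
    ACT_INCDEC_REG_TO_POINTER,   /* inner register -> POINTER_REGS */
    ACT_EQUIV_MEM_THEN_BASE,     /* address of equiv mem -> POINTER_REGS,
                                    then mem -> BASE_POINTER_REGS */
    ACT_BASE_TO_BASE_POINTER,    /* base register -> BASE_POINTER_REGS */
    ACT_SUM_TO_POINTER           /* whole reg+const -> POINTER_REGS */
};

struct addr_model {
    enum addr_form form;
    long offset;                 /* INTVAL of the constant, if any */
    int mode_size;               /* GET_MODE_SIZE (MODE) */
    bool base_has_equiv_address; /* reg_equiv_address[regno] != 0 */
    bool base_is_needed_frame_pointer;
};

enum lra_action classify_reload_address(const struct addr_model *a)
{
    if (a->form == FORM_POST_INC || a->form == FORM_PRE_DEC)
        return ACT_INCDEC_REG_TO_POINTER;

    if (a->form == FORM_REG_PLUS_CONST && a->offset >= 1) {
        bool fit = a->offset <= 64 - a->mode_size;
        if (fit)
            return a->base_has_equiv_address ? ACT_EQUIV_MEM_THEN_BASE
                                             : ACT_BASE_TO_BASE_POINTER;
        if (!a->base_is_needed_frame_pointer)
            return ACT_SUM_TO_POINTER;
    }
    return ACT_LEAVE_TO_RELOAD;
}
```

Written this way, the only cases the macro leaves to reload are offsets below 1, other address forms, and an out-of-range offset from a needed frame pointer.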








-Original Message-
From: Denis Chertykov <[EMAIL PROTECTED]>
To: Jeff Law <[EMAIL PROTECTED]>
Cc: Andy H <[EMAIL PROTECTED]>; GCC Development ; 
[EMAIL PROTECTED]

Sent: Thu, 29 May 2008 3:23 am
Subject: Re: Help with reload and naked constant sum causing ICE



2008/5/29 Jeff Law <[EMAIL PROTECTED]>:

Richard Sandiford wrote:


Andy H <[EMAIL PROTECTED]> writes:


If L_R_A does nothing with it,
the normal reload handling will first try:

 (const:HI (plus:HI (symbol_ref:HI ("chk_fail_buf") (const_int 

2




This worked just as you described after I added a test of
reg_equiv_constant[] inside L_R_A.

So I guess t