Creating assembler comments from RTL
Is there a good way of creating assembler comments directly from RTL? I want to be able to add debugging/explanation strings to the assembler listing (GAS). Unfortunately I want to do this from the RTL prologue and epilogue (and thus avoid using TARGET_ASM_FUNCTION_EPILOGUE, where it would be easy!). I know I could define a "pseudo" insn that finally creates assembler strings, BUT I believe that would interfere with RTL optimisations.
-- Andy Hutchinson
[AVR] RTL prologue/epilogue
Attached is a patch for 4.1.0 head that changes AVR to use RTL prologue/epilogue generation. This should also fix the C++ bug caused by the existing code using the function name as a label (sorry, the PR number escapes me!).
It generates the same code, with the following exceptions:
- Small stack adjustments are made using "rcall ." (2 bytes) and "push r0" (1 byte). This avoids difficult direct manipulation of the stack pointer when that is less efficient. The choice is based on instruction size, but that's pretty close to speed.
- All stack pointer loads are now handled by the move_hi code. This removes duplicated tests and simplifies the code.
- Various unspecs have been added to generate the required abnormal RTL instructions, and to cover places where gcc wants to remove things (usually to do with register push/pops for r0/r1).
- Asm comments are different - limited by RTL. However, they convey the same information, if not more descriptively.
...and hopefully the code's right! Comments on accuracy and style please.
-- Andy Hutchinson
Attachment: patch.txt.bz2
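The size-based choice described above can be sketched roughly as follows. This is only an illustration, not code from the patch: `plan_stack_adjust`, `struct adjust_plan` and the `sp_update_words` cost parameter are hypothetical names, and the sketch assumes each "rcall ." or "push r0" is one instruction word and reserves 2 or 1 bytes of stack respectively, as stated in the message.

```c
/* Hypothetical model of the size-based choice; not taken from the patch. */
struct adjust_plan {
    unsigned rcalls;     /* number of "rcall ." insns, 2 bytes of stack each */
    unsigned pushes;     /* number of "push r0" insns, 1 byte of stack each */
    int use_sp_update;   /* nonzero: adjust the stack pointer directly instead */
};

/* bytes: stack space to reserve.
   sp_update_words: assumed cost, in instruction words, of the direct
   stack pointer update sequence (a parameter here, since the message
   does not give the figure). */
struct adjust_plan plan_stack_adjust(unsigned bytes, unsigned sp_update_words)
{
    struct adjust_plan p = { 0, 0, 0 };
    unsigned words = bytes / 2 + bytes % 2;  /* one word per rcall or push */

    if (words > sp_update_words) {
        p.use_sp_update = 1;                 /* direct update is smaller */
        return p;
    }
    p.rcalls = bytes / 2;
    p.pushes = bytes % 2;
    return p;
}
```

For example, with an assumed direct-update cost of 6 words, a 5-byte adjustment becomes two "rcall ." plus one "push r0", while a 20-byte adjustment falls back to the direct update.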
Redundant logical operations left after early splitting
I am stumped and hope someone more skilled can give me some clues to solve this problem. I have gcc 4.3/4.4.
I have created RTL define/splits for the AVR port logical operations (AND, IOR etc). These split larger mode operations into QImode operations. I also created similar splitters for zero_extend and some combined zero_extended shift operations.
All of these create QImode moves of subregs and all split before reload (ie unconditionally). They all split as expected at the first opportunity and produce the expected RTL. Whoopee!
When code involves zero constants (created by zero_extend, or shift) I can see propagation of the constant into the following logical operations (Os or O3 optimizations). Such as:
    Rx = 0
    OR Ry,Rx
becoming
    OR Ry,0
The propagation is fine, BUT the resulting OR Ry,0 is a totally redundant operation and remains intact through all further passes into final code - apparently not being removed by any optimisation after the split1 pass!
I created an RTL pattern to remove these (splitting OR Rx,0 into NOP) and that removes them - but surely this workaround should not be needed. I am stumped by what could be causing the problem. Help!
There also seem to be cases where zero constants are not propagated into instructions. Yet the testcase only involves simple operands, no loops or conditionals or any other side effects that might be a reason to block this. Can anyone suggest some non-obvious reasons for this?
Andy
Re: Redundant logical operations left after early splitting
Jeff, thanks for the help - I desperately need ideas - this problem is driving me nuts.
The RTL for IOR Rx,0 does use subregs (since I use simplify_gen_subreg in the splitter). Perhaps I should generate new pseudo QI registers instead, before reload?
Is there any particular function or pass that should be dealing with IOR Rx,0 - that I could trace through and figure out why it does not like it (or never gets there)?
Andy

-----Original Message-----
From: Jeff Law <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Tue, 19 Feb 2008 2:12 pm
Subject: Re: Redundant logical operations left after early splitting

[EMAIL PROTECTED] wrote:
> I am stumped and hope someone more skilled can give me some clues to
> solve this problem. I have gcc 4.3/4.4.
> I have created RTL define/splits for the AVR port logical operations
> (AND, IOR etc). These split larger mode operations into QImode
> operations. I also created similar splitters for zero_extend and some
> combined zero_extended shift operations.
> All of these create QImode moves of subregs and all split before reload
> (ie unconditionally).
> They all split as expected at the first opportunity and produce the
> expected RTL. Whoopee!
> When code involves zero constants (created by zero_extend, or shift) I
> can see propagation of the constant into the following logical
> operations (Os or O3 optimizations). Such as:
>     Rx = 0
>     OR Ry,Rx
> becoming
>     OR Ry,0
> The propagation is fine, BUT the resulting OR Ry,0 is a totally
> redundant operation and remains intact through all further passes into
> final code - apparently not being removed by any optimisation after the
> split1 pass!
> I created an RTL pattern to remove these (splitting OR Rx,0 into NOP)
> and that removes them - but surely this workaround should not be
> needed. I am stumped by what could be causing the problem. Help!
> There also seem to be cases where zero constants are not propagated
> into instructions. Yet the testcase only involves simple operands, no
> loops or conditionals or any other side effects that might be a reason
> to block this. Can anyone suggest some non-obvious reasons for this?

You'll need to look at the RTL dumps to determine why these redundant operations are not being removed or why some propagations aren't being performed.
GCC certainly has code to do things like remove X IOR 0, so you just need to figure out why it isn't triggering.
If your code has lots of SUBREGs, then that's definitely worth investigating -- many of GCC's optimizers aren't particularly adept at dealing with SUBREGs.
jeff
Re: Redundant logical operations left after early splitting
Jeff, thanks again for the suggestions.
As I understand it (perhaps wrongly), actual splitting only occurs after the combine pass (in the split1 pass). Combine does match one of my patterns (lshift:SI (zero_extend:QI), 8/16/24). The other splitters (zero_extend) are "matched" initially, since they are standard insns. I don't think this matters if neither is split until split1 (or does combine perform a split that is hidden from the dump files?).
If your description is correct, you may have answered the question: combine runs before split1, so no further optimisation is performed on any split, since simplify-rtx is never called.
I could use expand - but I know that will cause a world of hurt by missing optimisations at the "word" level. So new pseudos seem to be my only way forward (at the target level).
Which optimiser/pass would benefit from handling subregs better?
best regards
Andy

-----Original Message-----
From: Jeff Law <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Tue, 19 Feb 2008 2:47 pm
Subject: Re: Redundant logical operations left after early splitting

[EMAIL PROTECTED] wrote:
> The RTL for IOR Rx,0 does use subregs (since I use simplify_gen_subreg
> in splitter.)
> Perhaps I should generate new pseudo QI registers instead before reload?

It's been a long time, but yes, you could look into creating new registers if you're early in the optimization pipeline. Your alternative is to extend the optimizers to handle subregs better. The latter is far more general and would probably help in numerous situations and is definitely worth a looksie to see if it can be done easily.

> Is there any particular function or pass that should be dealing with IOR
> rx,0 - that I could trace thru and figure out why it does not like it
> (or never gets there)?

I would be looking in combine and simplify-rtx (which is called by combine). If your splitter triggers after combine, then I'm not immediately sure where to look -- I'm not offhand aware of a pass after combine which would call into simplify-rtx to perform this optimization.
jeff
Re: Redundant logical operations left after early splitting
Dave and Jeff,
(sorry if you get more than one copy of this email, it's playing up!)

Here are more details. I have included the testcase, splitter patterns and RTL dump to show the problem in more detail.

The testcase is:

unsigned long f (unsigned char *P)
{
  unsigned long C;
  C = ((unsigned long)P[1] << 24)
    | ((unsigned long)P[2] << 16)
    | ((unsigned long)P[3] << 8)
    | ((unsigned long)P[4] << 0);
  return C;
}

which normally produces horrible code, with no significant impact from optimisation. To solve this, the back end patterns for zero_extend and for left shifts by multiples of 8 were split into QImode moves. The hope was that gcc would then collapse QImode expression lists such as 0|0|0|x or 0|0|y|0 into simple moves.

Here are the splitter patterns (followed by more stuff):

;; xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x
;; zero extend
(define_insn_and_split "zero_extendqi<mode>2"
  [(set (match_operand:HIDI 0 "nonimmediate_operand" "")
        (zero_extend:HIDI (match_operand:QI 1 "nonimmediate_operand" "")))]
  ""
  "#"
  ""
  [(const_int 0)]
  "
  int i;
  enum machine_mode dmode = GET_MODE (operands[0]);
  int dsize = GET_MODE_SIZE (dmode);
  enum machine_mode smode = GET_MODE (operands[1]);
  int ssize = GET_MODE_SIZE (smode);
  rtx dword = simplify_gen_subreg (smode, operands[0], dmode, 0);
  emit_move_insn (dword, operands[1]);
  for (i = ssize; i < dsize; i++)
    {
      rtx dbyte = simplify_gen_subreg (QImode, operands[0], dmode, i);
      emit_move_insn (dbyte, const0_rtx);
    }
  DONE;
  ")

;byte shift is just a series of moves
;check src OR dest is in register, so the move will be ok
(define_insn_and_split "ashl<mode>3_const2p"
  [(set (match_operand:HIDI 0 "nonimmediate_operand" "")
        (ashift:HIDI (match_operand:HIDI 1 "register_operand" "")
                     (match_operand:HIDI 2 "const_int_operand" "i")))]
  "((INTVAL (operands[2]) % 8) == 0)"
  "#"
  ""
  [(const_int 0)]
  {
    int i;
    enum machine_mode mode;
    mode = GET_MODE (operands[0]);
    int size = GET_MODE_SIZE (mode);
    HOST_WIDE_INT x = INTVAL (operands[2]);
    rtx dbytes[8], sbytes[8];
    for (i = 0; i < size; i++)
      {
        dbytes[i] = simplify_gen_subreg (QImode, operands[0], mode, i);
        sbytes[i] = simplify_gen_subreg (QImode, operands[1], mode, i);
      }
    int shift = x / 8;
    if (shift > size)
      shift = size;
    for (i = shift; i < size; i++)
      {
        emit_move_insn (dbytes[i], sbytes[i - shift]);
      }
    for (i = 0; i < shift; i++)
      {
        emit_move_insn (dbytes[i], const0_rtx);
      }
  }
)
;

The result kinda works, but we are left with OR x,0 (and some missed opportunities to propagate the zero constant forward into OR).

The splitters are matched either initially (zero_extend) or at combine - just as expected. All the subregs appear as expected in split1. Naturally this produces a bunch of QI subregs, many of which contain zero.

No real change happens in the RTL until local register allocation (lreg dump file). There are no redundant IOR Rm,0 in the dump files before the lreg pass. The only note is a reg dead on the pointer argument when it gets moved to a pointer register (no reg equals or other dead notes until the lreg pass).

In the lreg dump file I can see the propagation of many (but not all) constant 0 values forward into the IOR instructions (eg Rn = 0, Rm = Rm | Rn => Rm = Rm|0). These remain in the RTL and are output into final code. Loads of zero into registers which end up being unused are removed in later passes.

I can remove IOR Rm,0 with a targeted splitter that creates a NOP - which is my last resort.
So here is lreg dump extract: ;; Function f (f) starting the processing of deferred insns ending the processing of deferred insns df_analyze called df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 ( 1) df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 ( 1) Pass 0 Register 42 costs: POINTER_X_REGS:0 POINTER_Y_REGS:0 POINTER_Z_REGS:0 BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:8000 SIMPLE_LD_REGS:8000 LD_REGS:8000 NO_LD_REGS:8000 GENERAL_REGS:8000 ALL_REGS:1 MEM:2 Register 58 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000 Register 59 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000 Register 60 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000 Register 61 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0 GENERAL_REGS:0 ALL_REGS:16000 MEM:16000 Register 62 costs:
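As a reference point for what the splitters in this message are aiming at, here is a host-side sketch of the testcase next to the byte-wise form it should collapse to once every 0|0|0|x term becomes a plain move. It is an illustration only: `uint32_t` stands in for AVR's 32-bit `unsigned long`, and `f_bytes` is a hypothetical name, not something from the port.

```c
#include <stdint.h>

/* The testcase from the message, with uint32_t standing in for
   AVR's 32-bit unsigned long. */
uint32_t f(const unsigned char *P)
{
    return ((uint32_t)P[1] << 24) | ((uint32_t)P[2] << 16)
         | ((uint32_t)P[3] << 8)  | ((uint32_t)P[4] << 0);
}

/* Byte-wise view: every result byte is a copy of exactly one source
   byte, so no OR with zero needs to survive.  AVR is little-endian,
   so byte 0 is the least significant byte. */
uint32_t f_bytes(const unsigned char *P)
{
    unsigned char b[4];

    b[0] = P[4];   /* bits  7..0  */
    b[1] = P[3];   /* bits 15..8  */
    b[2] = P[2];   /* bits 23..16 */
    b[3] = P[1];   /* bits 31..24 */

    /* Reassemble only so the two forms can be compared on a host. */
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Both functions return the same value for any input; the second form is simply four QImode moves, which is the code the split patterns are hoped to reach.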
Re: Redundant logical operations left after early splitting
RTL dumps for Combine pass and ASMCONS (last one before local-alloc) COMBINE ;; Function f (f) starting the processing of deferred insns ending the processing of deferred insns df_analyze called insn_cost 2: 4 insn_cost 6: 16 insn_cost 7: 12 insn_cost 8: 16 insn_cost 9: 16 insn_cost 10: 16 insn_cost 11: 16 insn_cost 12: 16 insn_cost 13: 16 insn_cost 14: 16 insn_cost 15: 16 insn_cost 35: 4 insn_cost 36: 4 insn_cost 37: 4 insn_cost 38: 4 insn_cost 26: 0 Failed to match this instruction: (parallel [ (set (reg:SI 44) (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ]) (const_int 1 [0x1])) [0 S1 A8]))) (set (reg/v/f:HI 42 [ P ]) (reg:HI 24 r24 [ P ])) ]) Failed to match this instruction: (parallel [ (set (reg:SI 44) (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ]) (const_int 1 [0x1])) [0 S1 A8]))) (set (reg/v/f:HI 42 [ P ]) (reg:HI 24 r24 [ P ])) ]) Failed to match this instruction: (set (reg:SI 45) (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ]) (const_int 1 [0x1])) [0 S1 A8])) (const_int 24 [0x18]))) Failed to match this instruction: (parallel [ (set (reg:SI 45) (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ]) (const_int 1 [0x1])) [0 S1 A8])) (const_int 24 [0x18]))) (set (reg/v/f:HI 42 [ P ]) (reg:HI 24 r24 [ P ])) ]) Failed to match this instruction: (parallel [ (set (reg:SI 45) (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ]) (const_int 1 [0x1])) [0 S1 A8])) (const_int 24 [0x18]))) (set (reg/v/f:HI 42 [ P ]) (reg:HI 24 r24 [ P ])) ]) Successfully matched this instruction: (set (reg/v/f:HI 42 [ P ]) (reg:HI 24 r24 [ P ])) Failed to match this instruction: (set (reg:SI 45) (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg:HI 24 r24 [ P ]) (const_int 1 [0x1])) [0 S1 A8])) (const_int 24 [0x18]))) Failed to match this instruction: (set (reg:SI 47) (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ]) (const_int 2 [0x2])) [0 S1 A8])) (const_int 16 [0x10]))) Failed to match this instruction: (set 
(reg:SI 48) (ior:SI (ashift:SI (reg:SI 44) (const_int 24 [0x18])) (reg:SI 47))) Failed to match this instruction: (set (reg:SI 48) (ior:SI (ashift:SI (reg:SI 46) (const_int 16 [0x10])) (reg:SI 45))) Failed to match this instruction: (set (reg:SI 48) (ior:SI (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ]) (const_int 1 [0x1])) [0 S1 A8])) (const_int 24 [0x18])) (reg:SI 47))) Successfully matched this instruction: (set (reg:SI 45) (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ]) (const_int 1 [0x1])) [0 S1 A8]))) Failed to match this instruction: (set (reg:SI 48) (ior:SI (ashift:SI (reg:SI 45) (const_int 24 [0x18])) (reg:SI 47))) Failed to match this instruction: (set (reg:SI 48) (ior:SI (ashift:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ]) (const_int 2 [0x2])) [0 S1 A8])) (const_int 16 [0x10])) (reg:SI 45))) Successfully matched this instruction: (set (reg:SI 47) (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ]) (const_int 2 [0x2])) [0 S1 A8]))) Failed to match this instruction: (set (reg:SI 48) (ior:SI (ashift:SI (reg:SI 47) (const_int 16 [0x10])) (reg:SI 45))) Failed to match this instruction: (set (reg:SI 48) (ior:SI (ashift:SI (reg:SI 46) (const_int 16 [0x10])) (ashift:SI (reg:SI 44) (const_int 24 [0x18] Successfully matched this instruction: (set (reg:SI 47) (ashift:SI (reg:SI 44) (const_int 24 [0x18]))) Failed to match this instruction: (set (reg:SI 48) (ior:SI (ashift:SI (reg:SI 46) (const_int 16 [0x10])) (reg:SI 47))) Failed to match this instruction: (set (reg:SI 50) (ior:SI (ior:SI (reg:SI 45) (reg:SI 47)) (reg:SI 49))) Failed to match this instruction: (set (reg:SI 50) (ior:SI (zero_extend:SI (mem:QI (plus:HI (reg/v/f:HI 42 [ P ]) (const_int 4 [0x4])) [0 S1 A8])) (reg:SI 48))) Failed to match this instruction: (set (reg:SI 50) (ior:SI (ior:SI (ashift:SI (reg:SI 44) (const_int 24 [0x18])) (reg:SI 47)) (reg:SI 49))) Successfully matched this instruction: (set (reg:SI 48) (ashift:SI (reg:SI 44) (const_int 24 [0x18]))) 
Failed to match this instruction: (set (reg:SI 50) (ior:SI (ior:SI (reg:SI 48) (reg:SI 47
Re: Redundant logical operations left after early splitting
I still have to try an extra fwprop pass to confirm this solves the issue. But it does seem that the missing piece is effective constant propagation + simplification after splits and subregs - which is currently ineffective in local-alloc (or any later pass).
Adding an extra pass indeed seems a poor route - BUT this would suggest there is no need for any constant propagation in local-alloc, thus avoiding duplication of function.
As for the workaround, NOPs are nothing already. I just split the instruction into (const_int 0) and it's gone. The split instruction pattern explicitly matches on the constant. A peephole would do the same thing - but later.
Of course I will also have to add all the other variants that can be created: AND Rx,0, AND Rx,0xff etc.
That leaves only the limited constant propagation issue.
Andy
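The variants mentioned above can be enumerated in a small table-like function. This is a sketch of the underlying byte identities only, not the AVR machine description; the type and function names are hypothetical, and the XOR case is an addition that the message does not list.

```c
/* Hypothetical classification of "Rx = Rx OP constant" on one byte. */
enum byte_op { OP_AND, OP_IOR, OP_XOR };

enum byte_result {
    KEEP_INSN,    /* constant is not special, keep the operation */
    DELETE_INSN,  /* identity: Rx is unchanged, insn can become a NOP */
    SET_CONST     /* result does not depend on Rx: load 'value' instead */
};

struct simplified {
    enum byte_result kind;
    unsigned char value;   /* meaningful only for SET_CONST */
};

struct simplified simplify_byte_op(enum byte_op op, unsigned char c)
{
    struct simplified r = { KEEP_INSN, 0 };

    switch (op) {
    case OP_IOR:
        if (c == 0x00)
            r.kind = DELETE_INSN;                      /* x | 0    == x    */
        else if (c == 0xff) {
            r.kind = SET_CONST; r.value = 0xff;        /* x | 0xff == 0xff */
        }
        break;
    case OP_AND:
        if (c == 0xff)
            r.kind = DELETE_INSN;                      /* x & 0xff == x    */
        else if (c == 0x00) {
            r.kind = SET_CONST; r.value = 0x00;        /* x & 0    == 0    */
        }
        break;
    case OP_XOR:
        if (c == 0x00)
            r.kind = DELETE_INSN;                      /* x ^ 0    == x    */
        break;
    }
    return r;
}
```

Each DELETE_INSN row corresponds to one splitter of the "split into (const_int 0)" kind described in the message; the SET_CONST rows would become a move of a constant instead.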
Re: Redundant logical operations left after early splitting
Yes - the other approach is to lower at RTL expand. Unfortunately there is no practical way to lower arithmetic and compare operations. Reload can add arithmetic, which messes up the "carry" handling attempts. So we end up with mixed higher level and lower level RTL. This mixture then causes other missed optimisations - which historically have been very significant but grow less with time.
Splitting after combine allows existing optimisations to be kept largely intact.
The ports that benefit would be any that split operations after expand (eg from combine onwards). How much - who knows!
Unfortunately there is no way a target can hook into pass management to make this more specific.
But I'm still thinking about it!
Andy
Re: Redundant logical operations left after early splitting
Splitters on movMM are problematic. One problem is that they create issues with base pointers. AVR base pointers are limited to 64-byte offsets - after that the offsets become inline additions (and perhaps subtractions). So splitting such a move in a large mode is a world of hurt. A future endeavour perhaps may solve this - but for now, it's on the "too difficult" pile.
I don't think it is relevant here. The missing optimisation in the testcase is more closely related to the arithmetic than to the moves. The output move is already "split" by subreg lowering. After that there are no pure large mode moves of significance left. Other (non-move) splitters can create moves - but these are small and will not be split further.
Andy

-----Original Message-----
From: Dave Korn <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Wed, 20 Feb 2008 2:15 pm
Subject: RE: Redundant logical operations left after early splitting

On 20 February 2008 18:05, [EMAIL PROTECTED] wrote:
> But it does seem that the missing piece is effective constant
> propagation + simplification after splits and subregs - which is
> currently ineffective in local-alloc (or any later pass)

Hmm, more questions: do you use splitters or insns for your movMM patterns? Do you implement all the movMM patterns? Just checking if we've covered every angle ...
cheers,
DaveK
--
Can't think of a witty .sigline today
Re: Redundant logical operations left after early splitting
fwprop is a good suggestion. If that also simplifies substitutions, it may be the magic bullet to collapse all of the code. I will try later tonight.

-----Original Message-----
From: Paolo Bonzini <[EMAIL PROTECTED]>
To: Andy H <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; gcc@gcc.gnu.org
Sent: Wed, 20 Feb 2008 3:26 am
Subject: Re: Redundant logical operations left after early splitting

>> Propagating REG_EQUIV notes across register-register moves would seem
>> to be an obviously simple way to fix this. Thoughts?
> I am not sure local-alloc is the best place to address the overall
> problem, I doubt it is intended to provide such optimizations. An
> additional cse pass after split would seem a better way perhaps?

You could try adding another fwprop run after splitting. I wouldn't like a third instance of the pass, but it could help.
Paolo
Re: Redundant logical operations left after early splitting
Paolo,
I placed the extra fwprop before local-alloc, as this was just before the NOP got created and after splitting.
Putting fwprop after combine is no problem - but it is too early: none of the patterns would be split at that time, preventing byte level propagations.
As register usage as well as code size is clearly relevant here, effective propagation before reload is the most desirable.
(PS I was impressed by the fwprop code, I actually stand a chance of understanding some of it)
best regards

-----Original Message-----
From: Paolo Bonzini <[EMAIL PROTECTED]>
To: Andy H <[EMAIL PROTECTED]>; GCC Development
Sent: Thu, 21 Feb 2008 1:06 am
Subject: Re: Redundant logical operations left after early splitting

> This would indicate that simplify-rtx inside fwprop is removing OR Rx,0
> but not picking up the additionally revealed forward propagation
> opportunities. This would seem to be an avoidable limitation.

Yes, can you send me your MD patch and a simple testcase? fwprop is supposed to be "cascading", and some bugs in cascading were already revealed by the AVR port.
It might be even more worthwhile to try *moving* fwprop2 after combine, then.
Paolo
Re: Redundant logical operations left after early splitting
If I understand correctly:
Propagation of "0" causes simplify-rtx to create a NOP from OR Rx,0.
This NOP (deletion?) creates another set of potential uses, as the prior RHS def now passes straight through to a new set of uses - but we miss those new uses (which in the testcase are often 0).
I will try fwprop in a few different spots later and see what changes, if any, occur.
thanks again
Andy

-----Original Message-----
From: Paolo Bonzini <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Thu, 21 Feb 2008 11:04 am
Subject: Re: Redundant logical operations left after early splitting

> Putting fwprop after combine is no problem - but is too early - none of
> the patterns would be split at that time - preventing byte level
> propagations.

Yeah, I meant "after split" actually.
Anyway, the problem is that if the RHS becomes a constant, fwprop does not propagate the LHS anymore. I can try to add some def->use propagation.

> (PS I was impressed by fwprop code, I actually stand a chance of
> understanding some of it)

Thanks (since I and Steven Bosscher wrote it) -- but I guess that's what you can expect from optimization passes that can rely on a decent framework (simplify-rtx.c and df-*.c).
Paolo
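The cascading issue discussed in this exchange can be illustrated with a toy model. This is a sketch on a made-up instruction list, not GCC's fwprop: all names (`struct insn`, `propagate_once`, `propagate_cascading`, `count_kind`) are hypothetical, and the model assumes each register is set exactly once. A single pass only propagates definitions that were constant when the pass started, so a constant created by simplification during the pass is not propagated until the pass is repeated.

```c
#include <stdbool.h>

#define NREGS 16

enum kind { K_CONST, K_COPY, K_IOR, K_LOAD };

struct insn {
    enum kind kind;
    int dst;          /* register defined (each register is set once) */
    int a, b;         /* sources: K_COPY uses a, K_IOR uses a and b */
    unsigned char c;  /* value for K_CONST */
};

/* One pass: propagate only defs that are constant at pass entry.
   Returns the number of insns that were simplified. */
int propagate_once(struct insn *code, int n)
{
    bool known[NREGS] = { false };
    unsigned char val[NREGS] = { 0 };
    int changes = 0;

    for (int i = 0; i < n; i++)
        if (code[i].kind == K_CONST) {
            known[code[i].dst] = true;
            val[code[i].dst] = code[i].c;
        }

    for (int i = 0; i < n; i++) {
        struct insn *p = &code[i];
        if (p->kind == K_COPY && known[p->a]) {
            p->kind = K_CONST;                 /* copy of a constant */
            p->c = val[p->a];
            changes++;
        } else if (p->kind == K_IOR) {
            bool ka = known[p->a], kb = known[p->b];
            if (ka && kb) {                    /* fold c1 | c2 */
                p->kind = K_CONST;
                p->c = (unsigned char)(val[p->a] | val[p->b]);
                changes++;
            } else if (ka && val[p->a] == 0) { /* 0 | b  ->  copy b */
                p->kind = K_COPY;
                p->a = p->b;
                changes++;
            } else if (kb && val[p->b] == 0) { /* a | 0  ->  copy a */
                p->kind = K_COPY;
                changes++;
            }
        }
    }
    return changes;
}

/* Cascading variant: repeat until a pass finds nothing new. */
int propagate_cascading(struct insn *code, int n)
{
    int total = 0, changed;
    while ((changed = propagate_once(code, n)) != 0)
        total += changed;
    return total;
}

/* Count the insns of a given kind that remain. */
int count_kind(const struct insn *code, int n, enum kind k)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (code[i].kind == k)
            count++;
    return count;
}
```

With r0 = 0, r1 = 0, r3 = load, r2 = r0 | r1 and r4 = r2 | r3, a single pass folds r2 to a constant but leaves the second IOR in place; only the repeated run turns r4 into a copy of r3.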
Re: Redundant logical operations left after early splitting
Jeff,
I take your point and will go back over my splitters, as I know of one instance in which a splitter can create such an issue. (I should check for a constant operand after simplifying an operand.) In the cases I looked at before, this was not the situation.
Each splitter - or subreg lowering - potentially creates (or rather reveals) another cascade opportunity, but the current pass arrangement provides no means to remove them (other than the trivial propagation in local-alloc). That seems fundamentally wrong.
So all targets that use split or have lowered subregs potentially have excess register usage and instructions (how much - I don't know).
I don't follow how taking another pseudo inside split helps - that still seems to be another cascade that local-alloc will not remove.
As a side issue, if the current fwprop bug - which often only propagates one operand - can be fixed, then I may be in better shape (since the early pass may then remove any cascades remaining before it hits the splitters).
I am looking at my original problem again and trying to find some level of earlier optimisation I can use (ie some form of earlier expansion or combination). If I can remove some of the cascaded levels before split, then things get much better! But this is not easy.
thank you for your advice

-----Original Message-----
From: Jeff Law <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; gcc@gcc.gnu.org
Sent: Mon, 25 Feb 2008 4:34 am
Subject: Re: Redundant logical operations left after early splitting

[EMAIL PROTECTED] wrote:
> If I understand correctly:
> Prop. of "0" causes simplify-rtx to create NOP from OR Rx,0
> This NOP (deletion?) creates another set of potential uses - as now the
> prior RHS def now passes straight thru to a new set of uses - but we
> miss those new uses. (which in the testcase are often 0)
> I will try fwprop in a few different spots later and see what, if any
> changes occur.

Classic cascading. I think some of this is an artifact of how your splitters are working. You end up creating code which looks like

(set (reg1) (const_int 0))
(set (reg1) (ior (reg1) (other_reg)))

What's important here is that reg1 is being set multiple times. You'd be better off if you can twiddle the splitters to avoid this behavior. If you need a new pseudo, then get one :-)
Once you do that, local would propagate these things better. That still leaves the simplification & nop problem, but I'm pretty sure that can be trivially fixed within local without resorting to running another forwprop pass after splitting.
Jeff
Re: Combine repeats matching on insn pairs and will ICE on 3.
The deed is done!
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35519
I added a patch but it needs more expertise than I have.

-----Original Message-----
From: Ian Lance Taylor <[EMAIL PROTECTED]>
To: Andy H <[EMAIL PROTECTED]>
Cc: GCC Development
Sent: Mon, 10 Mar 2008 1:32 pm
Subject: Re: Combine repeats matching on insn pairs and will ICE on 3.

Andy H <[EMAIL PROTECTED]> writes:
> I have problem with data flow and combine that is causing ICE with
> experimental build. Despite all efforts to blame my own target
> changes, I have reached the conclusion that this is a gcc COMBINE bug,
> but seek your advice before filing a bug report.

Your analysis looks right to me. Please do file a bug report. Thanks.
Ian
Re: Problem with reloading in a new backend...
I noticed Stack register is missing from ALL_REGS. Are registers 16bit? Is just one required for pointer? Andy
Re: Pointers to add double-float capabilities for the avr target
That looks correct. Be aware that long long operands (64 bits) and doubles are not well supported on AVR, so you may expose some problems with this. We currently have long long disabled in testsuite testing to avoid noise. At some point I hope to switch it back on to see where we are - but I have enough to deal with below 64 bits at present.
Andy
Where is setup for "goto" in nested function created?
During the process of fixing setjmp for the AVR target, I needed to define targetm.builtin_setjmp_frame_value () to be used in expand_builtin_setjmp_setup(). This sets the value of the frame pointer stored in the jump buffer.
I set this "value" to virtual_stack_vars_rtx+1 (==frame_pointer).
The receiver defined in the target later restores the frame pointer using
    virtual_stack_vars_rtx = value - 1
This produces correct code as expected and avoids run-time add/sub of offsets. (setjmp works!)
However, for a normal goto used inside a nested function, a different part of gcc creates the code to store the frame pointer (not expand_builtin_setjmp_setup). I can't find this code.
The issue I have is that this "goto_setup" code does NOT use targetm.builtin_setjmp_frame_value - but seems to use value=virtual_stack_vars_rtx, which is incompatible with my target receiver.
So where is the goto setup code created? And is there a bug here?
regards
Andy
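The mismatch described above comes down to simple arithmetic, sketched here with plain integers. The function names are hypothetical stand-ins for the RTL that the setup and receiver code generate; only the +1 and -1 offsets come from the message.

```c
#include <stdint.h>

/* Value stored by the setjmp setup: virtual_stack_vars_rtx + 1,
   as described in the message. */
uint16_t setjmp_setup_value(uint16_t stack_vars)
{
    return (uint16_t)(stack_vars + 1);
}

/* Value apparently stored by the nonlocal goto setup: no adjustment. */
uint16_t nonlocal_goto_setup_value(uint16_t stack_vars)
{
    return stack_vars;
}

/* The target receiver: virtual_stack_vars_rtx = value - 1. */
uint16_t receiver_restore(uint16_t saved)
{
    return (uint16_t)(saved - 1);
}
```

Round-tripping through the setjmp setup gives back the original address, while the nonlocal goto path ends up one byte low, which is the incompatibility being reported.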
Re: Where is setup for "goto" in nested function created?
I already have nonlocal_goto and nonlocal_goto_receiver. These expect the saved frame pointer to be virtual_stack_vars_rtx+1.
For setjmp the value is determined by targetm.builtin_setjmp_frame_value - which I defined and is correct.
But for goto the value is created by some other code - which appears to be wrong, but which I can't find!

-----Original Message-----
From: David Daney <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Thu, 22 May 2008 1:16 pm
Subject: Re: Where is setup for "goto" in nested function created?

[EMAIL PROTECTED] wrote:
> During the process of fixing setjmp for AVR target, I needed to define
> targetm.builtin_setjmp_frame_value () to be used in
> expand_builtin_setjmp_setup().
> This sets the value of the Frame pointer stored in jump buffer.
> I set this "value" to virtual_stack_vars_rtx+1 (==frame_pointer)
> Receiver defined in target later restores frame pointer using
> virtual_stack_vars_rtx = value - 1
> This produce correct code as expected and avoids run-time add/sub of
> offsets. (setjmp works!)
> However, for a normal goto used inside a nested function, a different
> part of gcc creates the code to store frame pointer (not
> expand_builtin_setjmp_setup). I can't find this code.
> The issue I have is that this "goto_setup" code does NOT use
> targetm.builtin_setjmp_frame_value - but seems to use
> value=virtual_stack_vars_rtx, which is incompatible with my target
> receiver.
> So where is the goto setup code created? And is there a bug here?

Perhaps you need to implement one or more of: save_stack_nonlocal, restore_stack_nonlocal, nonlocal_goto, and/or nonlocal_goto_receiver.
David Daney
Re: Where is setup for "goto" in nested function created?
I think my last email crossed your reply - so apologies for restating:
expand_builtin_nonlocal_goto is fine. This performs the stack restore, extracts the frame pointer value and does the jump.
The receiver is fine - this jump destination does the restore of the frame pointer.
The problem I have is with the frame pointer value that is saved by the "setup" prior to all this.
For goto it does not use expand_builtin_setjmp_setup - and (pathetically) I can't find what it is using.
Andy

-----Original Message-----
From: Ian Lance Taylor <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Thu, 22 May 2008 2:26 pm
Subject: Re: Where is setup for "goto" in nested function created?

[EMAIL PROTECTED] writes:
> However, for a normal goto used inside a nested function, a different
> part of gcc creates the code to store frame pointer (not
> expand_builtin_setjmp_setup). I can't find this code.

I think you are looking for expand_builtin_nonlocal_goto in builtins.c.
Ian
Re: Help with reload and naked constant sum causing ICE
Richard, I appreciate the extra input. I agree with what you say. The target should not be doing middle-end stuff.

The inc/dec and (Rxx) != (frame pointer) parts just reload using the pointer class, which has one more register than the base pointers, but the extra register cannot take an offset.

However, I don't know what reload can or cannot do (but I'm learning that every day). I do not fully know what was intended by all of the AVR L_R_A, or how it might now be redundant, useful or wrong. I already have a patch with the maintainer that takes out a chunk of this to fix another (spill) bug.

I have copied Anatoly for comment, and I promise to revisit this after reviewing reload's capabilities.

Andy

-Original Message-
From: Richard Sandiford <[EMAIL PROTECTED]>
To: Andy H <[EMAIL PROTECTED]>
Cc: GCC Development ; [EMAIL PROTECTED]
Sent: Wed, 28 May 2008 3:00 pm
Subject: Re: Help with reload and naked constant sum causing ICE

Andy H <[EMAIL PROTECTED]> writes:

If L_R_A does nothing with it, the normal reload handling will first try:

(const:HI (plus:HI (symbol_ref:HI ("chk_fail_buf") (const_int 2

This worked just as you described after I added a test of reg_equiv_constant[] inside L_R_A. So I guess that looks like the fix for the bug I posted.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34641

To summarize, LEGITIMIZE_RELOAD_ADDRESS should now always check reg_equiv_constant before trying to do any push_reload of a register.

TBH, I still think AVR is doing far too much in L_R_A. To quote the current version:

#define LEGITIMIZE_RELOAD_ADDRESS(X, MODE, OPNUM, TYPE, IND_LEVELS, WIN) \
do { \
  if (1&&(GET_CODE (X) == POST_INC || GET_CODE (X) == PRE_DEC)) \
    { \
      push_reload (XEXP (X,0), XEXP (X,0), &XEXP (X,0), &XEXP (X,0), \
                   POINTER_REGS, GET_MODE (X),GET_MODE (X) , 0, 0, \
                   OPNUM, RELOAD_OTHER); \
      goto WIN; \
    } \

Why does AVR need different POST_INC and PRE_DEC handling from other auto-inc targets? This at least deserves a comment. But really, you should just let reload handle this case. Does reload currently lack some support you need?

  if (GET_CODE (X) == PLUS \
      && REG_P (XEXP (X, 0)) \
      && GET_CODE (XEXP (X, 1)) == CONST_INT \
      && INTVAL (XEXP (X, 1)) >= 1) \
    { \
      int fit = INTVAL (XEXP (X, 1)) <= (64 - GET_MODE_SIZE (MODE)); \
      if (fit) \
        { \
          if (reg_equiv_address[REGNO (XEXP (X, 0))] != 0) \
            { \
              int regno = REGNO (XEXP (X, 0)); \
              rtx mem = make_memloc (X, regno); \
              push_reload (XEXP (mem,0), NULL, &XEXP (mem,0), NULL, \
                           POINTER_REGS, Pmode, VOIDmode, 0, 0, \
                           1, ADDR_TYPE (TYPE)); \
              push_reload (mem, NULL_RTX, &XEXP (X, 0), NULL, \
                           BASE_POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                           OPNUM, TYPE); \
              goto WIN; \
            } \
          push_reload (XEXP (X, 0), NULL_RTX, &XEXP (X, 0), NULL, \
                       BASE_POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                       OPNUM, TYPE); \
          goto WIN; \
        } \
      else if (! (frame_pointer_needed && XEXP (X,0) == frame_pointer_rtx)) \
        { \
          push_reload (X, NULL_RTX, &X, NULL, \
                       POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                       OPNUM, TYPE); \
          goto WIN; \
        } \
    } \
} while(0)

These too don't make much immediate sense to me. What are they trying to do that reload wouldn't do otherwise?

Rather than duplicating more of reload's checks in this macro, I think it would be better to strip it down as far as possible.

There's a bit of self-interest here. It makes it very hard to work on reload if ports try to duplicate so much of the logic in target-specific files.

Richard
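As a reading aid for the PLUS branch of the macro quoted above, here is a stand-alone sketch of the decision it makes. It is not GCC code: the enum, the function names and the plain integer parameters are hypothetical and only model the tests on INTVAL, GET_MODE_SIZE and frame_pointer_rtx. The extra inner reload done when reg_equiv_address is set is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for the (reg + const) case.  */
enum reload_choice
{
  LEAVE_TO_RELOAD,        /* macro falls through, reload decides    */
  USE_BASE_POINTER_REGS,  /* reload the base register only          */
  USE_POINTER_REGS        /* reload the whole address into POINTER  */
};

/* Limit taken from the macro: offset <= 64 - GET_MODE_SIZE (MODE).  */
enum { DISPLACEMENT_LIMIT = 64 };

/* Models the "fit" test from the macro.  */
static bool offset_fits (long offset, int mode_size)
{
  return offset <= DISPLACEMENT_LIMIT - mode_size;
}

/* Models the branch structure for a positive constant offset.
   base_is_needed_frame_pointer stands for
   frame_pointer_needed && XEXP (X,0) == frame_pointer_rtx.  */
static enum reload_choice
choose_reload (long offset, int mode_size, bool base_is_needed_frame_pointer)
{
  if (offset < 1)
    return LEAVE_TO_RELOAD;
  if (offset_fits (offset, mode_size))
    return USE_BASE_POINTER_REGS;
  if (!base_is_needed_frame_pointer)
    return USE_POINTER_REGS;
  return LEAVE_TO_RELOAD;
}
```

With a 2-byte mode, for instance, an offset of 62 still fits a base pointer in this model, while 63 does not and is handed to the POINTER class unless the base is the frame pointer.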
Re: Help with reload and naked constant sum causing ICE
Again, thank you and Denis for your comments. Here is what I deduce from the code and Denis's comments. I am sure he (and others) will correct me if I am wrong :-)

The main issue is that we have one pointer register that cannot take an offset and two base pointers with a limited offset (0-63). Reload provides no means to pick the right pointer register. To do so would require a "preferred base pointer" mechanism.

The sole purpose of the AVR L_R_A is to substitute the POINTER class where there are opportunities to do so. BUT as L_R_A is in the middle of find_reloads_address, we have a problem in that some normal reload solutions have already been applied and some have not. This may mean that we:
1) Never get here, and so miss a POINTER opportunity.
2) Miss even better solutions that are applied later.
3) Have code vulnerable to reload changes.

I have added comments below to explain the parts. I'll probably need to add some debug hooks to check which parts still contribute.

best regards
Andy

#define LEGITIMIZE_RELOAD_ADDRESS(X, MODE, OPNUM, TYPE, IND_LEVELS, WIN) \
do { \
  if (1&&(GET_CODE (X) == POST_INC || GET_CODE (X) == PRE_DEC)) \
    { \
      push_reload (XEXP (X,0), XEXP (X,0), &XEXP (X,0), &XEXP (X,0), \
                   POINTER_REGS, GET_MODE (X),GET_MODE (X) , 0, 0, \
                   OPNUM, RELOAD_OTHER); \
      goto WIN; \
    } \

The intent is to use POINTER class registers for INC/DEC, so we have one more register available than with BASE_POINTER. Has INC/DEC been handled already, making this part redundant? Are there any other solutions that this code skips?

  if (GET_CODE (X) == PLUS \
      && REG_P (XEXP (X, 0)) \
      && GET_CODE (XEXP (X, 1)) == CONST_INT \
      && INTVAL (XEXP (X, 1)) >= 1) \
    { \
      int fit = INTVAL (XEXP (X, 1)) <= (64 - GET_MODE_SIZE (MODE)); \
      if (fit) \
        { \
          if (reg_equiv_address[REGNO (XEXP (X, 0))] != 0) \
            { \
              int regno = REGNO (XEXP (X, 0)); \
              rtx mem = make_memloc (X, regno); \
              push_reload (XEXP (mem,0), NULL, &XEXP (mem,0), NULL, \
                           POINTER_REGS, Pmode, VOIDmode, 0, 0, \
                           1, ADDR_TYPE (TYPE)); \
              push_reload (mem, NULL_RTX, &XEXP (X, 0), NULL, \
                           BASE_POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                           OPNUM, TYPE); \
              goto WIN; \
            } \

I'm struggling with this part. However, the POINTER class is advantageous for the first reload, which loads the address from memory, where we would not want to waste a BASE POINTER and also create an overlap problem. The implication is that without this code we cannot catch the inner-reload POINTER opportunity if we leave reload to decompose BASE+offset. I think this is correct :-(

          push_reload (XEXP (X, 0), NULL_RTX, &XEXP (X, 0), NULL, \
                       BASE_POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                       OPNUM, TYPE); \
          goto WIN; \

This part duplicates reload and should not be here. (I have a patch pending with the maintainer that removes this, for a different bug.)

        } \
      else if (! (frame_pointer_needed && XEXP (X,0) == frame_pointer_rtx)) \
        { \
          push_reload (X, NULL_RTX, &X, NULL, \
                       POINTER_REGS, GET_MODE (X), VOIDmode, 0, 0, \
                       OPNUM, TYPE); \
          goto WIN; \
        } \
    } \

This case is where the offset is out of range, so we allow the POINTER class on the basis that this is no worse than BASE POINTER. However, we leave frame_pointer to reload.

} while(0)

-Original Message-
From: Denis Chertykov <[EMAIL PROTECTED]>
To: Jeff Law <[EMAIL PROTECTED]>
Cc: Andy H <[EMAIL PROTECTED]>; GCC Development ; [EMAIL PROTECTED]
Sent: Thu, 29 May 2008 3:23 am
Subject: Re: Help with reload and naked constant sum causing ICE

2008/5/29 Jeff Law <[EMAIL PROTECTED]>:

Richard Sandiford wrote:

Andy H <[EMAIL PROTECTED]> writes:

If L_R_A does nothing with it, the normal reload handling will first try:

(const:HI (plus:HI (symbol_ref:HI ("chk_fail_buf") (const_int 2

This worked just as you described after I added a test of reg_equiv_constant[] inside L_R_A. So I guess t