How to make 'long int' type be a PDImode?
Hi,

I'd like to make a backend which would have 48 bits for 'long' type. (32 for int and 64 for long long).

I have tried to define:

#define LONG_TYPE_SIZE 48

and one of:

INT_MODE (PDI, 6);
PARTIAL_INT_MODE (DI);

Unfortunately, when I compile a program, I see that the backend still uses SImode for 'long'. I could not find an example of a target doing similar work. Could you please advise, or point me to an implementation I can refer to?

Thank you,
Frank
Re: How to make 'long int' type be a PDImode?
On Mon, Mar 8, 2010 at 8:29 AM, Joern Rennecke wrote:
> Quoting Frank Isamov :
>
>> Hi,
>>
>> I'd like to make a backend which would have 48 bits for 'long' type.
>> (32 for int and 64 for long long).
>>
>> I have tried to define:
>> #define LONG_TYPE_SIZE 48
>
> That's not a partial integer mode; PDImode would have the same size as
> DImode, just not all bits would be significant.
>
>> and one of:
>> INT_MODE (PDI, 6);
>
> And that wouldn't be PDImode, more like THImode (three-halves integer mode,
> going by the precedent of TQFmode - three-quarter float mode - of the 1750a
> port in GCC prior to version 3.1)
>

I am sorry, I still can conclude how to make the backend use a certain mode for 'long' type.

My architecture implements 32- and 48-bit registers and instructions operating on these registers. That is why my intention is to use 'int' for 32-bit and 'long' for 48-bit operations. My attempts lead to the backend ignoring the 48-bit path in spite of the LONG_TYPE_SIZE definition, and 'long' is still processed as a 32-bit value. I think I am missing something, so I am asking for help. Any piece of information would be useful.

Thank you,
Frank
Re: How to make 'long int' type be a PDImode?
On Mon, Mar 8, 2010 at 4:27 PM, Frank Isamov wrote:
> I am sorry, I still can conclude how to make the backend use a certain
> mode for 'long' type.

Correction: "I still can conclude" should be read as "I still cannot conclude".
Advancing SP on a call
We have a problem with passing arguments in memory. The caller puts the arguments in memory relative to the sp:

add sp, 4          // allocate space for the argument; the stack grows up
store r1, (sp-4)   // store the argument on the stack
call xxx           // call the function

In xxx, the resulting code looks like:

load (sp-4), r1    // load the argument from the stack

The problem is that the 'call' instruction pushes the return address onto the stack and increments the sp by 4, so when the callee tries to access the memory, it does not get to the correct location. How can I tell GCC that the callee should load from the original offset + 4?

Thanks.
Coloring problem - Pass 0 for finding allocno costs
Hi,

In my backend, I have a problem with the pass which determines the best register class for a virtual register (Pass 0 for finding allocno costs). In all insns in this example, both the R_REGS and D_REGS register classes are applicable (but all registers in an insn should be from the same register class).

This is the asmcons output:

;; Function mul (mul)

(note 1 0 5 NOTE_INSN_DELETED)

(note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 2 5 3 2 NOTE_INSN_DELETED)

(insn 3 2 4 2 a.c:2 (set (reg/v:SI 97 [ b ])
        (reg:SI 49 r1 [ b ])) 43 {movsi_regs} (expr_list:REG_DEAD (reg:SI 49 r1 [ b ])
        (nil)))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 9 2 a.c:5 (set (reg/v:SI 94 [ __a_11 ])
        (plus:SI (reg:SI 48 r0 [ a ])
            (reg/v:SI 97 [ b ]))) 0 {*addsi3_1} (expr_list:REG_DEAD (reg:SI 48 r0 [ a ])
        (nil)))

(insn 9 7 10 2 a.c:5 (parallel [
            (set (reg:SI 100)
                (ashift:SI (reg/v:SI 97 [ b ])
                    (const_int 3 [0x3])))
            (clobber (reg:CC 88 cc))
        ]) 36 {ashlsi3} (expr_list:REG_UNUSED (reg:CC 88 cc)
        (expr_list:REG_EQUAL (ashift:SI (reg/v:SI 97 [ b ])
                (const_int 3 [0x3]))
            (nil))))

(insn 10 9 11 2 a.c:5 (set (reg:SI 101)
        (plus:SI (reg:SI 100)
            (reg/v:SI 97 [ b ]))) 0 {*addsi3_1} (expr_list:REG_DEAD (reg:SI 100)
        (expr_list:REG_DEAD (reg/v:SI 97 [ b ])
            (expr_list:REG_EQUAL (mult:SI (reg/v:SI 97 [ b ])
                    (const_int 9 [0x9]))
                (nil)))))

(insn 11 10 12 2 a.c:5 (set (reg:SI 102)
        (plus:SI (reg:SI 101)
            (reg/v:SI 94 [ __a_11 ]))) 0 {*addsi3_1} (expr_list:REG_DEAD (reg:SI 101)
        (expr_list:REG_DEAD (reg/v:SI 94 [ __a_11 ])
            (nil))))

(note 12 11 17 2 NOTE_INSN_DELETED)

(insn 17 12 23 2 a.c:7 (parallel [
            (set (reg/i:SI 48 r0)
                (ashift:SI (reg:SI 102)
                    (const_int 10 [0xa])))
            (clobber (reg:CC 88 cc))
        ]) 36 {ashlsi3} (expr_list:REG_UNUSED (reg:CC 88 cc)
        (expr_list:REG_DEAD (reg:SI 102)
            (nil))))

(insn 23 17 0 2 a.c:7 (use (reg/i:SI 48 r0)) -1 (nil))

The problem I see is that for registers 100,101 I get best register class D instead of R – actually they get the same cost and D is chosen (maybe because it is first). But they should not get the same cost, since choosing D_REGS causes two additional copies. To my understanding, the algorithm checks every insn, and since register 100 appears only in insns in which all registers are still virtual, any register class fits without additional cost. But later, when it is colored with a D register, we need copies from R to D and back to R.

Can someone please help with this? Is there any reading material about this part of IRA?

Thanks,
Frank.
Re: Coloring problem - Pass 0 for finding allocno costs
-- Forwarded message --
From: Frank Isamov
Date: Thu, Mar 18, 2010 at 4:28 PM
Subject: Re: Coloring problem - Pass 0 for finding allocno costs
To: Ian Bolton

On Thu, Mar 18, 2010 at 3:51 PM, Ian Bolton wrote:
>> The problem I see is that for registers 100,101 I get best register
>> class D instead of R - actually they get the same cost and D is chosen
>> (maybe because it is first).
>
> Hi Frank.
>
> Do D and R overlap? It would be useful to know which regs are in
> which class, before trying to understand what is going on.
>
> Can you paste an example of your define_insn from your MD file to show
> how operands from D or R are both valid? I ask this because it is
> possible to express that D is more expensive than R with operand
> constraints.
>
> For general IRA info, you might like to look over my long thread on
> here called "Understanding IRA".
>
> Cheers,
> Ian
>

Hi Ian,

Thank you very much for your prompt reply.

D and R do not overlap. Please see fragments of the .md and .h files below:

From the md:

(define_register_constraint "a" "R_REGS" "")
(define_register_constraint "d" "D_REGS" "")

(define_predicate "a_operand"
  (match_operand 0 "register_operand")
{
  unsigned int regno;
  if (GET_CODE (op) == SUBREG)
    op = SUBREG_REG (op);
  regno = REGNO (op);
  return (regno >= FIRST_PSEUDO_REGISTER || REGNO_REG_CLASS(regno) == R_REGS);
}
)

(define_predicate "d_operand"
  (match_operand 0 "register_operand")
{
  unsigned int regno;
  if (GET_CODE (op) == SUBREG)
    op = SUBREG_REG (op);
  regno = REGNO (op);
  return (regno >= FIRST_PSEUDO_REGISTER || REGNO_REG_CLASS(regno) == D_REGS);
}
)

(define_predicate "a_d_operand"
  (ior (match_operand 0 "a_operand")
       (match_operand 0 "d_operand")))

(define_predicate "i_a_d_operand"
  (ior (match_operand 0 "immediate_operand")
       (match_operand 0 "a_d_operand")))

(define_insn "mov_regs"
  [(set (match_operand:SISFM 0 "a_d_operand" "=a, a, a, d, d, d")
        (match_operand:SISFM 1 "i_a_d_operand" "a, i, d, a, i, d"))]
  ""
  "move\t%1, %0"
)

(define_insn "*addsi3_1"
  [(set (match_operand:SI 0 "a_d_operand" "=a, a, a,d,d")
        (plus:SI (match_operand:SI 1 "a_d_operand" "%a, a, a,d,d")
                 (match_operand:SI 2 "nonmemory_operand" "U05,S16,a,d,U05")))]
  ""
  "adda\t%2, %1, %0"
)

;; Arithmetic Left and Right Shift Instructions
(define_insn "3"
  [(set (match_operand:SCIM 0 "register_operand" "=a,d,d")
        (sh_oprnd:SCIM (match_operand:SCIM 1 "register_operand" "a,d,d")
                       (match_operand:SI 2 "nonmemory_operand" "U05,d,U05")))
   (clobber (reg:CC CC_REGNUM))]
  ""
  "\t%2, %1, %0"
)

From the h file:

#define REG_CLASS_CONTENTS \
{ \
  {0x, 0x, 0x}, /* NO_REGS*/ \
  {0x, 0x, 0x}, /* D_REGS*/ \
  {0x, 0x, 0x}, /* R_REGS*/ \

The ABI requires the use of R registers for arguments and the return value. Other than that, all of these instructions are more or less symmetrical with respect to using D or R. So the optimal choice for this example would be to use R. If a D register is chosen, it involves an additional copy from R to D and back to R.

Thank you,
Frank
Combine or peephole?
Hi,

My architecture supports instructions with two parallel side effects. For example, addition and subtraction can be done in parallel:

(define_insn "assi6"
  [(parallel [
     (set (match_operand:SI 0 "register_operand" "=r")
          (minus:SI (match_operand:SI 1 "register_operand" "r")
                    (match_operand:SI 2 "register_operand" "r")))
     (set (match_operand:SI 3 "register_operand" "=r")
          (plus:SI (match_operand:SI 4 "register_operand" "r")
                   (match_operand:SI 5 "register_operand" "r")))
   ])]
  ""
  "as\t%5, %4, %3, %2, %1, %0 %!"
)

This instruction is not chosen at ‘combine’ time, even if the ‘plus’ and ‘minus’ instructions are located one after the other. That is probably because there is no data dependency between them.

In an attempt to resolve the problem, I am providing a peephole optimization:

(define_peephole2
  [(set (match_operand:SI 0 "register_operand")
        (minus:SI (match_operand:SI 1 "register_operand")
                  (match_operand:SI 2 "register_operand")))
   (set (match_operand:SI 3 "register_operand")
        (plus:SI (match_operand:SI 4 "register_operand")
                 (match_operand:SI 5 "register_operand")))]
  ""
  [(parallel [
     (set (match_dup 0)
          (minus:SI (match_dup 1) (match_dup 2)))
     (set (match_dup 3)
          (plus:SI (match_dup 4) (match_dup 5)))
   ])]
  ""
)

This works for some cases, but I wanted to ask the experts whether this is the way to go. Repeating the same pattern for a peephole might not be the best way to resolve the problem. Did you observe something similar? What would be the best way to resolve such a situation?

Thank you,
Frank
Re: Combine or peephole?
On Mon, Apr 19, 2010 at 5:54 PM, Jeff Law wrote:
>
> combine requires a data dependency, so for this situation, combine isn't
> going to help. The easy solution is to create a peephole. You can also
> create a machine dependent reorg pass to detect more of these opportunities.
> Jeff
>

Hi Jeff, et al,

Thank you for your reply. Two more questions:

1. Is it possible to add a machine dependent reorg pass at the backend level without changing the standard infrastructure? If so, can you please point me to such an example? If not, might the new plugin architecture help here?

2. A peephole for such case just repeats instruction definition pattern. As all information already available for such peephole, wouldn’t it be useful to implement the pass to be a part of the standard infrastructure?

Thank you,
Frank
Re: Combine or peephole?
Hi Ian,

On Wed, Apr 21, 2010 at 5:42 PM, Ian Lance Taylor wrote:
> Frank Isamov writes:
>
>> 2. A peephole for such case just repeats instruction definition
>> pattern. As all information already available for such peephole,
>> wouldn’t it be useful to implement the pass to be a part of the
>> standard infrastructure?
>
> See define_peephole and define_peephole2. If that doesn't answer your
> question, can you rephrase it?
>
> Ian
>

I think I understood the points from Jeff’s reply, and I am going to look at the PA implementation now.

Just for the completeness of this email thread, I’ll try to rephrase the initial question:

Instructions which manipulate data in parallel and have no data dependency automatically require a peephole2 definition and/or a machine dependent reorg pass. (Please see an example at the bottom of this email.) The peephole2 pattern, in this case, just repeats the instruction’s RTL pattern. As such instructions can appear in SIMD architectures, I just thought that it would be profitable to have this pass be part of the common infrastructure.

(define_insn "assi6"
  [(parallel [
     (set (match_operand:SI 0 "register_operand" "=r")
          (minus:SI (match_operand:SI 1 "register_operand" "r")
                    (match_operand:SI 2 "register_operand" "r")))
     (set (match_operand:SI 3 "register_operand" "=r")
          (plus:SI (match_operand:SI 4 "register_operand" "r")
                   (match_operand:SI 5 "register_operand" "r")))
   ])]
  ""
  "as\t%5, %4, %3, %2, %1, %0 %!"
)

(define_peephole2
  [(set (match_operand:SI 0 "register_operand")
        (minus:SI (match_operand:SI 1 "register_operand")
                  (match_operand:SI 2 "register_operand")))
   (set (match_operand:SI 3 "register_operand")
        (plus:SI (match_operand:SI 4 "register_operand")
                 (match_operand:SI 5 "register_operand")))]
  ""
  [(parallel [
     (set (match_dup 0)
          (minus:SI (match_dup 1) (match_dup 2)))
     (set (match_dup 3)
          (plus:SI (match_dup 4) (match_dup 5)))
   ])]
  ""
)