problem with the scheduler
Hello,

I am working with the c6x processor from TI. It has a VLIW architecture. It has 32 registers, named a0-a15 and b0-b15. b15 is used as the SP in the current port. I am facing a problem with the scheduler of GCC.

Following is the C code I was compiling -
***
int mult(int a,int b)
{
    int result=0,flag;
    if(b<0)
        flag=1;
    else
        flag=-1;
    for(;b;b+=flag)
        result += a;
    return result;
}

int main()
{
    return mult(5,4);
}

Following is part of the assembly generated by GCC -
*
mult:
        stw .D2T1 a15,*--b15
     || mvk 0, b4
        mv b15,a15
        ldw .D1T2 *+a15[3], b1
        ldw .D1T1 *+a15[2], a3
        nop 3
        cmplt b1, b4, b0
 [ b0]  mvkl L2, b4 ;
 [ b0]  mvkh L2, b4
 [ b0]  b b4
        nop 5
 [ b1]  mvkl L5, b4
     || mvk 0, a4
 [ b1]  mvkh L5, b4
 [ b1]  b b4
        nop 5
        ;; problem - this should have been scheduled before
        ;; the branch instruction
        mvk -1, b3
L9:
        ldw .D1T2 *+a15[1], b14
     || mv a15,b15
        ldw .D2T1 *b15++, a15
        add 4, b15,b15
        nop 2
        b .S2 b14
        nop 5
L2:
        mvk 0, a4
     || mvk 1, b3
L5:
        add b3, b1, b1
     || add a3, a4, a4
 [ b1]  mvkl L5, b4
 [ b1]  mvkh L5, b4
 [ b1]  b b4
        nop 5
        b .S2 L9
        nop 5
***
problem with the scheduler in gcc-4.0-20040911
Hello,

I am working with the c6x processor from TI. It has a VLIW architecture. It has 32 registers, named a0-a15 and b0-b15. b15 is used as the SP in the current port. I am facing a problem with the scheduler of GCC.

Following is the C code I was compiling -
***
int mult(int a,int b)
{
    int result=0,flag;
    if(b<0)
        flag=1;
    else
        flag=-1;
    for(;b;b+=flag)
        result += a;
    return result;
}

int main()
{
    return mult(5,4);
}

Following is part of the assembly generated by GCC. Code was compiled with O2.
*
mult:
        stw .D2T1 a15,*--b15    ;D1,D2 are functional units
                                ;T1,T2 are transmission paths
     || mvk 0, b4               ;|| implies that this instruction is executed
                                ;in parallel with the previous instruction
        mv b15,a15
        ldw .D1T2 *+a15[3], b1
        ldw .D1T1 *+a15[2], a3
        nop 3                   ;equivalent to 3 nops
        cmplt b1, b4, b0
 [ b0]  mvkl L2, b4             ;[] implies conditional execution. The
                                ;instruction is executed if b0 is TRUE
 [ b0]  mvkh L2, b4
 [ b0]  b b4
        nop 5
 [ b1]  mvkl L5, b4
     || mvk 0, a4
 [ b1]  mvkh L5, b4
 [ b1]  b b4
        nop 5
        ;; problem - the below instruction should have been scheduled before
        ;; the branch instruction because it will not be executed if the branch is
        ;; taken
        mvk -1, b3
L9:
        ldw .D1T2 *+a15[1], b14
     || mv a15,b15
        ldw .D2T1 *b15++, a15
        add 4, b15,b15
        nop 2
        b .S2 b14
        nop 5
L2:
        mvk 0, a4
     || mvk 1, b3
L5:
        add b3, b1, b1
     || add a3, a4, a4
 [ b1]  mvkl L5, b4
 [ b1]  mvkh L5, b4
 [ b1]  b b4
        nop 5
        b .S2 L9
        nop 5
***

Following is the debugging dump by the scheduler -
**
;; ==
;; -- basic block 1 from 17 to 89 -- after reload
;; ==

;; --- forward dependences:

;; --- Region Dependences --- b 1 bb 0
;; insn  code  bb  dep  prio  cost  reservation
;; ----  ----  --  ---  ----  ----  -----------
;;   17     5   0    0     1     1  S1 :
;;   18     5   0    0     1     1  S2 :
;;   90    67   0    0     8     1  S2 : 89 91
;;   91    66   0    1     7     1  S2 : 89
;;   89   100   0    2     6     6  S2 :

;; Ready list after queue_to_ready:90 18 17
;; Ready list after ready_sort:18 17 90
;; Ready list (t = 0):18 17 90
;;0--> 90 (b1) b4=b4+low(L25):S2
;; dependences resolved: insn 91 into queue with cost=1
;; Ready-->Q: insn 91: queued for 1 cycles.
;; Ready list (t = 0):18 17
;;0--> 17 a4=0x0 :S1
;; Ready list (t = 0):18
;; Ready-->Q: insn 18: queued for 1 cycles.
;; Ready list (t = 0):
;; Second chance
;; Q-->Ready: insn 18: moving to ready without stalls
;; Q-->Ready: insn 91: moving to ready without stalls
;; Ready list after queue_to_ready:91 18
;; Ready list after ready_sort:18 91
;; Ready list (t = 1):18 91
;;1--> 91 (b1) {b4=high(L25);use b4;}:S2
;; dependences resolved: insn 89 into queue with cost=1
;; Ready-->Q: insn 89: queued for 1 cycles.
;; Ready list (t = 1):18
;; Ready-->Q: insn 18: queued for 1 cycles.
;; Ready list (t = 1):
;; Second chance
;; Q-->Ready: insn 18: moving to ready without stalls
;; Q-->Ready: insn 89: moving to ready without stalls
;; Ready list after queue_to_ready:89 18
;; Ready list after ready_sort:18 89
;; Ready list (t = 2):18 89
;;2--> 89 (b1) pc=b4 :S2
;; Ready list (t = 2):18
;; Ready-->Q: insn 18: queued for 1 cycles.
;; Ready list (t = 2):
;;
problem with dependencies in gcc-4.0-20040911
Hello,

I am working with a VLIW processor and GCC-4.0-20040911. There is a problem in the dependency calculation of GCC: it gives write-after-read a higher priority than write-after-write. Thus, for the following code, GCC records only a write-after-read dependency between the 2 instructions, and because of this the 2 instructions are scheduled together.

    stw --b15,*a15 ;; pre-decrement b15 and store its contents in memory
    add 4,a15,b15  ;; b15 = a15+4

The first instruction reads b15 and then writes b15 (pre-decrement). The second writes to b15. Thus there exists both a write-after-read and a write-after-write dependency from the second instruction on the first. But GCC keeps only the more restrictive dependency, and which one counts as more restrictive is determined by the ordering in reg-notes.def (line 98). The dependency is updated in sched-deps.c (line 304). Accordingly, GCC puts a write-after-read dependency between the 2 instructions. Due to this, the 2 instructions are scheduled for execution in parallel. This results in 2 writes to the same register on the same clock cycle.

As of now, I have interchanged the 2 lines in reg-notes.def (lines 98 and 99). Please correct me if I am wrong.

Thanks in advance.

Regards,
Kunal.
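To make the behaviour described above concrete, here is a minimal sketch of a "keep only the more restrictive kind" merge. It is not GCC's code: the enum, its numeric values, and both function names are hypothetical stand-ins, and the latency rule is only an assumption that matches the symptom reported in the post.

```c
/* Hypothetical stand-ins for the scheduler's dependence kinds.
   Assumption: a numerically smaller value is treated as "more restrictive",
   so the order of the enumerators decides which kind survives a merge. */
enum dep_kind {
    DEP_TRUE = 0,   /* read-after-write  */
    DEP_ANTI = 1,   /* write-after-read  */
    DEP_OUTPUT = 2  /* write-after-write */
};

/* Only one kind is stored per pair of insns: the smaller value wins. */
enum dep_kind merge_dep(enum dep_kind existing, enum dep_kind incoming)
{
    return incoming < existing ? incoming : existing;
}

/* Assumed latency rule: an anti dependence lets the writer issue in the
   same cycle as the reader, any other kind needs at least one cycle. */
int min_cycles_between(enum dep_kind kind)
{
    return kind == DEP_ANTI ? 0 : 1;
}
```

With this ordering, merging the anti and output dependences of the stw/add pair leaves an anti dependence, so the sketch allows both writes to b15 in one cycle. Swapping the two enumerator values, which is what interchanging the two lines amounts to, would make the output dependence survive instead.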
Re: problem with the scheduler in gcc-4.0-20040911
Hello,

I have attached the dump after the scheduler. The branch instruction is a conditionally executed branch instruction. So it is represented as RTL COND_EXEC.

Regards,
Kunal

On Tue, 08 Mar 2005 10:14:05 -0500, Vladimir Makarov <[EMAIL PROTECTED]> wrote:
> Kunal Parmar wrote:
>
> >Following is the debugging dump by the scheduler -
> >
> >**
> >;; ==
> >;; -- basic block 1 from 17 to 89 -- after reload
> >;; ==
> >
> >;; --- forward dependences:
> >
> >;; --- Region Dependences --- b 1 bb 0
> >;; insn  code  bb  dep  prio  cost  reservation
> >;; ----  ----  --  ---  ----  ----  -----------
> >;;   17     5   0    0     1     1  S1 :
> >;;   18     5   0    0     1     1  S2 :
> >;;   90    67   0    0     8     1  S2 : 89 91
> >;;   91    66   0    1     7     1  S2 : 89
> >;;   89   100   0    2     6     6  S2 :
> >
> >As can be seen in the assembly dump, one instruction is scheduled
> >after the branch instruction. The branch is a conditionally executed
> >branch instruction. This is incorrect because if the branch is
> >executed then the instruction after that will not be executed.
> >Please help.
> >
> There is no dependency between insn 18 and 89 (the jump). The scheduler
> automatically adds anti-dependencies between jump and previous sets. So
> I think your jump insn is not represented as RTL JUMP_INSN. The RTL
> dump after the scheduler could help more.
>
> Vlad

a.c.32.sched2 Description: Binary data
Re: problem with the scheduler in gcc-4.0-20040911
Hello,

Thanks a lot Vladimir and Daniel.

Regards,
Kunal

On Tue, 8 Mar 2005 11:12:46 -0500, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote:
> On Tue, Mar 08, 2005 at 09:38:19PM +0530, Kunal Parmar wrote:
> > Hello,
> > I have attached the dump after the scheduler. The branch instruction
> > is a conditionally executed branch instruction. So it is represented
> > as RTL COND_EXEC.
>
> Vladimir was right. It's an INSN, when it should be a JUMP_INSN.
> Your backend is probably using emit_insn when it should be using
> emit_jump_insn.
>
> --
> Daniel Jacobowitz
> CodeSourcery, LLC
protect label from being optimized
Hi,

I am working on porting GCC to a new RISC architecture. The ISA does not have a "Jump and Link Register" instruction. So I am simulating one by replacing

    jal [reg]

by

    load ra, Lret
    jr reg
    Lret:

in RTL. But my return label is getting optimized away. Could you please tell me how to avoid this? Also, is this the correct approach?

Thanks in advance,
Kunal
Re: protect label from being optimized
Hi,

>> I am working on porting GCC to a new RISC architecture. The ISA does
>> not have a "Jump and Link Register" instruction. So I am simulating
>> one by replacing
>> jal [reg]
>> by
>> load ra, Lret
>> jr reg
>> Lret:
>>
>> in RTL.
>> But my return label is getting optimized away. Could you please tell
>> me how to avoid this.
>
>Make sure the load label instruction is using a LABEL_REF. Look at
> sh.md for various examples.

Is this correct :

    ret_label = gen_label_rtx ();
    emit_move_insn (gen_rtx_REG (HImode, 7),
                    gen_rtx_LABEL_REF (VOIDmode, ret_label));
    emit_call_insn (gen_brc_call_simulate (addr, args_size));
    emit_label (ret_label);

Cheers,
Kunal
Re: protect label from being optimized
Hi Jim,

>>> But my return label is getting optimized away. Could you please tell
>>> me how to avoid this.
>
>You may also need to add a (USE (REG RA)) to the call pattern. Gcc will
>see that you set a register to the value of the return label, but it
>won't see any code that uses that register, so it will optimize away
>both the load and the label. To prevent this, you need to add an
>explicit use of that register to the call insn pattern.

I have that.. :). Here is the pattern for the call_insn produced.

    (define_insn "brc_call_simulate"
      [(call (mem:HI (match_operand 0 "register_operand" "r"))
             (match_operand 1 "" "i"))
       (use (reg:HI 7))
       (clobber (reg:HI 7))]
      ""
      "jr\t%0")

But GCC is not optimizing the load away. It's just the return label that gets optimized away. CMIIW, I think the problem is this :
1. The label output is local to this function.
2. The label creates a new basic block. There is no jump to this label. As a result, there is only one incoming edge to this block, and that is the fall-through edge after the return from the call.
3. As a result, GCC tries to merge those two blocks and in the process removes the label.

How do I prevent this ?

Thanks in advance,
Kunal
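The reasoning in points 1-3 can be illustrated with a toy model of the merge condition. This is only a sketch of the usual "single successor, single predecessor" rule and not GCC's actual implementation; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Toy basic block: only the edge counts matter for this sketch. */
struct toy_block {
    int num_preds;  /* number of incoming edges */
    int num_succs;  /* number of outgoing edges */
};

/* Two adjacent blocks are candidates for merging when control can only
   flow from the first into the second: the first has exactly one
   successor and the second has exactly one predecessor. */
bool toy_can_merge(const struct toy_block *first,
                   const struct toy_block *second)
{
    return first->num_succs == 1 && second->num_preds == 1;
}
```

In this model the block ending in the simulated call has one fall-through successor, and the block starting at the return label has that fall-through edge as its only predecessor, so the pair qualifies for merging. A second incoming edge, such as a jump to the label that the CFG can see, would make the check fail.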
Re: protect label from being optimized
Hi Joern,

>The insn that loads the return register with the label needs a REG_LABEL
>note to avoid the ref count dropping to zero.

The insn has a REG_LABEL note (foo.c.110r.vregs) and the label also has a ref count of 1.

>You would have to put a (set (pc) (reg RA)) into the pattern of the
>call insn. And no matter if you make this a call_insn, jump_insn,
>or some newly invented type of insn, I think you will have to change
>some middle-end code to cope with an insn that is both a call and a jump.

Does this mean that there is no way for me to handle this in RTL without changing target independent code ?

In that case, since I don't want to change the target independent code, will it be sufficient if I create a new pattern that holds the label number in the pattern and, during final pattern matching, uses this to emit the label manually ? Can I assume that there won't be multiple labels with the same number ?

Thanks in advance,
Kunal
no mul/div instruction
Hi all,

I am porting GCC to a new 16 bit RISC architecture which does not have multiplication and division instructions. I figured that I have to provide emulation routines for multiplication and division, which will be inserted into libgcc2.a. But I am confused about which versions of these routines to provide, i.e. do I provide __mulsi3 or __mulhi3 or both?

Thanks in advance,
Kunal
Re: no mul/div instruction
Hi Ian,

On Tue, Apr 22, 2008 at 1:24 PM, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> It depends on UNITS_PER_WORD. If UNITS_PER_WORD is 4, you need
> __mulsi3. If UNITS_PER_WORD is 2, you need __mulhi3, and, if you have
> 32-bit integer types, you will also need __mulsi3. In the latter case
> you can, if you like, get code for __mulsi3 from longlong.h--see the
> comments at the top of the file.

UNITS_PER_WORD is 2 and INT_TYPE_SIZE is 16. This means I need to provide __mulhi3. Does this mean that libgcc will provide the definition of __mulsi3 ?

Thanks,
Kunal
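For reference, a 16-bit multiply routine for a target without a multiply instruction can be written as a shift-and-add loop. The following is a minimal sketch in portable C, not the routine used in this port; the name mulhi3_sketch is hypothetical (a real port would export the routine as __mulhi3, often in assembly).

```c
#include <stdint.h>

/* Shift-and-add 16x16 -> 16 bit multiply.
   Hypothetical name; stands in for a port's __mulhi3. */
int16_t mulhi3_sketch(int16_t a, int16_t b)
{
    /* Work in unsigned arithmetic so wraparound is well defined.
       Signed and unsigned products agree in the low 16 bits. */
    uint16_t multiplicand = (uint16_t)a;
    uint16_t multiplier = (uint16_t)b;
    uint16_t product = 0;

    while (multiplier != 0) {
        /* Add the shifted multiplicand for every set bit of the multiplier. */
        if (multiplier & 1u)
            product = (uint16_t)(product + multiplicand);
        multiplicand = (uint16_t)(multiplicand << 1);
        multiplier >>= 1;
    }

    /* Assumes the usual two's-complement wrap on conversion to int16_t. */
    return (int16_t)product;
}
```

The loop runs at most 16 times and uses only shifts, adds, and a bit test, which are the operations such a target can be expected to provide.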
Re: no mul/div instruction
Hi Ian,

> Yes, I think __mulsi3 will be built for you automatically.

I gave a definition of __mulhi3 for my architecture. But I don't get __mulsi3 in libgcc.a. Do I have to enable some options for this ?

Thanks in advance,
Kunal Parmar
Re: no mul/div instruction
Hi Ian,

On Tue, Apr 22, 2008 at 7:12 PM, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> Looking at libgcc2.h, it seems like you might need to define
> LIBGCC2_UNITS_PER_WORD in your tm.h file.

That solved my problem. Thanks a ton !

Kunal
Re: no mul/div instruction
Hi, I wanted support for software floating point on the architecture. I am using fp-bit.c & dp-bit.c and have defined FLOAT_TYPE_SIZE as 32 and DOUBLE_TYPE_SIZE as 64. dp-bit.c requires __muldi3. How do I enable emulation of 64 bit multiply in libgcc2.a ? Thanks in advance, Kunal Parmar