problem with the scheduler

2005-03-08 Thread Kunal Parmar
Hello,
I am working with the C6x processor from TI. It has a VLIW architecture.
It has 32 registers, namely a0-a15 and b0-b15. b15 is used as the SP
in the current port.
I am facing a problem with the scheduler of GCC.
Following is the C code I was compiling:

***
int mult(int a,int b) {
  int result=0,flag;

  if(b<0)
    flag=1;
  else
    flag=-1;
  for(;b;b+=flag)
    result += a;
  return result;
}

int main() {
  return mult(5,4);
}


Following is part of the assembly generated by GCC:

*
mult:
   stw .D2T1   a15,*--b15
||  mvk 0,  b4
   mv  b15,a15
   ldw .D1T2   *+a15[3],   b1
   ldw .D1T1   *+a15[2],   a3
   nop 3
   cmplt   b1, b4, b0
   [ b0] mvkl  L2, b4   ;
   [ b0] mvkh  L2, b4
   [ b0] b b4
   nop 5
   [ b1] mvkl  L5, b4
||  mvk 0,  a4
   [ b1] mvkh  L5, b4
   [ b1] b b4
   nop 5
;; problem - this should have been scheduled before
;; the branch instruction
   mvk -1, b3
L9:
   ldw .D1T2   *+a15[1],   b14
||  mv  a15,b15
   ldw .D2T1   *b15++, a15
   add 4,  b15,b15
   nop 2
   b   .S2 b14
   nop 5
L2:
   mvk 0,  a4
||  mvk 1,  b3
L5:
   add b3, b1, b1
||  add a3, a4, a4
   [ b1] mvkl  L5, b4
   [ b1] mvkh  L5, b4
   [ b1] b b4
   nop 5
   b   .S2 L9
   nop 5
***


problem with the scheduler in gcc-4.0-20040911

2005-03-08 Thread Kunal Parmar
Hello,
I am working with the C6x processor from TI. It has a VLIW architecture.
It has 32 registers, namely a0-a15 and b0-b15. b15 is used as the SP
in the current port.
I am facing a problem with the scheduler of GCC.

Following is the C code I was compiling:

***
int mult(int a,int b) {
 int result=0,flag;

 if(b<0)
   flag=1;
 else
   flag=-1;
 for(;b;b+=flag)
   result += a;
 return result;
}

int main() {
 return mult(5,4);
}



Following is part of the assembly generated by GCC. Code was compiled with -O2.


*
mult:
   stw .D2T1   a15,*--b15   ;D1,D2 are functional units
                            ;T1,T2 are transmission paths
||  mvk 0,  b4              ;|| implies that this instruction is executed
                            ;in parallel with the previous instruction
   mv  b15,a15
   ldw .D1T2   *+a15[3],   b1
   ldw .D1T1   *+a15[2],   a3
   nop 3                    ;equivalent to 3 nops
   cmplt   b1, b4, b0
   [ b0] mvkl  L2, b4       ;[] implies conditional execution. The
                            ;instruction is executed if b0 is TRUE
   [ b0] mvkh  L2, b4
   [ b0] b b4
   nop 5
   [ b1] mvkl  L5, b4
||  mvk 0,  a4
   [ b1] mvkh  L5, b4
   [ b1] b b4
   nop 5
;; problem - the below instruction should have been scheduled before
;; the branch instruction because it will not be executed if the branch
;; is taken
   mvk -1, b3
L9:
   ldw .D1T2   *+a15[1],   b14
||  mv  a15,b15
   ldw .D2T1   *b15++, a15
   add 4,  b15,b15
   nop 2
   b   .S2 b14
   nop 5
L2:
   mvk 0,  a4
||  mvk 1,  b3
L5:
   add b3, b1, b1
||  add a3, a4, a4
   [ b1] mvkl  L5, b4
   [ b1] mvkh  L5, b4
   [ b1] b b4
   nop 5
   b   .S2 L9
   nop 5
***


Following is the debugging dump from the scheduler:

**
;;   ==
;;   -- basic block 1 from 17 to 89 -- after reload
;;   ==

;;   --- forward dependences: 

;;   --- Region Dependences --- b 1 bb 0
;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
;;        17     5     0     0     1     1   S1  :
;;        18     5     0     0     1     1   S2  :
;;        90    67     0     0     8     1   S2  : 89 91
;;        91    66     0     1     7     1   S2  : 89
;;        89   100     0     2     6     6   S2  :

;;  Ready list after queue_to_ready:90  18  17
;;  Ready list after ready_sort:18  17  90
;;  Ready list (t =  0):18  17  90
;;0--> 90   (b1) b4=b4+low(L25):S2
;;  dependences resolved: insn 91 into queue with cost=1
;;  Ready-->Q: insn 91: queued for 1 cycles.
;;  Ready list (t =  0):18  17
;;0--> 17   a4=0x0 :S1
;;  Ready list (t =  0):18
;;  Ready-->Q: insn 18: queued for 1 cycles.
;;  Ready list (t =  0):
;;  Second chance
;;  Q-->Ready: insn 18: moving to ready without stalls
;;  Q-->Ready: insn 91: moving to ready without stalls
;;  Ready list after queue_to_ready:91  18
;;  Ready list after ready_sort:18  91
;;  Ready list (t =  1):18  91
;;1--> 91   (b1) {b4=high(L25);use b4;}:S2
;;  dependences resolved: insn 89 into queue with cost=1
;;  Ready-->Q: insn 89: queued for 1 cycles.
;;  Ready list (t =  1):18
;;  Ready-->Q: insn 18: queued for 1 cycles.
;;  Ready list (t =  1):
;;  Second chance
;;  Q-->Ready: insn 18: moving to ready without stalls
;;  Q-->Ready: insn 89: moving to ready without stalls
;;  Ready list after queue_to_ready:89  18
;;  Ready list after ready_sort:18  89
;;  Ready list (t =  2):18  89
;;2--> 89   (b1) pc=b4 :S2
;;  Ready list (t =  2):18
;;  Ready-->Q: insn 18: queued for 1 cycles.
;;  Ready list (t =  2):
;;   

problem with dependencies in gcc-4.0-20040911

2005-03-08 Thread Kunal Parmar
Hello,
I am working with a VLIW processor and GCC-4.0-20040911. There is a
problem in the dependency calculation of GCC: it gives write-after-read
a higher priority than write-after-write. As a result, for the
following code, GCC records a write-after-read dependency between the
2 instructions, and the 2 instructions are scheduled together.

stw --b15,*a15  ;; pre-decrement b15 and store its contents in memory
add 4,a15,b15  ;; b15 = a15+4

The first instruction reads b15 and then writes b15 (pre-decrement).
The second writes to b15. Thus there is both a write-after-read and a
write-after-write dependency of the second instruction on the first.
But GCC keeps only the more restrictive dependency, which is determined
by the order in reg-notes.def (line 98). The dependency is updated in
sched-deps.c (line 304). Accordingly, GCC puts a write-after-read
dependency between the 2 instructions. Due to this, the 2 instructions
are scheduled for execution in parallel. This results in 2 writes to
the same register in the same clock cycle.

For now, I have interchanged the 2 lines in reg-notes.def (lines
98 and 99).

Please correct me if I am wrong.
Thanks in advance.
Regards,
Kunal.
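The merge rule described in this message can be sketched as a toy model in C. Everything here is hypothetical and simplified: the `toy_` names are invented, a lower rank stands for "more restrictive" as in the description of reg-notes.def, and the per-kind latencies (0 cycles for write-after-read, 1 for write-after-write) are assumptions chosen to match the behaviour reported above, not GCC's actual cost function.

```c
#include <assert.h>

/* Hypothetical stand-ins for the three dependence kinds. */
enum toy_dep_kind {
  TOY_DEP_TRUE,   /* read-after-write */
  TOY_DEP_ANTI,   /* write-after-read */
  TOY_DEP_OUTPUT  /* write-after-write */
};

/* rank[k] is the position of kind k in the definition order. As described
   for reg-notes.def, a lower position counts as "more restrictive". */
static const int toy_rank_anti_first[3] = { 0, 1, 2 };   /* original order */
static const int toy_rank_output_first[3] = { 0, 2, 1 }; /* lines swapped */

/* Two dependences between the same pair of insns collapse into one:
   keep whichever kind has the lower rank. */
static enum toy_dep_kind toy_merge_dep(enum toy_dep_kind a,
                                       enum toy_dep_kind b,
                                       const int rank[3]) {
  return rank[b] < rank[a] ? b : a;
}

/* Assumed minimum distance, in cycles, between the two insns for each kind.
   0 means both may issue in the same VLIW packet. */
static int toy_dep_latency(enum toy_dep_kind kind) {
  switch (kind) {
  case TOY_DEP_TRUE:
    return 1;
  case TOY_DEP_ANTI:
    return 0;
  case TOY_DEP_OUTPUT:
    return 1;
  }
  assert(0 && "unknown dependence kind");
  return 1;
}
```

For the stw/add pair, `toy_merge_dep(TOY_DEP_ANTI, TOY_DEP_OUTPUT, toy_rank_anti_first)` keeps the anti dependence with latency 0 (both writes to b15 in one cycle), while the swapped ranks keep the output dependence with latency 1. In this model, swapping the two rank entries only changes which kind survives the merge; it says nothing about other effects such a change may have inside GCC.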


Re: problem with the scheduler in gcc-4.0-20040911

2005-03-08 Thread Kunal Parmar
Hello,
I have attached the dump after the scheduler. The branch instruction
is a conditionally executed branch instruction. So it is represented
as RTL COND_EXEC.
Regards,
Kunal



On Tue, 08 Mar 2005 10:14:05 -0500, Vladimir Makarov
<[EMAIL PROTECTED]> wrote:
> Kunal Parmar wrote:
> 
> >Following is the debugging dump by the scheduler -
> >
> >**
> >;;   ==
> >;;   -- basic block 1 from 17 to 89 -- after reload
> >;;   ==
> >
> >;;   --- forward dependences: 
> >
> >;;   --- Region Dependences --- b 1 bb 0
> >;;      insn  code    bb   dep  prio  cost   reservation
> >;;      ----  ----    --   ---  ----  ----   -----------
> >;;        17     5     0     0     1     1   S1  :
> >;;        18     5     0     0     1     1   S2  :
> >;;        90    67     0     0     8     1   S2  : 89 91
> >;;        91    66     0     1     7     1   S2  : 89
> >;;        89   100     0     2     6     6   S2  :
> >
> >
> >As can be seen in the assembly dump, one instruction is scheduled
> >after the branch instruction. The branch is a conditionally executed
> >branch instruction. This is incorrect because if the branch is
> >executed then the instruction after that will not be executed.
> >Please help.
> >
> >
> >
> There is no dependency between insn 18 and 89 (the jump).  The scheduler
> automatically adds anti-dependencies between jump and previous sets.  So
> I think your jump insn is not represented as RTL JUMP_INSN.  The RTL
> dump after the scheduler could help more.
> 
> Vlad
> 
>


[Attachment: a.c.32.sched2 (binary data)]


Re: problem with the scheduler in gcc-4.0-20040911

2005-03-08 Thread Kunal Parmar
Hello,
Thanks a lot, Vladimir and Daniel.
Regards,
Kunal


On Tue, 8 Mar 2005 11:12:46 -0500, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote:
> On Tue, Mar 08, 2005 at 09:38:19PM +0530, Kunal Parmar wrote:
> > Hello,
> > I have attached the dump after the scheduler. The branch instruction
> > is a conditionally executed branch instruction. So it is represented
> > as RTL COND_EXEC.
> 
> Vladimir was right.  It's an INSN, when it should be a JUMP_INSN.
> Your backend is probably using emit_insn when it should be using
> emit_jump_insn.
> 
> --
> Daniel Jacobowitz
> CodeSourcery, LLC
>
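The point Vladimir and Daniel make can be illustrated with a toy list scheduler in C. It is a sketch with hypothetical `toy_` names, not GCC code: the block mirrors the dump above (insns 17, 18, 90, 91 and the branch 89, with their priorities and the data dependences 90 -> 91 -> 89 and 90 -> 89), and the only extra rule is a simplified version of the one Vladimir describes, in which every earlier insn (not only sets) gets a dependence on the jump. That rule fires only when the branch is a jump, which in the real compiler is what emitting it with emit_jump_insn instead of emit_insn provides. The toy is single-issue, so unlike the real C6x schedule (where insn 17 shared cycle 0 with insn 90) both 17 and 18 drift past the branch when it is a plain insn.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_INSNS 8

/* Hypothetical stand-ins for a plain insn and a jump insn. */
enum toy_insn_kind { TOY_INSN, TOY_JUMP_INSN };

struct toy_insn {
  int uid;                 /* insn number as printed in the dump */
  enum toy_insn_kind kind;
  int prio;                /* higher priority is issued first when ready */
};

/* dep[i][j] == true means insn j may only issue after insn i. */
typedef bool toy_deps[TOY_MAX_INSNS][TOY_MAX_INSNS];

/* Simplified jump rule: every insn that precedes a jump in the block gets a
   dependence on that jump. Nothing happens if the branch is a plain insn. */
static void toy_add_jump_deps(const struct toy_insn *insns, int n,
                              toy_deps dep) {
  for (int j = 0; j < n; j++)
    if (insns[j].kind == TOY_JUMP_INSN)
      for (int i = 0; i < j; i++)
        dep[i][j] = true;
}

/* Greedy single-issue list scheduler: at each step issue the ready insn
   (all of its predecessors already issued) with the highest priority; ties
   go to the earlier insn. The resulting uid order is written to order[]. */
static void toy_schedule(const struct toy_insn *insns, int n, toy_deps dep,
                         int *order) {
  bool issued[TOY_MAX_INSNS] = { false };
  assert(n <= TOY_MAX_INSNS);
  for (int step = 0; step < n; step++) {
    int best = -1;
    for (int j = 0; j < n; j++) {
      bool ready = !issued[j];
      for (int i = 0; ready && i < n; i++)
        if (dep[i][j] && !issued[i])
          ready = false;
      if (ready && (best < 0 || insns[j].prio > insns[best].prio))
        best = j;
    }
    assert(best >= 0 && "dependence cycle");
    issued[best] = true;
    order[step] = insns[best].uid;
  }
}

/* The block from the dump, with the branch (insn 89) given the chosen kind.
   Returns the uid of the insn issued last. */
static int toy_last_issued(enum toy_insn_kind branch_kind) {
  struct toy_insn insns[5] = {
    { 17, TOY_INSN, 1 }, { 18, TOY_INSN, 1 }, { 90, TOY_INSN, 8 },
    { 91, TOY_INSN, 7 }, { 89, branch_kind, 6 },
  };
  toy_deps dep = { { false } };
  int order[5];
  dep[2][3] = dep[2][4] = dep[3][4] = true; /* 90 -> 91, 90 -> 89, 91 -> 89 */
  toy_add_jump_deps(insns, 5, dep);
  toy_schedule(insns, 5, dep, order);
  return order[4];
}
```

With the branch as a plain insn the toy issues 90, 91, 89, 17, 18, leaving two insns after the branch; marked as a jump, it issues 90, 91, 17, 18, 89.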


protect label from being optimized

2008-04-17 Thread Kunal Parmar

Hi,

I am working on porting GCC to a new RISC architecture. The ISA does not 
have a "Jump and Link Register" instruction. So I am simulating one by 
replacing

 jal [reg]
by
 load ra, Lret
 jr reg
Lret:

in RTL.
But my return label is getting optimized away. Could you please tell me
how to avoid this?

Also, is this the correct approach?

Thanks in advance,
Kunal


Re: protect label from being optimized

2008-04-17 Thread Kunal Parmar

Hi,

>> I am working on porting GCC to a new RISC architecture. The ISA does
>> not have a "Jump and Link Register" instruction. So I am simulating
>> one by replacing
>>  jal [reg]
>> by
>>  load ra, Lret
>>  jr reg
>> Lret:
>>
>> in RTL.
>> But my return label is getting optimized away. Could you please tell
>> me how to avoid this.
>
>Make sure the load label instruction is using a LABEL_REF.  Look at
> sh.md for various examples.

Is this correct?
   ret_label = gen_label_rtx ();
   emit_move_insn (gen_rtx_REG (HImode, 7),
                   gen_rtx_LABEL_REF (VOIDmode, ret_label));

   emit_call_insn (gen_brc_call_simulate (addr, args_size));
   emit_label (ret_label);

Cheers,
Kunal



Re: protect label from being optimized

2008-04-18 Thread Kunal Parmar

Hi Jim,

>>> But my return label is getting optimized away. Could you please tell
>>> me how to avoid this.
>
>You may also need to add a (USE (REG RA)) to the call pattern.  Gcc
>will see that you set a register to the value of the return label, but
>it won't see any code that uses that register, so it will optimize away
>both the load and the label.  To prevent this, you need to add an
>explicit use of that register to the call insn pattern.


I have that :). Here is the pattern for the call_insn produced:

(define_insn "brc_call_simulate"
  [(call (mem:HI (match_operand 0 "register_operand" "r"))
         (match_operand 1 "" "i"))
   (use (reg:HI 7))
   (clobber (reg:HI 7))]
  ""
  "jr\t%0")

But GCC is not optimizing the load away. It's just the return label
that gets optimized away.
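Jim's explanation of why the load would disappear without the USE can be sketched as a toy backward dead-code pass in C. This is a simplified model with hypothetical `dce_` names, not GCC's actual pass: register 7 stands for the return-address register, the call is modelled as using the target register (r3 here, an arbitrary choice) and clobbering r7, and the explicit `(use (reg:HI 7))` becomes an extra bit in the call's use mask. It covers only the load; as this message notes, the load does survive, and it is the label that goes away.

```c
#include <assert.h>
#include <stdbool.h>

#define DCE_NREGS 8

/* Toy insn: writes at most one register, reads a set of registers. */
struct dce_insn {
  int set_reg;         /* register written (or clobbered), -1 if none */
  unsigned use_mask;   /* bit r set => the insn reads register r */
  bool side_effects;   /* calls, stores, jumps: never deleted */
  bool deleted;        /* output: set by dce_run */
};

/* One backward pass over a straight-line sequence. A register is live if a
   later, surviving insn reads it; all registers are assumed dead at the end.
   An insn without side effects whose result is not live is deleted. */
static void dce_run(struct dce_insn *insns, int n) {
  unsigned live = 0;
  for (int i = n - 1; i >= 0; i--) {
    struct dce_insn *in = &insns[i];
    assert(in->set_reg < DCE_NREGS);
    bool result_live = in->set_reg >= 0 && (live & (1u << in->set_reg)) != 0;
    if (!in->side_effects && !result_live) {
      in->deleted = true;
      continue;
    }
    /* live before the insn = its uses, plus live-after minus its def */
    if (in->set_reg >= 0)
      live &= ~(1u << in->set_reg);
    live |= in->use_mask;
  }
}
```

With the use bit present, the call keeps r7 live above it and the load survives; without it, the load's result is dead and the pass deletes it.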


CMIIW, I think the problem is this:
1. The label output is local to this function.
2. The label creates a new basic block. There is no jump to this label.
   As a result, there is only one incoming edge to this block, and that
   is the fall-through edge after the return from the call.
3. As a result, GCC tries to merge those two blocks and in the process
   removes the label.


How do I prevent this?

Thanks in advance,
Kunal


Re: protect label from being optimized

2008-04-18 Thread Kunal Parmar

Hi Joern,

>The insn that loads the return register with the label needs a REG_LABEL
>note to avoid the ref count dropping to zero.

The insn has a REG_LABEL note (see foo.c.110r.vregs), and the label also
has a ref count of 1.


>You would have to put a (set (pc) (reg RA)) into the pattern of the
>call insn.  And no matter if you make this a call_insn, jump_insn,
>or some newly invented type of insn, I think you will have to change
>some middle-end code to cope with an insn that is both a call and a jump.

Does this mean that there is no way for me to handle this in RTL without
changing target-independent code?
In that case, since I don't want to change the target-independent code,
will it be sufficient if I create a new pattern that holds the label
number and, during final pattern matching, uses it to output the label
manually? Can I assume that there won't be multiple labels with the same
number?

Thanks in advance,
Kunal
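The idea proposed in this message, one pattern that carries the label number and prints the return label itself during final, amounts to producing the whole replacement sequence as assembler text. A minimal sketch in C, under loud assumptions: the function name is hypothetical, the mnemonics `load` and `jr`, the register name `ra` and the `.Lret%d` label spelling are placeholders for whatever the real assembler syntax is, and the label number is simply passed in. Whether such numbers are unique is exactly the open question here, so the sketch does not try to answer it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the sequence that replaces "jal [reg]"
   into buf, using label_no to build a local return label. Mnemonics,
   the "ra" register name and the ".Lret%d" spelling are placeholders.
   Returns the number of characters written, or -1 if buf is too small. */
static int toy_format_simulated_jal(char *buf, size_t len,
                                    const char *target_reg, int label_no) {
  assert(buf != NULL && target_reg != NULL);
  int n = snprintf(buf, len,
                   "\tload\tra, .Lret%d\n" /* return address = label */
                   "\tjr\t%s\n"            /* jump to the callee */
                   ".Lret%d:\n",           /* the callee returns here */
                   label_no, target_reg, label_no);
  if (n < 0 || (size_t)n >= len)
    return -1;
  return n;
}
```

For example, `toy_format_simulated_jal(out, sizeof out, "r3", 7)` produces the three lines `load ra, .Lret7`, `jr r3` and `.Lret7:` (tab-separated) and returns 32.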


no mul/div instruction

2008-04-22 Thread Kunal Parmar

Hi all,

I am porting GCC to a new 16-bit RISC architecture which does not have
multiplication and division instructions. I figured that I have to provide
emulation routines for multiplication and division which will be
inserted into libgcc.a. But I am confused about which versions of these
routines to provide, i.e., do I provide __mulsi3, __mulhi3, or both?

Thanks in advance,
Kunal
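Whichever symbol names end up being required, the routines themselves are short. A sketch of 16-bit shift-and-add multiplication and restoring shift-and-subtract division in C; the `toy_` names are hypothetical and are not the symbol names libgcc expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-add multiply returning the low 16 bits of a * b. The low half
   of the product is the same for signed and unsigned (two's complement)
   operands, so one routine serves both. */
static uint16_t toy_mulhi3(uint16_t a, uint16_t b) {
  uint16_t result = 0;
  while (b != 0) {
    if (b & 1u)
      result = (uint16_t)(result + a); /* add the current shifted a */
    a = (uint16_t)(a << 1);
    b = (uint16_t)(b >> 1);
  }
  return result;
}

/* Restoring shift-and-subtract division of unsigned 16-bit values: bring
   down one numerator bit at a time and subtract den when possible.
   Returns the quotient; stores the remainder in *rem if rem is non-NULL.
   den must be non-zero. */
static uint16_t toy_udivmodhi4(uint16_t num, uint16_t den, uint16_t *rem) {
  uint32_t r = 0; /* wider than 16 bits: r << 1 can exceed 0xFFFF */
  uint16_t quot = 0;
  assert(den != 0);
  for (int bit = 15; bit >= 0; bit--) {
    r = (r << 1) | ((num >> bit) & 1u);
    if (r >= den) {
      r -= den;
      quot = (uint16_t)(quot | (1u << bit));
    }
  }
  if (rem != NULL)
    *rem = (uint16_t)r;
  return quot;
}
```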


Re: no mul/div instruction

2008-04-22 Thread Kunal Parmar
Hi Ian,

On Tue, Apr 22, 2008 at 1:24 PM, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
>  It depends on UNITS_PER_WORD.  If UNITS_PER_WORD is 4, you need
>  __mulsi3.  If UNITS_PER_WORD is 2, you need __mulhi3, and, if you have
>  32-bit integer types, you will also need __mulsi3.  In the latter case
>  you can, if you like, get code for __mulsi3 from longlong.h--see the
>  comments at the top of the file.

UNITS_PER_WORD is 2 and INT_TYPE_SIZE is 16. This means I need to
provide __mulhi3. Does this mean that libgcc will provide the definition of
__mulsi3?

Thanks,
Kunal
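Ian's pointer to longlong.h is about building a double-word multiply from word-sized pieces. The idea can be sketched in C for the case here (16-bit words, 32-bit result): split each operand into 16-bit halves and keep only the partial products that reach the low 32 bits. The names are hypothetical, and `toy_umul_16x16` stands in for the widening 16x16 -> 32 multiply (what longlong.h calls umul_ppmm) that a real target would have to supply; this roughly mirrors the structure of libgcc2.c's generic double-word multiply rather than reproducing its code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the target's widening 16x16 -> 32 multiply. */
static uint32_t toy_umul_16x16(uint16_t a, uint16_t b) {
  return (uint32_t)a * b;
}

/* Low 32 bits of a * b from 16-bit halves:
     a * b = al*bl + (ah*bl + al*bh) * 2^16 + ah*bh * 2^32
   Modulo 2^32 the last term vanishes, and only the low 16 bits of the
   cross terms can reach the result. */
static uint32_t toy_mulsi3(uint32_t a, uint32_t b) {
  uint16_t al = (uint16_t)a, ah = (uint16_t)(a >> 16);
  uint16_t bl = (uint16_t)b, bh = (uint16_t)(b >> 16);
  uint32_t low = toy_umul_16x16(al, bl);
  uint32_t cross = toy_umul_16x16(ah, bl) + toy_umul_16x16(al, bh);
  return low + (cross << 16); /* unsigned wraparound is the intended mod 2^32 */
}
```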


Re: no mul/div instruction

2008-04-22 Thread Kunal Parmar
Hi Ian,

>  Yes, I think __mulsi3 will be built for you automatically.

I gave a definition of __mulhi3 for my architecture, but I don't get
__mulsi3 in libgcc.a. Do I have to enable some options for this?

Thanks in advance,
Kunal Parmar


Re: no mul/div instruction

2008-04-22 Thread Kunal Parmar
Hi Ian,

On Tue, Apr 22, 2008 at 7:12 PM, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
>  Looking at libgcc2.h, it seems like you might need to define
>  LIBGCC2_UNITS_PER_WORD in your tm.h file.

That solved my problem. Thanks a ton!

Kunal


Re: no mul/div instruction

2008-04-22 Thread Kunal Parmar
Hi,

I want support for software floating point on this architecture. I am
using fp-bit.c and dp-bit.c and have defined FLOAT_TYPE_SIZE as 32 and
DOUBLE_TYPE_SIZE as 64. dp-bit.c requires __muldi3. How do I enable
emulation of 64-bit multiply in libgcc.a?

Thanks in advance,
Kunal Parmar
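On a machine with 16-bit words a 64-bit value spans four words, and the schoolbook method then works limb by limb, as in this C sketch. The names are hypothetical, and the host-side wrapper exists only so the routine can be checked against native 64-bit arithmetic. It illustrates the arithmetic only; it does not answer how to get libgcc to build __muldi3 for this configuration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LIMBS 4 /* a 64-bit value as four 16-bit words, low word first */

/* Schoolbook multiplication keeping only the low TOY_LIMBS limbs, i.e. the
   product modulo 2^64. Needs only 16x16 -> 32 multiplies and 32-bit adds:
   a[i]*b[j] + out[i+j] + carry never exceeds 0xFFFFFFFF.
   out must not alias a or b, since it is cleared before they are read. */
static void toy_mul_limbs(const uint16_t *a, const uint16_t *b,
                          uint16_t *out) {
  assert(out != a && out != b);
  for (int k = 0; k < TOY_LIMBS; k++)
    out[k] = 0;
  for (int i = 0; i < TOY_LIMBS; i++) {
    uint32_t carry = 0;
    for (int j = 0; i + j < TOY_LIMBS; j++) {
      uint32_t t = (uint32_t)a[i] * b[j] + out[i + j] + carry;
      out[i + j] = (uint16_t)t;
      carry = t >> 16;
    }
    /* a carry out of the top limb is dropped here (mod 2^64) */
  }
}

/* Host-side wrapper for testing against native 64-bit arithmetic. */
static uint64_t toy_muldi3(uint64_t a, uint64_t b) {
  uint16_t la[TOY_LIMBS], lb[TOY_LIMBS], lr[TOY_LIMBS];
  uint64_t r = 0;
  for (int k = 0; k < TOY_LIMBS; k++) {
    la[k] = (uint16_t)(a >> (16 * k));
    lb[k] = (uint16_t)(b >> (16 * k));
  }
  toy_mul_limbs(la, lb, lr);
  for (int k = TOY_LIMBS - 1; k >= 0; k--)
    r = (r << 16) | lr[k];
  return r;
}
```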