from:"Alex Turjan"

Re: Scheduling automaton question

2011-02-11 Thread Alex Turjan

Hi,
According to me at this moment the scheduler does not support your needs.

I was confronted with a similar problem as yours and I solved it by 
implementing the TARGET_SCHED_DFA_NEW_CYCLE hook. Inside of the function which 
supports this hook I choose/set the insn reservation that makes  possible to 
fit as many other insn as possible in the same cycle.

During this process I also update the ready list with insns that become ready 
as a result of scheduling the current insn ( like in your example -insns that 
are anti-dependent on the current insn and which therefore can be scheduled in 
the current cycle). Thus the best insn reservation makes possible scheduling 
antidependent insns of cost zero in the same cycle by avoiding resource 
conflicts. 

Alex

few changes in the gcc mainline sources

--- On Fri, 2/11/11, Bernd Schmidt  wrote:

> From: Bernd Schmidt 
> Subject: Scheduling automaton question
> To: "GCC List" 
> Cc: "Vladimir N. Makarov" 
> Date: Friday, February 11, 2011, 2:33 PM
> Suppose I have two insns, one
> reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an
> otherwise
> empty state, it reserves the A unit and blocks the second
> one from being
> scheduled in the same cycle. This is a problem when there's
> an
> anti-dependence of cost 0 between the two instructions.
> 
> Vlad - two questions. Is this behaviour what you would
> expect to happen,
> and how much work do you think would be involved to fix it
> (i.e. make
> the first one transition to a state where we can still
> reserve any two
> out of the three units)?
> 
> 
> Bernd
>

Re: Question about the difference between two instruction scheduling passes

2009-08-19 Thread Alex Turjan

> Gcc only does this work in the second pass, but what's the
> point? Is it wrong or just not necessary  in the first sched
> pass?

Regardless of the target architecture from the correctness point of view sched1 
can be disabled. sched1 has as purpose shortening live ranges.
Short live ranges allow the register allocation to:
1. generate less spills and 
2. also to avoid useless live ranges splitting. 

The second point impacts the register renaming phase which allows sched2 phase 
to extract more ilp.

Re: IRA undoing scheduling decisions

2009-08-27 Thread Alex Turjan

> With 4.4, IRA happens to reuse the same register for both pseudos, so 
> sched2 is hand tied and cannot schedule them back again for us.

I can imagine compiling other programs for which preserving the 4.3 allocation 
will induce performance degradation due to spilling. 

The register allocator tries to minimize the number of spills without taking 
into account the ILP implications (i.e., creating extra false dependencies). 
Perhaps one possible way to solve the problem would be to analyze why the 
register rename phase (which is responsible for spreading the registers) does 
not produces 2 registers.


--- On Wed, 8/26/09, Peter Bergner  wrote:

> From: Peter Bergner 
> Subject: Re: IRA undoing scheduling decisions
> To: "Charles J. Tabony" 
> Cc: gcc@gcc.gnu.org
> Date: Wednesday, August 26, 2009, 11:47 PM
> On Mon, 2009-08-24 at 23:56 +,
> Charles J. Tabony wrote:
> > I am seeing a performance regression on the port I
> maintain, and I would appreciate some pointers.
> > 
> > When I compile the following code
> > 
> > void f(int *x, int *y){
> >   *x = 7;
> >   *y = 4;
> > }
> > 
> > with GCC 4.3.2, I get the desired sequence of
> instructions.  I'll call it sequence A:
> > 
> > r0 = 7
> > r1 = 4
> > [x] = r0
> > [y] = r1
> > 
> > When I compile the same code with GCC 4.4.0, I get a
> sequence that is lower performance for my target
> machine.  I'll call it sequence B:
> > 
> > r0 = 7
> > [x] = r0
> > r0 = 4
> > [y] = r0
> 
> This is caused by update_equiv_regs() which IRA inherited
> from local-alloc.c.
> Although with gcc 4.3 and earlier, you don't see the
> problem, it is still there,
> because if you look at the 4.3 dumps, you will see that
> update_equiv_regs()
> unordered them for us.  What is saving us is that
> sched2 reschedules them
> again for us in the order we want.  With 4.4, IRA
> happens to reuse the same
> register for both pseudos, so sched2 is hand tied and
> cannot schedule them
> back again for us.
> 
> Looking at update_equiv_regs(), if I disable the
> replacement for regs
> that are local to one basic block (patch below) like it
> existed before
> John Wehle's patch way back in Oct 2000:
> 
>   http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html
> 
> then we get the ordering we want.  Does anyone know
> why John removed
> that part of the test in his patch?  Thoughts anyone?
> 
> 
> Peter
> 
> 
> Index: ira.c
> ===
> --- ira.c    (revision 15)
> +++ ira.c    (working copy)
> @@ -2510,6 +2510,7 @@ update_equiv_regs (void)
>           
>    calls.  */
>  
>            if
> (REG_N_REFS (regno) == 2
> +             
> && REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS
>               
> && (rtx_equal_p (x, src)
>             
>   || ! equiv_init_varies_p (src))
>               
> && NONJUMP_INSN_P (insn)
> 
> 
> 
>

question about DSE

2009-09-08 Thread Alex Turjan

Dear all,
Im writing to you regarding the dead store elimination (dse) which runs after 
register allocation. Apparently dse removes wrongly the following store 
(present in bb2):

(insn 374 47 52 2 test.c:107 (set (mem/c:SI (plus:PSI (reg/f:PSI 55 ptr15)
(const_int 96 [0x60])) [19 fac_iter+0 S4 A32])
(reg/v:SI 16 r16 [orig:161 step109 ] [161])) 48  
{si_indexed_store_incl_ra} (nil))

despite being consumed (in bb3) by the following 2 loads:
(insn 380 71 64 3 test.c:112 (set (reg:HI 1 r1)
(mem:HI (plus:PSI (reg/f:PSI 55 ptr15)
(const_int 96 [0x60])) [0 S2 A16])) 12 {load} (nil))

(insn 382 346 65 3 test.c:112 (set (reg:HI 5 r5)
(mem:HI (plus:PSI (reg/f:PSI 55 ptr15)
(const_int 98 [0x62])) [0 S2 A16])) 12 {load} (nil))


Can anyone point what may be the problem?

As you can see the store is SI while the loads are HI. While looking to the 
comments from dse.c I get to the following remark:

" There are three cases where dse falls short:
 a) Reload sometimes creates the slot for one mode of access, and
 then inserts loads and/or stores for a smaller mode. "

Does it mean that such cases are not treated properly by dse?

I observed that if I run with the flag -fno-strict-aliasing the wrongly removed 
store is no longer removed and the code is runs correctly. 
Im wondering does the dse after register allocation make use of type based 
alias analysis? 

reagards,
Alex

Re: question about DSE

2009-09-09 Thread Alex Turjan

Hi Michael,

> My assumption would be these two split loads of HImode are
> generated by your backend from a given SImode MEM. 

Indeed your asumption is right. Bellow I have a mulsi3 expand in which I 
generate insns of mode HI. operands[1] gets spilled: in the produced BB as a 
single SI store while in the consumer BB as two separte HI loads (see a_hi and 
a_lo).

(define_expand "mulsi3"
 [(match_operand: SI 0 "general_register_operand" "")
  (match_operand: SI 1 "general_register_operand" "")
  (match_operand: SI 2 "general_register_operand" "")]
 ""
 "{
  rtx buff = gen_reg_rtx(SImode);
  rtx a_lo = gen_rtx_SUBREG(HImode, operands[1], 0); 
  rtx a_hi = gen_rtx_SUBREG(HImode, operands[1], 2); 
  rtx b_lo = gen_rtx_SUBREG(HImode, operands[2], 0); 
  rtx b_hi = gen_rtx_SUBREG(HImode, operands[2], 2); 
  rtx r_hi = gen_rtx_SUBREG(HImode, buff, 2); 
  emit_insn(gen_umulhisi3(buff, a_lo, b_lo));
  emit_insn(gen_machi3(r_hi, a_hi, b_lo, r_hi));
  emit_insn(gen_machi3(r_hi, a_lo, b_hi, r_hi));
  emit_move_insn(operands[0], buff);
  DONE;
}")

> If so, you need
> to make sure to copy the MEM_ALIAS_SET, at least for spill slots (better
> for everything) into the newly generated HImode mems.  For spill slots
> it's not enough to set it to zero.

I get your point but as the generation SI->HI takes place in the expand it 
doesnt help to copy the MEM_ALIAS_SET becasue the operands are pseudo regs. 

However, to get a correct implementation I did the following. Instead of doing 
the split in the expand (as show above), I made use of the following 
define_insn_and_split:

(define_expand "mulsi3"
 [(parallel
[(set (match_operand:SI 0 "register_operand" "")
(mult:SI (match_operand:SI 1 "register_operand" "")
  (match_operand:SI 2 "nonmemory_operand" "")))
   (clobber (match_operand:SI  3 "register_operand"   ""))
]
)
]
 ""
 "{
operands[3]  = gen_reg_rtx(SImode);
}")


(define_insn_and_split "*mulsi3"
 [(parallel[(set (match_operand:SI 0 "register_operand" "=d,d")
(mult:SI (match_operand:SI 1 "register_operand" "d,d")
  (match_operand:SI 2 "nonmemory_operand" "d,I")))
(clobber (match_operand:SI  3 "register_operand"   "=d,d"))
  ])]
""
 "#"
 "reload_completed"
  [(clobber (const_int 0))]
"{
  rtx a_lo = gen_rtx_SUBREG(HImode, operands[1], 0); 
  rtx a_hi = gen_rtx_SUBREG(HImode, operands[1], 2); 
  rtx b_lo = gen_rtx_SUBREG(HImode, operands[2], 0); 
  rtx b_hi = gen_rtx_SUBREG(HImode, operands[2], 2); 
  rtx r_hi = gen_rtx_SUBREG(HImode, operands[3], 2); 
  emit_insn(gen_umulhisi3(operands[3], a_lo, b_lo));
  emit_insn(gen_machi3(r_hi, a_hi, b_lo, r_hi));
  emit_insn(gen_machi3(r_hi, a_lo, b_hi, r_hi));
  emit_move_insn(operands[0], operands[3]);
 DONE;
}")

By using this define_insn_and_split with the predicate "reload_completed" 
I ensure that the register allocation takes place on the operands of the 
"mulsi3" instruction as defined by the define_expand construct. In this way 
instead of the two separate HI loads (from my previouse mail) I get only one SI 
load which aliases whith the SI store. In consequence the SI store is no longer 
removed.

1.What do you think about this implementation? using define_insn_and_split
2.Is is true that in the define_expand constructs I should avoid inducing 
subregs?

thanks,
Alex

bug linear loop transforms

2010-03-29 Thread Alex Turjan

Im writing to you regarding a possible bug in linear loop transfor.
The bug can be reproduce by compiling the attached c file with gcc.4.5.0 
(20100204, 20100325) on x86 machine.

The compiler flags that reproduce the error are:
-O2 -fno-inline -fno-tree-ch -ftree-loop-linear

If the compiler is run with:
-O2 -fno-inline -fno-tree-ch -fno-tree-loop-linear 
then the produced code is correct.


  #include 

int test (int n, int *a)
{
  int i, j;

  for (i = 0; i < n; i++)
{
  for (j = 0; j < n; j++)
{
  a[j] = i + n;
}
}


if (a[0] != 31 || i + n - 1 != 31)
   printf("incorrect %d  %d \n", a[0], i+n-1);  

  return 0;
}

int main (void)
{
  int a[16];
  test (16, a);
  return 0;
}

generate assembly mnemonic depending the resource allocation

2008-12-03 Thread Alex Turjan

Hi all,
Im building a gcc target for a vliw machine that can execute the same 
instruction on different resources (slots) and depending on which resources are 
allocate the instruction must have a different mnemonic. Is it possible in gcc 
to have for the same define_insn constraints (depending on the allocated 
architecture resources) different assembly instructions?

Here is an example:
Consider the following addSI RTL pattern:
(define_insn "addsi3"
  [(set (match_operand:SI 0 "general_register_operand" "=d")
(plus:SI (match_operand:SI 1 "general_register_operand" "d")
 (match_operand:SI 2 "general_register_operand" "d")))]
  ""
  "add %0,%1,%2%"
 [(set_attr "type" "alu")
   (set_attr "mode" "SI")
   (set_attr "length"   "1")])

On my target machine "alu" is a reservation that occupies one of the following 
3 slots: "slot1|slot2|slot3" and, I need to generate assembly code with 
different mnemonic depending on which slot the instruction was scheduled:

add-slot1 %0,%1,%2% // if scheduled on slot 1
add-slot2 %0,%1,%2% // if scheduled on slot 2
add-slot3 %0,%1,%2% // if scheduled on slot 3

Alex

RE: generate assembly mnemonic depending the resource allocation

2008-12-03 Thread Alex Turjan

Hi,
Thanks for answer but Im not able to make it work; I dont know how to extract 
the specific slot-usage (out of a number of an alternatives) out of the RTL 
insn description.
Is there some get_attr_ for this? 


--- On Wed, 12/3/08, Bingfeng Mei <[EMAIL PROTECTED]> wrote:

> From: Bingfeng Mei <[EMAIL PROTECTED]>
> Subject: RE: generate assembly mnemonic depending the resource  allocation
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, "gcc@gcc.gnu.org" 
> 
> Date: Wednesday, December 3, 2008, 3:00 PM
> You can use C statements to return a modified template
> string such like
> (define_insn "addsi3"
>   [(set (match_operand:SI 0
> "general_register_operand" "=d")
> (plus:SI (match_operand:SI 1
> "general_register_operand" "d")
>  (match_operand:SI 2
> "general_register_operand" "d")))]
>   ""
>   {
>   switch (slot-used){
> case 0:
>   return "add-slot0, %0, %1, %2";
> case 1:
>   return "add-slot1, %0, %1, %2";
> case 2:
>   return "add-slot1, %0, %1, %2";
>   }
>  } 
>  [(set_attr "type" "alu")
>(set_attr "mode" "SI")
>(set_attr "length"   "1")])
> 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On 
> > Behalf Of Alex Turjan
> > Sent: 03 December 2008 10:34
> > To: gcc@gcc.gnu.org
> > Subject: generate assembly mnemonic depending the
> resource allocation
> > 
> > Hi all,
> > Im building a gcc target for a vliw machine that can
> execute 
> > the same instruction on different resources (slots)
> and 
> > depending on which resources are allocate the
> instruction 
> > must have a different mnemonic. Is it possible in gcc
> to have 
> > for the same define_insn constraints (depending on the
> 
> > allocated architecture resources) different assembly
> instructions?
> > 
> > Here is an example:
> > Consider the following addSI RTL pattern:
> > (define_insn "addsi3"
> >   [(set (match_operand:SI 0
> "general_register_operand" "=d")
> > (plus:SI (match_operand:SI 1
> "general_register_operand" "d")
> >  (match_operand:SI 2 
> > "general_register_operand" "d")))]
> >   ""
> >   "add %0,%1,%2%"
> >  [(set_attr "type" "alu")
> >(set_attr "mode" "SI")
> >(set_attr "length"   "1")])
> > 
> > On my target machine "alu" is a reservation
> that occupies one 
> > of the following 3 slots:
> "slot1|slot2|slot3" and, I need to 
> > generate assembly code with different mnemonic
> depending on 
> > which slot the instruction was scheduled:
> > 
> > add-slot1 %0,%1,%2% // if scheduled on slot 1
> > add-slot2 %0,%1,%2% // if scheduled on slot 2
> > add-slot3 %0,%1,%2% // if scheduled on slot 3
> > 
> > Alex
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >   
> > 
> >

Re: generate assembly mnemonic depending the resource allocation

2008-12-03 Thread Alex Turjan

> Keep
> track of the instruction slot, and set some global variable
> in that macro.  

I see what you mean but for my target architecture the slots are not 
identical in sense that they are claiming totaly different resources. 
This means that in order to decide at a certain cycle which mnemonic to 
instantiate I need to guarantee that there are no pipeline conflicts.

Im now looking the in gcc mainline to find when an instruction (with 
alternative resource utilizations) is issued which of the resources are used.

Re: generate assembly mnemonic depending the resource allocation

2008-12-03 Thread Alex Turjan

Dear Ian, 
I use as atribute of one of the instructions the following 
define_insn_reservation:
(define_insn_reservation "vmove" 1 "vector_type" "vmove") "c_valu_1|c_vlsu_1")

As you can see vmove has two alternative reservation : c_valu_1 or c_vlsu_1, 
where c_valu_1 is defined as follows:
(define_reservation "c_valu_1" "nothing, r_valu_rp, nothing, nothing, nothing, 
nothing")

and c_vlsu_1:
(define_reservation "c_vlsu_1" "nothing, r_vlsu, nothing, nothing, nothing, 
nothing")


Having an RTX after sched2, how can I decide which of the two alternatives 
(c_valu_1 or c_vlsu_1) the scheduler has selected?  

thanks for your help!
Alex

query automaton

2009-02-25 Thread Alex Turjan

Hello,
Some time ago I asked a question regarding the possibility to schedule an 
operation on alternative functional units (FUs) AND depending on the chosen FUs 
to generate a dedicated assembly mnemonic. 

To give a simple example suppose I have a move operation that can be issued on 
one of the 2 different FUs:

(define_reservation "si_move""(q_salu, r_salu_rp,r_salu + r_salu_r_wp)
   |(q_smac, nothing,  r_smac + r_smac_wp) 
  ,nothing* 3") 

Depending on the occupied FUs one of the following assembly instructions are 
generated: move_salu or move_smac.
 
To find out which FUs were occupied I defined 2 query_cpu_unit-s:
(define_query_cpu_unit "q_salu"  "aut1")
(define_query_cpu_unit "q_smac"  "aut1")

,which further on (during the sched2 hook TARGET_SCHED_DFA_NEW_CYCLE) are used 
to find out which of the 2 alternative reservations the automaton has taken. 
Based on this decision I add some information within the insn rtx which later 
on is used to dump "move_salu" or "move_smac".

This approach was successful as long as the units claimed by "si_move" belong 
to one and the same automaton ( i.e., q_salu, q_smac, r_salu_rp, r_salu, 
r_salu_r_wp, r_smac, r_smac_wp belong to "aut1").

If I try to split "aut1" into (lets say 2) automatons I get generated incorrect 
code (i.e., the choice of choosing the proper unit is not correct anymore.) 

QUESTION: Is there such constraint that the units part of the alternatives of a 
reservation must belong to the same automaton?

Re: query automaton

2009-02-27 Thread Alex Turjan

Dear Vladimir, 
> Not really.  There is no requirement for "the units
> part of the alternatives of a reservation must belong to the
> same automaton".  Querying should also work in this
> case because function cpu_unit_reservation_p checks all
> automata for an unit reservation. 
Indeed it checks all automata but Im afraid that according to my pipeline 
description this check is not enough to guarantee a correct scheduling 
decision, e.g.,
suppose the following insn reservation: 

(define_reservation "move"  "(  (unit1_aut1, unit1_aut2) |  (*)
(unit2_aut1, unit2_aut2)  ) 

,where unitN_autM refers to unit N from automata M. In this case there are 2 
automata. 
Now supose a scheduling state S made of the individual states of the two 
automatons S=. According to what I see happening in 
insn-automata.c (and target.dfa), from S_aut1 there is a transition for 
unit1_aut1 and from S_aut2 there is a transition for unit2_aut2. 

It seems that the automata do not communicate with each other. As a 
consequence, 
A scheduling decision which results in the resource reservation  
(unit1_aut1, unit2_aut2)  would not be rejected, while it should. 
 
In my opinion, the current implementation sees the reservation defined in (*)
As equivalent to the following one
(define_reservation "move"  "(  (unit1_aut1| unit2_aut1) ,
(unit1_aut2| unit2_aut2)  ) 
Which does not seem  true to me. 

Is there a way for automatons to communicate so that the alternative 
(unit1_aut1, unit2_aut2) would be rejected?

regards,
Alex

Re: query automaton

2009-03-18 Thread Alex Turjan


Vladimir thanks for help!

With respect to your answer I would like to point to you a possible way to 
solve the scheduling problem 
(which I think wont be too difficult to be implemented in genautomata). 

To explain it, I will make use of the “move insn” example pointed in the 
previous email. 
First of all I do not change anything in the 2 automatons (aut1 and aut2); they 
are correctly built by the genautomata. 
Next to the original 
> (define_reservation "move"  "(  (unit1_aut1, unit1_aut2) |  
>(unit2_aut1, unit2_aut2)  
> ) 

I define two separate fake move instructions (i.e., “move_fake1” and 
“move_fake2”) of which semantic does play any role; each of them
having a unit reservation one of the alternatives of the “move”:

(define_reservation "move_fake1"  "( (unit1_aut1, unit1_aut2)), and 
(define_reservation "move_fake2"  "( (unit2_aut1, unit2_aut2) ).

Those two extra moves are then used during the sched2 target hook 
TARGET_SCHED_DFA_NEW_CYCLE to decide which of the move alternative unit 
reservations can be correctly claimed.
Suppose the ready instruction is a “move”.
First I make 2 copies of the current state: temp_state1 and temp_state2.
Then I check which of the two fake moves ( “move_fake1” and “move_fake2”) with 
different unit occupation can be issued at the current state:
cost_move_fake1 = internal_state_transition (insn_code_move_fake1, temp_state1);
cost_move_fake2 = internal_state_transition (insn_code_move_fake2, temp_state2);

Once I find the correct choice (suppose cost_move_fake1 <0),  I set 
dfa_insn_codes[] of the ready instruction to the one of  move_fake1.

  int uid = INSN_UID (ready);
 if (uid >= dfa_insn_codes_length)
dfa_insn_code_enlarge (uid);   
dfa_insn_codes[uid] = internal_dfa_insn_code (insn_code_move_fake1);

In this way when the scheduler calls internal_state_transition, the ready 
instruction has been already set its dfa_insn_codes to a unit reservation 
schedulable in the current state.

regards,
Alex



--- On Tue, 3/3/09, Vladimir Makarov  wrote:

> From: Vladimir Makarov 
> Subject: Re: query automaton
> To: atur...@yahoo.com
> Cc: gcc@gcc.gnu.org
> Date: Tuesday, March 3, 2009, 4:35 PM
> Alex Turjan wrote:
> > Dear Vladimir,   
> >> Not really.  There is no requirement for "the
> units
> >> part of the alternatives of a reservation must
> belong to the
> >> same automaton".  Querying should also work
> in this
> >> case because function cpu_unit_reservation_p
> checks all
> >> automata for an unit reservation. 
> > Indeed it checks all automata but Im afraid that
> according to my pipeline description this check is not
> enough to guarantee a correct scheduling decision, e.g.,
> > suppose the following insn reservation: 
> > (define_reservation "move"  "( 
> (unit1_aut1, unit1_aut2) |  (*)
> > (unit2_aut1,
> unit2_aut2)  ) 
> > ,where unitN_autM refers to unit N from automata M. In
> this case there are 2 automata. Now supose a scheduling
> state S made of the individual states of the two automatons
> S=. According to what I see happening
> in insn-automata.c (and target.dfa), from S_aut1 there is a
> transition for unit1_aut1 and from S_aut2 there is a
> transition for unit2_aut2. 
> > It seems that the automata do not communicate with
> each other. As a consequence, A scheduling decision which
> results in the resource reservation  (unit1_aut1,
> unit2_aut2)  would not be rejected, while it should.  In my
> opinion, the current implementation sees the reservation
> defined in (*)
> > As equivalent to the following one
> > (define_reservation "move"  "( 
> (unit1_aut1| unit2_aut1) ,
> > (unit1_aut2|
> unit2_aut2)  ) Which does not seem  true to me. 
> > Is there a way for automatons to communicate so that
> the alternative (unit1_aut1, unit2_aut2) would be rejected?
> > 
> >   
> Last two days, I've been working on this issue.  I
> found that you are right, genautomata permits to generate
> incorrect automata although I did not find that it is a case
> for major current description which I checked.
> 
> I found the issue is complicated.  Genuatomata has already
> a check for correct automata generations in
> check_regexp_units_distribution but that is not enough.  I
> am working on formulation of general rules for correct
> automata generation and implementation of their check.  I
> think it will take a week or two to do this and check how
> does it work for all current automata description.
> 
> Alex, thanks for pointing out this issue to me.

Re: [gSoc] [graphite] general plan for Automatic parallelization in Graphite

2009-04-22 Thread Alex Turjan


Are there any plans to move the partial unrolling phase from
RTL to Tree-SSA?
The move would benefit from better (or easier to implement) Tree-SSA alias 
analysis. 

Alex
> 
> 
> 
> 
> --- On Wed, 4/22/09, Li Feng 
> wrote:
> 
> > From: Li Feng 
> > Subject: [gSoc] [graphite] general plan for Automatic
> parallelization in   Graphite
> > To: "GCC" 
> > Cc: "Tobias Grosser"
> , "Sebastian"
> , "Razya Ladelsky"
> , konrad.trifuno...@gmail.com
> > Date: Wednesday, April 22, 2009, 5:10 PM
> > Hi,
> > 
> > It's nice that the proposal 'Automatic
> > parallelization in Graphite'
> > is accepted. Which means I will be working with great
> > Graphtie
> > developers this summer, and trying to implement the
> project
> > .
> > 
> > I have set up a blog for this project, which will
> mainly
> > about this
> > project: 1. plans 2. what I have done 3. related
> Graphite
> > internals
> > You can subscribe to it if you like:
> > http://summergraphite.blogspot.com/
> > 
> > Here is a general plan for this project, keep you in
> loop,
> > and feel free to comment :)
> > 
> >  1. Mark the innermost loop parallel  [done]
> > 
> >  2. Try to schedule autopar pass after Graphite, and
> enable
> >  code generation if flag_graphite_force_parallel
> is set
> >  - There should be some discussion with Razya
> about
> >her plan about the autopar part
> >  - But before that, I'll try to schedule
> > autopar first
> > 
> >  3. I may try to write testcases for the loops that
> should
> > be
> >  parallel, from simple to hard, and check
> autopar's
> > code
> >  generation part, make sure this works correctly
> as we
> >  expected.
> >  - The testcases is important. There should be
> some
> >detailed discussion maybe with Sebastian
> and
> > Konrad.
> >To see what kind of loop we can/decide to
> > handle.
> >  - Check autopar's code generation with
> >flag_graphite_force_parallel set with these
> > testcases,
> >report bugs if it goes wrong.
> > 
> >  4. Try to write code for deciding if a loop can be
> > parallel
> >  with data dependency test under this polyhedral
> model.
> > - Try to understand the interface of data
> > dependency test
> > - Write code, if data dependency success, mark
> the
> > loop parallel
> > 
> > Cheers,
> > Li

Re: Fwd: gcc instruction scheduling makes things worse?

2009-05-06 Thread Alex Turjan


> For data dependency cases,  I do some jobs in
> the adjust_cost target hook.
Normally the scheduling takes into account the instruction latencies which you 
have specified: for load 2, for mul 4 and for alu 1.
Why do you need to adjust the sched costs? Did you try simply without adjusting 
the costs?

> I  read  THE GNU INSTRUCTION SCHEDULER written by 
> Michael D. Tiemann,
This is old work. Meanwhile there is a new scheduler based on finite state 
machines. You can read in "The finite state automaton based pipeline hazard 
recognizer and instruction scheduler in GCC" by Vladimir Makarov.

scheduling question

2009-05-06 Thread Alex Turjan


Hi,
During scheduling Im confronted with the fact that an instruction is moved
from the ready list to queued with the cost 2, while according to my
expectations the insn should have been moved to queued with cost 1.

Did anybody experience similar problem? 

In case an insn is ready but can bot be schedled in the current
cycle, is it correct (i.e. the generated code is correct) to move the insn
to the queue list with cost 1 ?; no matter what it the value >=1
returned by state_transition.

It seams to me that moving from the ready to queue list with cost >=1 is 
an optimization for compilation time.

Re: scheduling question

2009-05-07 Thread Alex Turjan


--- On Thu, 5/7/09, Maxim Kuvyrkov  wrote:

> From: Maxim Kuvyrkov 
> Subject: Re: scheduling question
> To: atur...@yahoo.com
> Cc: "Vladimir Makarov" , gcc@gcc.gnu.org
> Date: Thursday, May 7, 2009, 1:01 PM
> Alex Turjan wrote:
> > Hi,
> > During scheduling Im confronted with the fact that an
> instruction is moved
> > from the ready list to queued with the cost 2, while
> according to my
> > expectations the insn should have been moved to queued
> with cost 1.
> > 
> > Did anybody experience similar problem? 
> 
> From what you described it's not clear what the problem
> is.  When scheduler infers that an instruction cannot be
> scheduled in the next N cycles (due to DFA hazard or
> insn_cost/dep_cost hook considerations or due to something
> else) the scheduler queues the instruction on (N+1) cycle.

The point is that in found a case in which an insn is moved to queued with cost 
2 while it could have been issued in next cycle (which corresponds to queued 
with cost 1). 

I think the problem is not in the automaton state transition but in the 
vector_min_issue_delay. Do you have some documentation of the way 
vector_min_issue_delay is constructed?

> Correct, scheduler would be working unnecessarily long
> otherwise.
Maxim thanks a lot for your answer.

Re: Illegal schedule

2010-06-22 Thread Alex Turjan

>  Do I have to reorganize the code prior to slot filling? Do
> I have to make sure that some problematic instructions do
> not appear in slots?

Perhaps a easy way to solve the problem would be to claim for branches a memory 
port a number of stages before and after the IF; to avoid in this way hazards 
in the delay slots.
Alternatively you can adjust the gcc delay slot filler (reorg.c) for your 
architecture.

reload question

2010-06-23 Thread Alex Turjan

Hi,
My port supports hardware loops generated by the following (do_end) pattern: 

(set (pc) (if_then_else (ne (match_operand:HI 0 "general_register_operand" "d")
   (const_int 0))
   (label_ref (match_operand 1 "" ""))
   (pc)))
 (set  (match_operand:HI 2 "general_register_operand" "=0")
 (plus:HI (match_operand:HI 3 "general_register_operand" "0")
  (const_int -1)))
(clobber (match_operand:BI 4 "predicate_register_operand" "=j"))


When Im compiling a loop with high register pressure the register allocator 
does not allocate a register for the loop counter (i.e., operand 0) as it has a 
long life range. Thus operand 0 has to be reloaded but then I get a failure in 
the reload:

error: unable to generate reloads for:
(jump_insn 211 182 188 11 lib5.c:51 (parallel [
(set (pc)
(if_then_else (ne (reg:HI 292 [ N ])
(const_int 0 [0x0]))
(label_ref 183)
(pc)))
(set (reg:HI 292 [ N ])
(plus:HI (reg:HI 292 [ N ])
(const_int -1 [0x])))
(clobber (reg:BI 33 p1 [293]))
]) 506 {do_endhi} (expr_list:REG_UNUSED (reg:BI 33 p1 [293])
(expr_list:REG_BR_PROB (const_int 9100 [0x238c])
(nil)))
 -> 183)
lib5.c:105:1: internal compiler error: in find_reloads, at reload.c:3821

Can anybody give me a hint?
I am aware of the following msg: 
http://gcc.gnu.org/ml/gcc/2001-09/msg00942.html
but still dont know how to make reload do the work.

thanks,
Alex

Re: reload question

2010-06-23 Thread Alex Turjan


> insns which branch are not allowed to have output
> reloads.  You must 
> support any kind of register as well as memory operands in
> your insn for 
> the loop counter.

Thanks for answer but what do you suggest to do, as my architecture done not 
support HW loops with memory operands?

Alex

CFG traversal

2010-07-06 Thread Alex Turjan

Hi,
Is there functionality in gcc based on which the CFG can be traversed in such a 
way that a node gets visited once all of its predecessors have been visited?

thanks,
Alex

secondary reload via 2 intermediary registers

2010-09-30 Thread Alex Turjan

Hi,
I have 3 questions regarding secondary reload:

1.Is it possible to do the secondary reload via 2 intermediary registers? 
As far as I can see the insn that implements the secondary reload has to have 3 
operands.

2. Is it possible that an instruction emitted during the secondary reload to 
get reloaded as well? and if yes how?

thanks in advance,
Alex

Re: secondary reload via 2 intermediary registers

2010-10-19 Thread Alex Turjan

Hi Jeff,
Thanks for answer. I managed to make use of an architecture trick which allows 
me to get the secondary reload via only one intermediary register, but still 
have some comments to what you wrote me.

> > 1.Is it possible to do the secondary reload via 2
> intermediary registers?
> > As far as I can see the insn that implements the
> secondary reload has to have 3 operands.
> Make the scratch/intermediary register double-sized so that
> you get a pair of registers instead of a single register for
> the scratch/intermediary.

In my case the intermediary registers were from two different register files 
holding data of different modes. To do what you suggest I would have had to 
interleave the two register files and introduce a new mode that occupies a pair 
of regs - one from each of the register files. 

> > 2. Is it possible that an instruction emitted during
> the secondary reload to get reloaded as well? and if yes
> how?
> Typically you need to ensure that your reload_xxx expanders
> generate RTL which does not need further reloading. 
> This makes handling secondary reloads rather complex in some
> cases.
I was asking for this because I was not able to do the secondary reload via 2 
intermediary registers. Thus I was trying to provide the two registers - the 
first one via a first secondary reload which was generating two instructions 
among which one was needing again a secondary reload where I was planing to 
provide the second register. However the mechanism was not functioning.

unschedule insn functionality needed

2010-11-12 Thread Alex Turjan

Hi,
Can anybody give me a hint on where (perhaps some branch) I can find 
functionality 
which allows during scheduling to un-schedule an instruction? I basically need 
a function that does the oposite of schedule_insn. 

During scheduling I want o schedule_insn(INSN), then check the ready_list and 
then (in case)
unschedule INSN. 

thanks,
Alex

Re: pipeline description

2010-11-12 Thread Alex Turjan

Alexander is right. Perhaps you can implement the TARGET_SCHED_ADJUST_COST
, then catch in the debugger the two instructions that you expect to be 
scheduled together and see what the default latency is or if needed you 
may just adjust it to the proper value.

Alex


--- On Fri, 11/12/10, Alexander Monakov  wrote:

> From: Alexander Monakov 
> Subject: Re: pipeline description
> To: "Ian Lance Taylor" 
> Cc: "roy rosen" , gcc@gcc.gnu.org
> Date: Friday, November 12, 2010, 1:22 PM
> 
> 
> On Thu, 11 Nov 2010, Ian Lance Taylor wrote:
> 
> > roy rosen 
> writes:
> > 
> > > If I have two insns:
> > > r2 = r3
> > > r3 = r4
> > > It seems to me that the dependency analysis
> creates a dependency
> > > between the two and prevent parallelization.
> Although there is a
> > > dependency (because of r3) I want GCC to
> parallelize them together.
> > > Since if the insns are processed together the old
> value of r3 is used
> > > for the first insn before it is written by the
> second insn.
> > > How do I let GCC know about these things (When
> exactly each operand is
> > > used and when it is written)?
> > > Is it in these hooks?
> > > In which port can I see a good example for that?
> > 
> > I was under the impression that an anti-dependence in
> the same cycle was
> > permitted to execute.  But perhaps I am
> mistaken.
> 
> No, you are right.  That is the norm when compiling
> for ia64, for example.  I
> don't think the backend should specifically care about it:
> the anti-dependency
> gets zero latency, and the scheduler is able to issue the
> second instruction
> on the same cycle after issuing the first.
> 
> Alexander
>

selective scheduler failure

2012-07-17 Thread Alex Turjan

Hi,
I get an assert failure during a run of the selective scheduler.
The problem is due to the implementation of maybe_tidy_empty_bb.
Inside of it I see the following piece of code:

/* If it is possible - merge BB with its predecessor. */
if (can_merge_blocks_p (bb->prev_bb, bb))
sel_merge_blocks (bb->prev_bb, bb);


During my compiler run I reach the situation in which bb->prev_bb is not in the
scheduling scope BUT can_merge_blocks_p (bb->prev_bb, bb) returns true.

As a consequence inside sel_remove_empty_bb I hit on the following assert:
gcc_assert (in_current_region_p (merge_bb));

The backtrace from the assert is: 
#0 move_bb_info 
#1 sel_remove_empty_bb
#2 sel_merge_blocks
#3 maybe_tidy_empty_bb 

Perhaps one way to solve the problem is to replace:
if (can_merge_blocks_p (bb->prev_bb, bb))
with
if (can_merge_blocks_p (bb->prev_bb, bb) && in_current_region_p (bb->prev_bb))

I added this description to buggzila under Bug 53999.

regards,
Alex

Re: selective scheduler failure

2012-07-17 Thread Alex Turjan

Hi,
I tried the patch but it doesnt solve my problem.
The patch addresses the problem on the else branch but my problem is on the if:
if (can_merge_blocks_p (bb->prev_bb, bb))
sel_merge_blocks ...

Alex

--- On Tue, 7/17/12, Alexander Monakov  wrote:

> From: Alexander Monakov 
> Subject: Re: selective scheduler failure
> To: "Alex Turjan" 
> Cc: gcc@gcc.gnu.org
> Date: Tuesday, July 17, 2012, 7:07 PM
> > As a consequence inside
> sel_remove_empty_bb I hit on the following assert:
> > gcc_assert (in_current_region_p (merge_bb));
> 
> Sounds like your GCC tree does not have Andrey's fix for PR
> 52250 (SVN
> revision 184975).
> 
> Alexander
>

Re: selective scheduler failure

2012-07-17 Thread Alex Turjan

I found the patch from the following link:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52250
But i dont see the removal of the assert in move_bb_info.
Perhaps you have another patch related to this bug.
thanks
Alex

--- On Tue, 7/17/12, Alexander Monakov  wrote:

> From: Alexander Monakov 
> Subject: Re: selective scheduler failure
> To: "Alex Turjan" 
> Cc: gcc@gcc.gnu.org
> Date: Tuesday, July 17, 2012, 7:57 PM
> On Tue, Jul 17, 2012 at 8:39 PM, Alex
> Turjan 
> wrote:
> >
> > Hi,
> > I tried the patch but it doesnt solve my problem.
> > The patch addresses the problem on the else branch but
> my problem is on the if:
> > if (can_merge_blocks_p (bb->prev_bb, bb))
> >     sel_merge_blocks ...
> 
> I don't understand. You said earlier that the assert in
> move_bb_info
> failed, but the patch removes that assert. So how does the
> failure
> look now? Can you provide a reproducible testcase?
> 
> Alexander
>

Re: selective scheduler failure

2012-07-18 Thread Alex Turjan

I applied the new patch and it seams fine. 
Thanks!
Alex 


--- On Wed, 7/18/12, Alexander Monakov  wrote:

> From: Alexander Monakov 
> Subject: Re: selective scheduler failure
> To: "Alex Turjan" 
> Cc: "Alexander Monakov" , gcc@gcc.gnu.org
> Date: Wednesday, July 18, 2012, 1:42 AM
> On Wed, Jul 18, 2012 at 1:05 AM, Alex
> Turjan 
> wrote:
> > I found the patch from the following link:
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52250
> 
> See comment 9. The patch pasted in the audit trail is not
> what has
> been committed.
> 
> Alexander
>

SMS issues

2012-07-18 Thread Alex Turjan

Im writing to you with respect to some strange SMS functionality.
In the code bellow there are 2 instructions (a builtin store and a builtin 
load) 
as they appear in the program flow before SMS:

(insn 134 133 135 12 tdscdma_pfu_ccdec.c:289 
(set (mem:HI (plus:PSI (reg/v/f:PSI 185 [ ccdecInterim_pi16 ])
(reg:SI 303)) [0 S2 A16])
(unspec:HI [
(reg/v:HI 244 [ outData_u16 ])
] 1752)) 1377 {INSN_BUILTIN___storebyteofs_16} (expr_list:REG_DEAD 
(reg:SI 303)
(expr_list:REG_DEAD (reg/v:HI 244 [ outData_u16 ])
(nil

(insn 136 135 137 12 tdscdma_pfu_ccdec.c:292 
(set (reg/v:HI 248 [ mappingAddress_i16 ])
(unspec:HI [
(mem:HI (plus:PSI (reg/v/f:PSI 170 [ curMappingTable_pi16 ])
(reg:SI 305)) [0 S2 A16])
] 696)) 755 {INSN_BUILTIN___loadbyteofs_16} (expr_list:REG_DEAD 
(reg:SI 305)
(nil)))

I observed that in ddg.c there are 3 dependencies build between insn 134 and 
136:
1. (T,0) - a true dep of distance 0.
2. (T,1) - a true dep of distance 1.
3. (A,1) - an anti dep of distance 1.

Due to (T,1) dependence, further on during the calling of generate_reg_moves:

1. there is created a (modulo expansion) move instruction
  (insn 207 132 134 13 (set (reg:HI 326)
(mem:HI (plus:PSI (reg/v/f:PSI 185 [ ccdecInterim_pi16 ])
(reg:SI 303)) [0 S2 A16])) -1 (nil))

2. while generate_reg_moves attempts to replace in insn 136 the memory access 
 (mem:HI (plus:PSI (reg/v/f:PSI 170 [ curMappingTable_pi16 ])
(reg:SI 305)) [0 S2 A16])

with the register (reg:HI 326) (the same as the one set by insn 207).
But the replacement fails so insn 136 remains unchanged.
Remark that later on insn 207 gets removed (during ira) 

Issues:
1.  What is the reason why (T,1) is build up? – to me it seams that (T,0) 
must be enough
2.  Looking inside generate_reg_moves it seams to me that this function 
is not meant to deal with replacing memory accesses but only with 
register replacements. (see inside the call to replace_rtx which in 
my case trys to replace the  mem accesses inside 136).

3. The (T,1) dep is assumed to take place as if before the SMS pass, 
insn 136 was preceding insn 134:

(insn 136 135 137 12 tdscdma_pfu_ccdec.c:292 
(set (reg/v:HI 248 [ mappingAddress_i16 ])
(unspec:HI [
(mem:HI (plus:PSI (reg/v/f:PSI 170 [ curMappingTable_pi16 ])
(reg:SI 305)) [0 S2 A16])
] 696)) 755 {INSN_BUILTIN___loadbyteofs_16} (expr_list:REG_DEAD 
(reg:SI 305)
(nil)))

(insn 134 133 135 12 tdscdma_pfu_ccdec.c:289 
(set (mem:HI (plus:PSI (reg/v/f:PSI 185 [ ccdecInterim_pi16 ])
(reg:SI 303)) [0 S2 A16])
(unspec:HI [
(reg/v:HI 244 [ outData_u16 ])
] 1752)) 1377 {INSN_BUILTIN___storebyteofs_16} (expr_list:REG_DEAD 
(reg:SI 303)
(expr_list:REG_DEAD (reg/v:HI 244 [ outData_u16 ])
  (nil

If that would be the case then between 134 and 136 there would be present 
also an antidependence of distance 0. Becasue in the pipelined schedule, 
134 is scheduled before 136 (SCHED_TIME (136) > SCHED_TIME (134)) the modulo 
variable expansion needs to take place as explained before. 
 
SMS decides to produce a modulo variable expansion in a case when is not 
needed. However, it fails in fulfilling the whole modulo variable expansion
procedure, covering in this way the possibly incorrect behavior described above.

regards,
Alex

Re: SMS issues

2012-07-19 Thread Alex Turjan

Andrey,
Thanks for the patch. I applied it and so far it seams ok.
I will run further testing and let you know if i see problems.

Back to the last part of my email, Im still wondering what happens in case
the variable modulo expanded is a memory location? because as I see 
generate_reg_moves is not able to handle such situation... or perhaps there is 
something which prevents the modulo scheduler from arriving to this situation?

Alex

--- On Thu, 7/19/12, Andrey Belevantsev  wrote:

> From: Andrey Belevantsev 
> Subject: Re: SMS issues
> To: "Alex Turjan" 
> Cc: gcc@gcc.gnu.org, ayal.z...@gmail.com, revital.e...@linaro.org, "Roman 
> Zhuikov" 
> Date: Thursday, July 19, 2012, 11:11 AM
> Hello Alex,
> 
> On 18.07.2012 18:40, Alex Turjan wrote:
> >
> > Im writing to you with respect to some strange SMS
> functionality.
> > In the code bellow there are 2 instructions (a builtin
> store and a builtin load)
> > as they appear in the program flow before SMS:
> 
> ...
> 
> > Issues:
> > 1.    What is the reason why (T,1) is
> build up? – to me it seams that (T,0)
> >     must be enough
> 
> This looks like the issue that Roman's patch from 
> http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html
> should be fixing, 
> could you try it?
> 
> Ayal, Revital, could you again take a look at the above
> patch and all the 
> SMS improvement patches mentioned in 
> http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01859.html? 
> The last comments 
> from me are at http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00478.html. 
> At 
> the Cauldron, I was talking to Ramana about pushing these
> forward as 
> important for arm and Linaro, so it would be good to have
> them in 4.8.
> 
> Andrey
> 
> > 2.    Looking inside generate_reg_moves
> it seams to me that this function
> >     is not meant to deal with replacing
> memory accesses but only with
> >     register replacements. (see inside
> the call to replace_rtx which in
> >     my case trys to replace the 
> mem accesses inside 136).
> >
> > 3. The (T,1) dep is assumed to take place as if before
> the SMS pass,
> > insn 136 was preceding insn 134:
> >
> > (insn 136 135 137 12 tdscdma_pfu_ccdec.c:292
> >     (set (reg/v:HI 248 [
> mappingAddress_i16 ])
> >          (unspec:HI [
> >               
>   (mem:HI (plus:PSI (reg/v/f:PSI 170 [
> curMappingTable_pi16 ])
> >               
>           (reg:SI 305)) [0 S2
> A16])
> >              ] 696))
> 755 {INSN_BUILTIN___loadbyteofs_16} (expr_list:REG_DEAD
> (reg:SI 305)
> >          (nil)))
> >
> > (insn 134 133 135 12 tdscdma_pfu_ccdec.c:289
> >     (set (mem:HI (plus:PSI (reg/v/f:PSI
> 185 [ ccdecInterim_pi16 ])
> >               
>   (reg:SI 303)) [0 S2 A16])
> >          (unspec:HI [
> >               
>   (reg/v:HI 244 [ outData_u16 ])
> >              ]
> 1752)) 1377 {INSN_BUILTIN___storebyteofs_16}
> (expr_list:REG_DEAD (reg:SI 303)
> >          (expr_list:REG_DEAD
> (reg/v:HI 244 [ outData_u16 ])
> >        (nil
> >
> > If that would be the case then between 134 and 136
> there would be present
> > also an antidependence of distance 0. Becasue in the
> pipelined schedule,
> > 134 is scheduled before 136 (SCHED_TIME (136) >
> SCHED_TIME (134)) the modulo
> > variable expansion needs to take place as explained
> before.
> >
> > SMS decides to produce a modulo variable expansion in a
> case when is not
> > needed. However, it fails in fulfilling the whole
> modulo variable expansion
> > procedure, covering in this way the possibly incorrect
> behavior described above.
> >
> > regards,
> > Alex
> >
> 
>

RTL alias analysis

2008-04-01 Thread Alex Turjan

Hi everybody,
Im interested whether there is gcc support (at the RTL
level) after unrolling which allows to disambiguate
memory accesses present in RTL constructs (specially
among memory accesses generated due to unrolling) for
machines that do not support indexed addressing modes.
 

I am aware of an article by Sanjiv Gupta and Naveen
Sharma on alias analysis at the RTL level, which was
published in the GCC 2003 summit. Are there any
further gcc developments of this work or is there a
patch concerning this work? 

thanks in advance,
Alex Turjan



  

No Cost - Get a month of Blockbuster Total Access now. Sweet deal for Yahoo! 
users and friends. 
http://tc.deals.yahoo.com/tc/blockbuster/text1.com

RE: Implement #pragma unroll?

2008-05-29 Thread Alex Turjan

Dear Bingfeng,
Some time ago I had to deal with a similar issue as
you. Basically I did as follows: I built a backend
function which catches the unroll pragma and replaces
it with a target assembly intrinsic (which of course
has to be described in an .md file). After that in the
RTL unroll phase, I analyze loop by loop and connect
them to the corresponding unrolling intrinsic (which
as a field contains the unrolling factor, you may add
here extra information which allows you to recognize
the loop) from where I decide the unrolling factor.
After that in the RTL unroll phase I remove all the
unroll intrinsics.
hope this will help,
Alex

RE: Implement #pragma unroll?

2008-05-30 Thread Alex Turjan

Bingfeng,
to define the pragma you use something like this:
#define REGISTER_TARGET_PRAGMAS() 
  
{ 
   
  c_register_pragma (0, "unroll_factor",
support_unroll_pragma); 
}

such that when you compile a loop like this you get it
unrolled 20 times:
#pragma unroll_factor=20
for(i = 0; i < 40; i++){
 m[i]=3;
}

then you have to implement the support_unroll_pragma
function
which in my case I used it to extract the unrolling
factor (in this case 20)
and to insert in the parse tree an intrinsic. Here is
my code:

void
support_unroll_pragma (cpp_reader * pfile
ATTRIBUTE_UNUSED)
{
  tree x;
  enum cpp_ttype token;
  int unroll_factor = 0;

  token = pragma_lex (&x);
  gcc_assert (token == CPP_EQ);
  token = pragma_lex (&x);
  gcc_assert (token == CPP_NUMBER);
  unroll_factor = TREE_INT_CST_LOW (x);

  tree type =
build_function_type_list (void_type_node,
integer_type_node, NULL_TREE);
  tree t = build_fn_decl ("__unroll_pragma", type);
  enum built_in_function code = DECL_FUNCTION_CODE
(t);
  DECL_BUILT_IN_CLASS (t) = BUILT_IN_MD;
  TREE_PUBLIC (t) = 1;
  DECL_EXTERNAL (t) = 1;
  DECL_FUNCTION_CODE (t) =
builtin_line_no("__unroll_pragma");
  tree operand = build_int_cst (NULL_TREE,
unroll_factor);
  t = build_call_expr (t, 1, operand);

  add_stmt (t);
}

trimedia_builtin_line_no generates a call described by
the following pattern:
(define_insn "customop_unroll_pragma"
[(unspec_volatile:SI
[
(match_operand:SI 0
"immediate_operand" "i")
] UNSPEC_unroll_pragma)
]
""
""
)

which in loop-unroll (where you have RTL instructions)
is caught with the 
with something like this:
(INSN_P(insn) && recog_memoized(insn) ==
CODE_FOR_customop_unroll_pragma

more or less this is it.
Alex



--- Bingfeng Mei <[EMAIL PROTECTED]> wrote:

> Alex, Thanks for your suggestion. What target hook
> do you use for the
> backend function? 
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Alex Turjan
> Sent: 29 May 2008 14:45
> To: gcc@gcc.gnu.org
> Subject: RE: Implement #pragma unroll?
> 
> Dear Bingfeng,
> Some time ago I had to deal with a similar issue as
> you. Basically I did as follows: I built a backend
> function which catches the unroll pragma and
> replaces
> it with a target assembly intrinsic (which of course
> has to be described in an .md file). After that in
> the
> RTL unroll phase, I analyze loop by loop and connect
> them to the corresponding unrolling intrinsic (which
> as a field contains the unrolling factor, you may
> add
> here extra information which allows you to recognize
> the loop) from where I decide the unrolling factor.
> After that in the RTL unroll phase I remove all the
> unroll intrinsics.
> hope this will help,
> Alex 
> 
> 
> 
>   
> 
> 
>

Re: Scheduling automaton question

Re: Question about the difference between two instruction scheduling passes

Re: IRA undoing scheduling decisions

question about DSE

Re: question about DSE

bug linear loop transforms

generate assembly mnemonic depending the resource allocation

RE: generate assembly mnemonic depending the resource allocation

Re: generate assembly mnemonic depending the resource allocation

Re: generate assembly mnemonic depending the resource allocation

query automaton

Re: query automaton

Re: query automaton

Re: [gSoc] [graphite] general plan for Automatic parallelization in Graphite

Re: Fwd: gcc instruction scheduling makes things worse?

scheduling question

Re: scheduling question

Re: Illegal schedule

reload question

Re: reload question

CFG traversal

secondary reload via 2 intermediary registers

Re: secondary reload via 2 intermediary registers

unschedule insn functionality needed

Re: pipeline description

selective scheduler failure

Re: selective scheduler failure

Re: selective scheduler failure

Re: selective scheduler failure

SMS issues

Re: SMS issues

RTL alias analysis

RE: Implement #pragma unroll?

RE: Implement #pragma unroll?

34 matches

Site Navigation

Mail list logo

Footer information