Re: Pipeline hazards and delay slots

2008-05-05 Thread Mohamed Shafi
On Mon, May 5, 2008 at 12:23 PM, Pranav Bhandarkar
<[EMAIL PROTECTED]> wrote:
> Hi Mohammed,
>
>
>  >  But how can i handle instances like this? Should i be doing insertion
>  >  of nops in reorg pass?
>
>  FWIW, I had worked on a port for VLIW processor about three years back
>  and IIRC we had used the reorg pass for inserting the nops.  I think
>  if you look at the scheduler dumps  you will notice that the scheduler
>  would have, in all likelihood, accounted for the delay of 1 cycle
>  between the "lw" and the "add" instructions. Only that you will have
>  to put the "nop" yourself between these two instructions.
>
  Thanks for your reply.
  That's what I am doing right now, but I am doing it in final_prescan
rather than in the reorg pass. Things go wrong when the two
instructions are actually placed in the delay slots. After delay slot
filling, the instructions look like this:

   call fun ; has two delay slots
   lw R0, R8   ; delay slot 1
   add R1, R0 ; delay slot 2

  After final_prescan the output looks like this:

   call fun
   lw R0, R8
   nop
   add R1, R0

  This means 3 instructions in the delay slots, which can hold only 2.
So my question is: how do I overcome this?
  Should I do the delay slot scheduling myself in the reorg pass, or is
there some other way?

Thanks for your time,
Regards,
Shafi


Re: Pipeline hazards and delay slots

2008-05-05 Thread Boris Boesler

Hi!

On 05.05.2008 at 09:06, Mohamed Shafi wrote:

> > > But how can i handle instances like this? Should i be doing insertion
> > > of nops in reorg pass?
> >
> > FWIW, I had worked on a port for VLIW processor about three years back
> > and IIRC we had used the reorg pass for inserting the nops.  I think
> > if you look at the scheduler dumps  you will notice that the scheduler
> > would have, in all likelihood, accounted for the delay of 1 cycle
> > between the "lw" and the "add" instructions. Only that you will have
> > to put the "nop" yourself between these two instructions.
>
>  Thanks for your reply.
>  Thats what i am doing right now. But i am not doing in reorg pass
> but in final_prescan.


 I don't know about final_prescan, but the insn code seems to get
modified somewhere (more than once?) after the machine-dependent reorg
pass. Ian suggested doing the reorg in TARGET_ASM_FUNCTION_PROLOGUE.
That's what I am doing, but it has some other disadvantages.




> But things go wrong when the two instructions
> actually are placed in the delay slots. So after filling the delay
> slot the instructions are like this
>
> ...
>
>  This means 3 instruction in delay slots which can have only 2. So my
> question was how do i overcome this?


 I have similar situations. I simply do not emit this kind of
instruction in the delay slot! You will have another problem: the lw
instruction could be the last instruction in the delay slot or at the
end of a basic block. Then you have to schedule across basic block
(jump/fall-through) and function (call) boundaries. The first case can
be solved by checking the code in the basic block's predecessors, which
are given by the edges data structure. The second case could be ignored
if you know the function prologue and can ensure that the register is
not used in its first instruction.
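
A minimal sketch of the rule described above (keep a load and its
consumer out of the delay slots), written as a stand-alone C model
rather than against GCC's internals. The struct insn record, the
function names, and the register numbering are all hypothetical; the
only facts taken from the thread are the 1-cycle load-use delay and the
two delay slots. Rejecting a trailing load is a conservative assumption
for the case where the consumer might be the first instruction executed
after the delay slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record; not a GCC data structure. */
struct insn {
    bool is_load;   /* true for "lw"-style loads with a 1-cycle load-use delay */
    int  dest;      /* destination register number, -1 if none */
    int  src[2];    /* source register numbers, -1 if unused */
};

/* True if `next` reads the register that `prev` loads, i.e. a nop
   would be required between the two instructions. */
static bool load_use_hazard(const struct insn *prev, const struct insn *next)
{
    if (!prev->is_load || prev->dest < 0)
        return false;
    return next->src[0] == prev->dest || next->src[1] == prev->dest;
}

/* Decide whether the n candidate instructions may fill the delay slots.
   Reject the set if any adjacent pair has a load-use hazard (a nop
   would overflow the slots), or if the last candidate is a load
   (conservative: its consumer may sit across the branch or call). */
static bool can_fill_delay_slots(const struct insn *cand, size_t n)
{
    assert(cand != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++)
        if (load_use_hazard(&cand[i], &cand[i + 1]))
            return false;
    if (n > 0 && cand[n - 1].is_load)
        return false;
    return true;
}
```

For the pair from the thread (lw R0, R8 followed by add R1, R0, assuming
R0 is the load destination and add reads both of its operands),
can_fill_delay_slots returns false, so the pair would be left out of the
delay slots instead of being padded to three instructions.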


Boris



Re: Pipeline hazards and delay slots

2008-05-05 Thread Mohamed Shafi
On Mon, May 5, 2008 at 1:12 PM, Boris Boesler <[EMAIL PROTECTED]> wrote:
> Hi!
>
>  On 05.05.2008 at 09:06, Mohamed Shafi wrote:
>
> > > > But how can i handle instances like this? Should i be doing insertion
> > > > of nops in reorg pass?
> > > >
> > >
> > > FWIW, I had worked on a port for VLIW processor about three years back
> > > and IIRC we had used the reorg pass for inserting the nops.  I think
> > > if you look at the scheduler dumps  you will notice that the scheduler
> > > would have, in all likelihood, accounted for the delay of 1 cycle
> > > between the "lw" and the "add" instructions. Only that you will have
> > > to put the "nop" yourself between these two instructions.
> > >
> > >
> >  Thanks for your reply.
> >  Thats what i am doing right now. But i am not doing in reorg pass
> > but in final_prescan.
> >
>
>   I don't know about final_prescan but the insn code seems to get modified
> somewhere (more than once?) after the machine dependent reorg pass. Ian
> suggested to reorg the code in TARGET_ASM_FUNCTION_PROLOGUE. That's what I
> am doing. But that has some other disadvantages.
>

   In final_prescan one actually emits the nop instruction directly into
the assembly file. The disadvantage of this is that the length
calculation for indirect jumps goes wrong, because the compiler has no
idea about the emitted nops. So I moved nop generation to the reorg pass.
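
The point of doing the insertion as a pass over the instruction stream
is that the nops become ordinary instructions that any later length
calculation can count. A rough stand-alone sketch of such a pass follows;
struct insn and insert_load_nops are hypothetical names for a toy model,
not GCC's RTL or its reorg hook, and the 1-cycle load-use delay is the
only detail taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record; not a GCC data structure. */
struct insn {
    bool is_nop;    /* true for an inserted nop */
    bool is_load;   /* true for loads with a 1-cycle load-use delay */
    int  dest;      /* destination register number, -1 if none */
    int  src[2];    /* source register numbers, -1 if unused */
};

/* Copy in[0..n) to out, inserting a nop after every load whose result
   is read by the immediately following instruction.  `out` must have
   room for 2 * n entries.  Returns the new instruction count, which is
   what a later length calculation would see. */
static size_t insert_load_nops(const struct insn *in, size_t n,
                               struct insn *out)
{
    const struct insn nop = { true, false, -1, { -1, -1 } };
    size_t k = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[k++] = in[i];
        if (i + 1 < n && in[i].is_load && in[i].dest >= 0 &&
            (in[i + 1].src[0] == in[i].dest ||
             in[i + 1].src[1] == in[i].dest))
            out[k++] = nop;
    }
    return k;
}
```

Applied to the two-instruction sequence lw R0, R8 / add R1, R0 (assuming
R0 is the load destination), the pass returns a count of 3 with the nop
in the middle.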

>
>
>
> > But things go wrong when the two instructions
> > actually are placed in the delay slots. So after filling the delay
> > slot the instructions are like this
> >
>  ...
>
>
> >  This means 3 instruction in delay slots which can have only 2. So my
> > question was how do i overcome this?
> >
>
>   I have similar situations. I simply do not emit these kind of instruction
> in the delay slot!

What I am trying to do now is to examine the delay slots and reschedule
the instructions if required. Thankfully, the GCC 'mt' back end also
does something similar. Hopefully this should solve the problem.

> You will have another problem: the lw instruction could
> be the last instruction in the delay slot or at the end of a basic block.
> Then you have to schedule across basic block (jump/fall through) and
> function boundaries (call). The first one can be solved by checking code in
> basic block predecessors given by the edges data structure. The second one
> could be ignored if you know the function prologue and can assure that the
> register is not used in the first instruction.

Thanks for the info. Yes, I think I will have to take care of this as well.

Thanks for your time.

Regards,
Shafi


4.3.1 Status Report (2008-05-05)

2008-05-05 Thread Joseph S. Myers
Status
======

The GCC 4.3 branch is open for commits under normal release branch
rules.

GCC 4.3.1 was scheduled for 2008-05-05, but will be delayed.  There
are three P1 bugs open that need resolving before 4.3.1-rc1 is
released: a restricted pointers bug (36013), the x86 direction flag
issue (36079) where we don't yet have consensus on whether we need to
have a workaround patch applied, and the ppc64 cacoshl miscompilation
(36090) where possible patches are being discussed.  Ian has applied
the CERT warning fixes to 4.3 branch, so those will be in 4.3.1.

Quality Data
============

Priority          #     Change from Last Report
--------        ---     -----------------------
P1                3     +  1
P2               97     -  3
P3                4     +  3
--------        ---     -----------------------
Total           104     +  1

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2008-04/msg00688.html

The next report for the 4.3 branch will be sent by Richard.


-- 
Joseph S. Myers
[EMAIL PROTECTED]


Re: Implementing built-in functions for I/O

2008-05-05 Thread Jim Wilson

Mohamed Shafi wrote:

> short k;
> __OUT(port no) = k;
> So how can I do that?


Make __OUT take two parameters.
__OUT(port no, k);
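
The reason for the second parameter is that the result of a function
call is not an lvalue in C, so nothing can be assigned to it; the value
has to travel as an argument. A small sketch of the two-argument shape
follows. The real __OUT would be a target-specific built-in, so port_out,
simulated_ports, and NUM_PORTS here are hypothetical stand-ins that only
model the calling convention.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 16                      /* hypothetical port count */

static int16_t simulated_ports[NUM_PORTS]; /* stand-in for the I/O space */

/* Stand-in for a two-argument __OUT(port, value): the value to write is
   passed as an argument instead of being assigned to the call result. */
static void port_out(unsigned port, int16_t value)
{
    assert(port < NUM_PORTS);
    simulated_ports[port] = value;
}
```

With this shape, the original fragment becomes "short k; port_out(3, k);"
for a (hypothetical) port number 3.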

Jim


Re: Common Subexpression Elimination Opportunity not being exploited

2008-05-05 Thread Jim Wilson

Pranav Bhandarkar wrote:

> GCC 4.3 does fine here except when the operator is "logical and" (see
> attached. test.c uses logical and and test1.c uses plus)


Logical and generates control-flow instructions, i.e. compares, 
branches, and labels.  This makes optimizing it a very different problem 
than optimizing for plus.


Try compiling with -fdump-tree-all and notice that the "gimple" dump 
file already contains all of the control-flow expressed in the IL, which 
means optimizing this is going to be very difficult.


We could perhaps add a new high level gimple that contains the C 
language && as an operator, run a CSE pass, and then later lower it to 
expose the control flow, but that will be a lot of work, and probably 
won't give enough benefit to justify it.


It is simpler to rewrite the code.  For instance if you change this
  a[0] = ione && itwo && ithree && ifour && ifive;
to
  a[0] = !!ione & !!itwo & !!ithree & !!ifour & !!ifive;
then you get the same effect (assuming none of the subexpressions have 
side-effects), and gcc is able to perform the optimization.  You also 
get code without branches which is likely to be faster on modern 
workstation cpus.
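
A short sketch of the rewrite, for side-effect-free int operands. The
function names are made up for illustration; the two expressions are the
ones given above. check_equivalent is a helper added here only to make
the claim of equal results easy to test.

```c
#include <assert.h>

/* Short-circuit form: the && operators are expressed as control flow
   in the intermediate language. */
static int all_true_logical(int a, int b, int c, int d, int e)
{
    return a && b && c && d && e;
}

/* Branch-free form: !!x normalizes each operand to 0 or 1, and the
   bitwise & then combines them without short-circuiting. */
static int all_true_bitwise(int a, int b, int c, int d, int e)
{
    return !!a & !!b & !!c & !!d & !!e;
}

/* Assert that both forms agree for one set of inputs. */
static void check_equivalent(int a, int b, int c, int d, int e)
{
    assert(all_true_logical(a, b, c, d, e) ==
           all_true_bitwise(a, b, c, d, e));
}
```

The !! matters: without it, 1 & 2 is 0 even though 1 && 2 is 1, so the
plain bitwise form would give different results for operands other than
0 and 1.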


Jim


gcc-4.1-20080505 is now available

2008-05-05 Thread gccadmin
Snapshot gcc-4.1-20080505 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20080505/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch 
revision 134958

You'll find:

gcc-4.1-20080505.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.1-20080505.tar.bz2       C front end and core compiler

gcc-ada-4.1-20080505.tar.bz2        Ada front end and runtime

gcc-fortran-4.1-20080505.tar.bz2    Fortran front end and runtime

gcc-g++-4.1-20080505.tar.bz2        C++ front end and runtime

gcc-java-4.1-20080505.tar.bz2       Java front end and runtime

gcc-objc-4.1-20080505.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.1-20080505.tar.bz2  The GCC testsuite

Diffs from 4.1-20080428 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.