Re: Pipeline hazards and delay slots
On Mon, May 5, 2008 at 12:23 PM, Pranav Bhandarkar <[EMAIL PROTECTED]> wrote:
> Hi Mohammed,
>
> > But how can i handle instances like this? Should i be doing insertion
> > of nops in reorg pass?
>
> FWIW, I had worked on a port for VLIW processor about three years back
> and IIRC we had used the reorg pass for inserting the nops. I think
> if you look at the scheduler dumps you will notice that the scheduler
> would have, in all likelihood, accounted for the delay of 1 cycle
> between the "lw" and the "add" instructions. Only that you will have
> to put the "nop" yourself between these two instructions.

Thanks for your reply.
That's what I am doing right now, except that I am not doing it in the reorg pass but in final_prescan. But things go wrong when the two instructions are actually placed in the delay slots. After filling the delay slots, the instructions look like this:

    call fun      ; has two delay slots
    lw   R0, R8   ; delay slot 1
    add  R1, R0   ; delay slot 2

After final_prescan the output is:

    call fun
    lw   R0, R8
    nop
    add  R1, R0

This means 3 instructions in the delay slots, which can hold only 2. So my question was: how do I overcome this? Should I do the delay slot scheduling myself in the reorg pass, or is there some other way?

Thanks for your time,
Regards,
Shafi
Re: Pipeline hazards and delay slots
Hi!

On 05.05.2008 at 09:06, Mohamed Shafi wrote:

> > > But how can i handle instances like this? Should i be doing insertion
> > > of nops in reorg pass?
> >
> > FWIW, I had worked on a port for VLIW processor about three years back
> > and IIRC we had used the reorg pass for inserting the nops. I think
> > if you look at the scheduler dumps you will notice that the scheduler
> > would have, in all likelihood, accounted for the delay of 1 cycle
> > between the "lw" and the "add" instructions. Only that you will have
> > to put the "nop" yourself between these two instructions.
>
> Thanks for your reply.
> Thats what i am doing right now. But i am not doing in reorg pass
> but in final_prescan.

I don't know about final_prescan, but the insn code seems to get modified somewhere (more than once?) after the machine dependent reorg pass. Ian suggested to reorg the code in TARGET_ASM_FUNCTION_PROLOGUE. That's what I am doing. But that has some other disadvantages.

> But things go wrong when the two instructions
> actually are placed in the delay slots. So after filling the delay
> slot the instructions are like this
> ...
> This means 3 instruction in delay slots which can have only 2. So my
> question was how do i overcome this?

I have similar situations. I simply do not emit these kinds of instructions in the delay slot!

You will have another problem: the lw instruction could be the last instruction in the delay slot or at the end of a basic block. Then you have to schedule across basic block (jump/fall through) and function boundaries (call). The first one can be solved by checking the code in the basic block predecessors given by the edges data structure. The second one could be ignored if you know the function prologue and can assure that the register is not used in its first instruction.

Boris
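A minimal sketch of the "do not put such instructions in the delay slot" check described above, written as a toy model in plain C rather than against GCC's internal RTL interfaces. Every name here (`toy_insn`, `toy_needs_nop`, `toy_can_fill_delay_slots`) is hypothetical and only illustrates the idea: reject a delay-slot candidate sequence if it contains a load followed directly by a use of the loaded register, and, conservatively, if a load sits in the last slot, where its consumer could be the first instruction across the boundary.

```c
#include <stddef.h>

/* Toy instruction model; none of these names come from GCC. */
enum toy_op { TOY_LW, TOY_ADD, TOY_NOP };

struct toy_insn {
    enum toy_op op;
    int dest;   /* register written, or -1 if none */
    int src1;   /* first register read, or -1 if none */
    int src2;   /* second register read, or -1 if none */
};

/* Returns 1 if 'second' reads the register that the load 'first' writes,
   i.e. a nop would be needed between the two (1-cycle load delay). */
int toy_needs_nop(const struct toy_insn *first, const struct toy_insn *second)
{
    if (first->op != TOY_LW || first->dest < 0)
        return 0;
    return second->src1 == first->dest || second->src2 == first->dest;
}

/* Returns 1 if the n candidate instructions can occupy the delay slots
   without any nop being inserted afterwards. */
int toy_can_fill_delay_slots(const struct toy_insn *slots, size_t n)
{
    size_t i;

    if (n == 0)
        return 1;

    /* A load-use pair inside the slots would later grow by one nop. */
    for (i = 0; i + 1 < n; i++)
        if (toy_needs_nop(&slots[i], &slots[i + 1]))
            return 0;

    /* Conservative assumption: a load in the last slot may be consumed by
       the first instruction executed after the slots. */
    return slots[n - 1].op != TOY_LW;
}
```

With the `lw R0, R8` / `add R1, R0` pair from this thread the check fails, so under this model the pair would be left outside the delay slots and the nop inserted there instead.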
Re: Pipeline hazards and delay slots
On Mon, May 5, 2008 at 1:12 PM, Boris Boesler <[EMAIL PROTECTED]> wrote:
> Hi!
>
> Am 05.05.2008 um 09:06 schrieb Mohamed Shafi:
>
> > > > But how can i handle instances like this? Should i be doing insertion
> > > > of nops in reorg pass?
> > >
> > > FWIW, I had worked on a port for VLIW processor about three years back
> > > and IIRC we had used the reorg pass for inserting the nops. I think
> > > if you look at the scheduler dumps you will notice that the scheduler
> > > would have, in all likelihood, accounted for the delay of 1 cycle
> > > between the "lw" and the "add" instructions. Only that you will have
> > > to put the "nop" yourself between these two instructions.
> >
> > Thanks for your reply.
> > Thats what i am doing right now. But i am not doing in reorg pass
> > but in final_prescan.
>
> I don't know about final_prescan but the insn code seems to get modified
> somewhere (more than once?) after the machine dependent reorg pass. Ian
> suggested to reorg the code in TARGET_ASM_FUNCTION_PROLOGUE. That's what I
> am doing. But that has some other disadvantages.

In final_prescan one actually emits the nop instructions directly into the assembly file. The disadvantage is that the length calculation for indirect jumps goes wrong, because the compiler has no idea about the emitted nops. So I moved nop generation to the reorg pass.

> > But things go wrong when the two instructions
> > actually are placed in the delay slots. So after filling the delay
> > slot the instructions are like this
> > ...
> > This means 3 instruction in delay slots which can have only 2. So my
> > question was how do i overcome this?
>
> I have similar situations. I simply do not emit these kind of instruction
> in the delay slot!

What I am trying to do now is to examine the delay slots and reschedule the instructions if required. Thankfully, the gcc back-end 'mt' also does something similar to this. Hopefully this should solve the problems.

> You will have another problem: the lw instruction could
> be the last instruction in the delay slot or at the end of a basic block.
> Then you have to schedule across basic block (jump/fall through) and
> function boundaries (call). The first one can be solved by checking code in
> basic block predecessors given by the edges data structure. The second one
> could be ignored if you know the function prologue and can assure that the
> register is not used in the first instruction.

Thanks for the info. Yes, I think I will have to take care of this also.

Thanks for your time.

Regards,
Shafi
4.3.1 Status Report (2008-05-05)
Status
======

The GCC 4.3 branch is open for commits under normal release branch rules. GCC 4.3.1 was scheduled for 2008-05-05, but will be delayed.

There are three P1 bugs open that need resolving before 4.3.1-rc1 is released: a restricted pointers bug (36013), the x86 direction flag issue (36079) where we don't yet have consensus on whether we need to have a workaround patch applied, and the ppc64 cacoshl miscompilation (36090) where possible patches are being discussed.

Ian has applied the CERT warning fixes to 4.3 branch, so those will be in 4.3.1.

Quality Data
============

Priority      #    Change from Last Report
--------    ---    -----------------------
P1            3    + 1
P2           97    - 3
P3            4    + 3
--------    ---    -----------------------
Total       104    + 1

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2008-04/msg00688.html

The next report for the 4.3 branch will be sent by Richard.

-- 
Joseph S. Myers
[EMAIL PROTECTED]
Re: Implementing built-in functions for I/O
Mohamed Shafi wrote:
> short k;
> __OUT(port no) = k;
> So how can i do that.

Make __OUT take two parameters.

    __OUT(port no, k);

Jim
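To make the two-parameter form concrete, here is a small host-side sketch in C. It is only an illustration under loud assumptions: `TOY_OUT`, `toy_out`, `toy_last_written`, and the simulated port array are hypothetical names, and a real port would expand the macro to a target built-in that emits the port-output instruction instead of writing to an array.

```c
#include <stdint.h>

/* Simulated I/O port space; a stand-in for real hardware ports. */
#define TOY_PORT_COUNT 256u

static int16_t toy_ports[TOY_PORT_COUNT];

/* Writes 'value' to the simulated port; the port number wraps into range. */
void toy_out(unsigned port, int16_t value)
{
    toy_ports[port % TOY_PORT_COUNT] = value;
}

/* Reads back the last value written, for inspection in this sketch only. */
int16_t toy_last_written(unsigned port)
{
    return toy_ports[port % TOY_PORT_COUNT];
}

/* Two-parameter form: the port and the value are both ordinary arguments,
   so no assignment to the result of a call is needed. */
#define TOY_OUT(port, value) toy_out((port), (int16_t)(value))
```

With this shape, `short k = 42; TOY_OUT(3, k);` is an ordinary call expression, which is easier to map onto a built-in than an assignment whose left-hand side is a function-like form.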
Re: Common Subexpression Elimination Opportunity not being exploited
Pranav Bhandarkar wrote:
> GCC 4.3 does fine here except when the operator is "logical and" (see
> attached. test.c uses logical and and test1.c uses plus)

Logical and generates control-flow instructions, i.e. compares, branches, and labels. This makes optimizing it a very different problem than optimizing for plus. Try compiling with -fdump-tree-all and notice that the "gimple" dump file already contains all of the control-flow expressed in the IL, which means optimizing this is going to be very difficult.

We could perhaps add a new high level gimple that contains the C language && as an operator, run a CSE pass, and then later lower it to expose the control flow, but that will be a lot of work, and probably won't give enough benefit to justify it.

It is simpler to rewrite the code. For instance if you change this

    a[0] = ione && itwo && ithree && ifour && ifive;

to

    a[0] = !!ione & !!itwo & !!ithree & !!ifour & !!ifive;

then you get the same effect (assuming none of the subexpressions have side-effects), and gcc is able to perform the optimization. You also get code without branches which is likely to be faster on modern workstation cpus.

Jim
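The equivalence claimed above can be checked with a short, self-contained C sketch. The function names are hypothetical and chosen only for this illustration; the test values deliberately include 2 and -1, because a plain `1 & 2` is 0, which is why each operand needs the `!!` normalisation to 0 or 1 before the bitwise `&`.

```c
/* Short-circuit form: expressed with compares and branches in the IL. */
int all_nonzero_logical(int a, int b, int c, int d, int e)
{
    return a && b && c && d && e;
}

/* Branch-free form: !! maps each operand to 0 or 1, then & combines them. */
int all_nonzero_bitwise(int a, int b, int c, int d, int e)
{
    return !!a & !!b & !!c & !!d & !!e;
}

/* Compares both forms over every 5-tuple drawn from a small value set.
   Returns 1 if they always agree, 0 otherwise. */
int forms_agree(void)
{
    static const int vals[] = { 0, 1, 2, -1 };
    int i;

    for (i = 0; i < 4 * 4 * 4 * 4 * 4; i++) {
        int a = vals[i % 4];
        int b = vals[(i / 4) % 4];
        int c = vals[(i / 16) % 4];
        int d = vals[(i / 64) % 4];
        int e = vals[(i / 256) % 4];

        if (all_nonzero_logical(a, b, c, d, e)
            != all_nonzero_bitwise(a, b, c, d, e))
            return 0;
    }
    return 1;
}
```

The two forms only agree in value; the bitwise form evaluates every operand, so it is not a drop-in replacement when an operand has side effects or must not be evaluated after an earlier one is zero.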
gcc-4.1-20080505 is now available
Snapshot gcc-4.1-20080505 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20080505/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch revision 134958

You'll find:

gcc-4.1-20080505.tar.bz2             Complete GCC (includes all of below)
gcc-core-4.1-20080505.tar.bz2        C front end and core compiler
gcc-ada-4.1-20080505.tar.bz2         Ada front end and runtime
gcc-fortran-4.1-20080505.tar.bz2     Fortran front end and runtime
gcc-g++-4.1-20080505.tar.bz2         C++ front end and runtime
gcc-java-4.1-20080505.tar.bz2        Java front end and runtime
gcc-objc-4.1-20080505.tar.bz2        Objective-C front end and runtime
gcc-testsuite-4.1-20080505.tar.bz2   The GCC testsuite

Diffs from 4.1-20080428 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.