Re: GCC 4.6 performance regressions
On 9 February 2011 06:20, Tony Poppleton wrote: > > Out of interest, has their been much communication in the past between > GCC and Phoronix to address any of these issues in their previous > benchmarks? I signed up to their forum to point out that snapshots have additional checking turned on by default, but didn't see anyone "official" replying to acknowledge it, only other forum users. I wasn't interested in spending any time trying to reach them.
RTL Expand pass - Difference in float and int datatypes
Hello All, I am trying to understand rtl generation. For the following testcase I have dumped the rtl... (version: gcc-4.3.3, target: AVR32 not upstreamed yet) int main() { volatile int a, b; if(a>b) return 1; else return 2; } The expand pass shows that the "le" operator is chosen. (jump_insn 9 8 10 3 gt_int.c:4 (set (pc) (if_then_else (le:CC (cc0) (const_int 0 [0x0])) (label_ref 14) (pc))) -1 (nil)) If I modify the same testcase to use the 'float' datatype instead of 'int', the "gt" operator is picked up. (jump_insn 9 8 34 3 gt.c:4 (set (pc) (if_then_else (gt:CC (cc0) (const_int 0 [0x0])) (label_ref 12) (pc))) -1 (nil)) I would like to know what prompts gcc to decide that "le" can be used in the expand pass rather than the "gt" operator. Or more precisely, why it differs in the case of float. I am not sure how much of this is target dependent, but the port being used is for avr32, which is not upstream yet. It will be great if I can get some pointers on the above. Regards Anitha
Re: GCC 4.6 performance regressions
On Wed, Feb 9, 2011 at 00:20, Tony Poppleton wrote: >> While I appreciate Phoronix as a booster site, their benchmarking >> practice often seems very dodgy; I'd take the results with a large grain >> of salt > > The main reason I posted the link in the first place was because it > was reflecting my own emperical evidence for the application I am > working on; the march/mtune flags (on various corei* cpus) are > actually detrimental to performance (albeit by at most 10% - but > still, not the performance boost I was hoping for). > > I had been trying for the last few weeks to strip down my application > into a mini-benchmark I could create a PR out of, however it is > tougher than expected and was hoping the Phoronix article was going to > offer a quicker route to finding performance regressions than my code > - as their coverage was a lot wider. Anyway, apparently this is not > the case, so back to my original work... > > Would it not be in the best interests for both GCC and Phoronix to > rectify the problems in their benchmarks? I agree with you here: for many reasons it would be interesting to have Phoronix as a free alternative to the SPEC benchmarks. > Could we not contact the > authors of the article and point out how they can make their > benchmarks better? I would be happy to contact them myself, however I > think it would be far more effective (and valid) coming from a GCC > maintainer. I put Michael in CC. > > Points they have apparently missed so far are; > - clarify which compiler flags they are using Also, the mechanism used to change the flags for a benchmark should be documented and should produce the intended behaviour. For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so building this benchmark with CFLAGS="-O2" would have no effect. Sebastian > - don't run make -j when looking at compile times > - ensure they are using --enable-checking=release when benchmarking a > snapshot > - in general, as much information as possible as to their setup and > usage, to make the results easily repeatable > > Out of interest, has their been much communication in the past between > GCC and Phoronix to address any of these issues in their previous > benchmarks? > > Tony >
Re: GCC 4.6 performance regressions
On 9 February 2011 08:34, Sebastian Pop wrote: > > For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so > building this benchmark with CFLAGS="-O2" would have no effect. Why not? Ignoring the fact that -O3 is the highest level for GCC, the manual says: "If you use multiple -O options, with or without level numbers, the last such option is the one that is effective." http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html And CFLAGS="-fno-fast-math -O2" would cancel the effects of -ffast-math too.
Re: GCC 4.6 performance regressions
On Wed, Feb 09, 2011 at 08:42:05AM +, Jonathan Wakely wrote: > On 9 February 2011 08:34, Sebastian Pop wrote: > > > > For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so > > building this benchmark with CFLAGS="-O2" would have no effect. > > Why not? > > Ignoring the fact -O3 is the highest level for GCC, the manual says: > "If you use multiple -O options, with or without level numbers, the > last such option is the one that is effective." > http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html > > And CFLAGS="-fno-fast-math -O2" would cancel the effects of -ffast-math too. Last time I checked, e.g. their byte-benchmark uses OPTON = -O ... $(CC) -o $(PROGDIR)/arithoh ${CFLAGS} ${OPTON} -Darithoh $(SRCDIR)/arith.c and similarly for most of the other files, so it really doesn't matter what -O level you provide in CFLAGS there; they are just comparing -O performance. Jakub
Re: GCC 4.6 performance regressions
On Wed, Feb 9, 2011 at 7:20 AM, Tony Poppleton wrote: >> While I appreciate Phoronix as a booster site, their benchmarking >> practice often seems very dodgy; I'd take the results with a large grain >> of salt > > The main reason I posted the link in the first place was because it > was reflecting my own emperical evidence for the application I am > working on; the march/mtune flags (on various corei* cpus) are > actually detrimental to performance (albeit by at most 10% - but > still, not the performance boost I was hoping for). > > I had been trying for the last few weeks to strip down my application > into a mini-benchmark I could create a PR out of, however it is > tougher than expected and was hoping the Phoronix article was going to > offer a quicker route to finding performance regressions than my code > - as their coverage was a lot wider. Anyway, apparently this is not > the case, so back to my original work... > > Would it not be in the best interests for both GCC and Phoronix to > rectify the problems in their benchmarks? Could we not contact the > authors of the article and point out how they can make their > benchmarks better? I would be happy to contact them myself, however I > think it would be far more effective (and valid) coming from a GCC > maintainer. I have done analyses on C-Ray and Himeno last december and filed a few bugzillas (and fixed the easiest ones). Most of the benchmarks they use have the problem that they do not use simple tricks that are available (like build the single-file ones with -fwhole-program). But they probably represent the "bad" coding style that is quite common, so optimizing the codes is probably useful. > Points they have apparently missed so far are; > - clarify which compiler flags they are using > - don't run make -j when looking at compile times > - ensure they are using --enable-checking=release when benchmarking a > snapshot > - in general, as much information as possible as to their setup and > usage, to make the results easily repeatable > > Out of interest, has their been much communication in the past between > GCC and Phoronix to address any of these issues in their previous > benchmarks? Yes. Richard. > Tony >
Re: Broken bootstrap on Cygwin
On 08/02/2011 16:08, Dave Korn wrote: > Sorry all, been offline for a couple of days after my pc blew up. > > On 07/02/2011 20:50, Angelo Graziosi wrote: > >> I do not understand the logic here: break GCC trunk for something that >> hasn't been yet released. > > But GCC trunk has not been released either yet! GCC trunk and Cygwin trunk > are in synch. The next released versions of both will work together. People > working on trunk occasionally need to use trunk versions of other things such > as binutils or glibc to get the full functionality, and there's always > --disable-libquadmath for those who don't want to. *sigh* Thinko here and in my other reply. Please read 'libdecnumber' and '--disable-decimal-float' respectively. Patches to follow. cheers, DaveK
Re: MELT plugin: test fopen
Hello Pierre! It is an interesting plugin. I'll keep an eye on your repository. Thanks! Daniel Marjamäki 2011/2/9 Pierre Vittet : > Hello, > > I would like to present a small plugin, which could be a good example of > a MELT use case. > This plugin checks that after every call to the fopen function, > there is a test on the pointer returned by fopen (verifying that it is not > null). > > It creates a pass after SSA and works on gimple. It first matches the > gimple_call on fopen and saves the tree corresponding to the FILE *. Then we > check that the next instruction is a gimple_cond between the saved tree and > the NULL ptr. > When no test is found, the following warning is emitted: > "test_fopen_cfile.c:35:11: warning: fopen not followed by a test on his > returned pointer [enabled by default]" > > I think MELT is particularly well adapted when we have to match trees or gimples, > as I do here. > > For the moment I have only used it on a small test file. I will try to see > what it gives on a more realistic small-to-medium application. > > The code can be found on GitHub: https://github.com/Piervit/GMWarn . The idea > is to add more plugins. If you have some ideas or remarks, I am interested > (however I still have to learn a lot about both GCC and MELT). > > If you try the code you will need to use the GCC MELT branch (or there are > some changes to make in the Makefile to use it with MELT as a plugin). > > > Regards! > >
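For readers unfamiliar with the pattern the plugin enforces, a minimal example of user code that would satisfy the check might look like this (hypothetical code, not part of the plugin or its test suite; the helper name read_config is made up):

    #include <stdio.h>

    int read_config (const char *path)
    {
      FILE *f = fopen (path, "r");
      if (f == NULL)      /* the gimple_cond against the NULL pointer that
                             the plugin looks for right after the call */
        return -1;
      /* ... use f ... */
      fclose (f);
      return 0;
    }

Code that omits the NULL test after fopen is what triggers the warning quoted above.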
loop hoisting fails
Hi, I am facing a problem with code hoisting from a loop in our backend. This problem seems to be hinted at in the documentation of -fbranch-target-load-optimize: "Perform branch target register load optimization before prologue / epilogue threading. The use of target registers can typically be exposed only during reload, thus hoisting loads out of loops and doing inter-block scheduling needs a separate optimization pass." However, the problem has nothing to do with this flag. Our architecture doesn't allow something like: store 0,(X+2) ; store 0 into mem of X+2 So we need to do: load Y,0 ; load Y with constant zero store Y,(X+2) ; store the 0 in Y into mem of X+2 If I compile a loop: int vec[8]; int i = 0; while(i++ < 8) vec[i] = 0; I get something like: load X,#vec ?L2: load Y,0 store Y,(X+0) add X,1 The load Y,0, though, should be hoisted out of the loop. I know what the problem is... gcc during expand has: For ?L2 bb: (set (mem/s:QI (reg:QI 41)) (const_int 0)) This is only transformed after reload into: (set (reg:QI Y) (const_int 0)) (set (mem/s:QI X) (reg:QI Y)) And so there's no opportunity after reloading for hoisting, it seems... so I thought about splitting the insn into two during expand: (set (reg:QI Y) (const_int 0)) (set (mem/s:QI X) (reg:QI Y)) But then this is combined by cse into: (set (mem/s:QI (reg:QI 41)) (const_int 0)) and bammm, same problem. No loop hoisting. What's the best way to handle this? Any suggestions? Cheers, Paulo Matos
Spelling mistake!
At http://gcc.gnu.org/onlinedocs/libstdc++/manual/streambufs.html Under section 'Buffering', Particularly is written as 'Chaptericularly'. Regards
Re: GCC 4.6 performance regressions
Hi Everyone, I've been following the thread already from the start (I'm on the list). Matthew Tippett (CC'ed now as well) and I have already been working on some ways to help further and one of us should have some more information to be presented shortly. If any of you have any other questions please feel free to let us know. -- Michael On 02/09/2011 02:34 AM, Sebastian Pop wrote: On Wed, Feb 9, 2011 at 00:20, Tony Poppleton wrote: While I appreciate Phoronix as a booster site, their benchmarking practice often seems very dodgy; I'd take the results with a large grain of salt The main reason I posted the link in the first place was because it was reflecting my own emperical evidence for the application I am working on; the march/mtune flags (on various corei* cpus) are actually detrimental to performance (albeit by at most 10% - but still, not the performance boost I was hoping for). I had been trying for the last few weeks to strip down my application into a mini-benchmark I could create a PR out of, however it is tougher than expected and was hoping the Phoronix article was going to offer a quicker route to finding performance regressions than my code - as their coverage was a lot wider. Anyway, apparently this is not the case, so back to my original work... Would it not be in the best interests for both GCC and Phoronix to rectify the problems in their benchmarks? I agree with you here: for many reasons it would be interesting to have Phoronix as a free alternative to the SPEC benchmarks. Could we not contact the authors of the article and point out how they can make their benchmarks better? I would be happy to contact them myself, however I think it would be far more effective (and valid) coming from a GCC maintainer. I put Michael in CC. Points they have apparently missed so far are; - clarify which compiler flags they are using Also, the mechanism used to change the flags for a benchmark should be documented and should produce the intended behaviour. For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so building this benchmark with CFLAGS="-O2" would have no effect. Sebastian - don't run make -j when looking at compile times - ensure they are using --enable-checking=release when benchmarking a snapshot - in general, as much information as possible as to their setup and usage, to make the results easily repeatable Out of interest, has their been much communication in the past between GCC and Phoronix to address any of these issues in their previous benchmarks? Tony
Re: Spelling mistake!
On 02/09/2011 01:44 PM, Anonymous bin Ich wrote: > At > > http://gcc.gnu.org/onlinedocs/libstdc++/manual/streambufs.html > > Under section 'Buffering', Particularly is written as 'Chaptericularly'. Fixed in SVN. Thanks. Paolo.
Re: RTL Expand pass - Difference in float and int datatypes
Anitha Boyapati writes: > int main() { > volatile int a, b; > if(a>b) return 1; > else return 2; > > } > > Expand pass shows that "le"operator is chosen. > > (jump_insn 9 8 10 3 gt_int.c:4 (set (pc) > (if_then_else (le:CC (cc0) > (const_int 0 [0x0])) > (label_ref 14) > (pc))) -1 (nil)) > > > If I modify the same testcase to use 'float' datatype instead of > 'int', "gt" operator is picked up. > > (jump_insn 9 8 34 3 gt.c:4 (set (pc) > (if_then_else (gt:CC (cc0) > (const_int 0 [0x0])) > (label_ref 12) > (pc))) -1 (nil)) > > > I would like to know what prompts gcc to decide if "le" can be used in > the expand pass rather than "gt" operator. Or more precisely why it > differs incase of float. The choice of LE when using int is just a happenstance of the way that gcc generates code. When gcc comes to RTL generation it happens to be looking at an if statement that falls through on true and branches on else. So it flips the condition to branch on true (in do_compare_rtx_and_jump). The use of volatile doesn't affect this: it will still only load the variables precisely as described in the program. The condition can not be flipped for float because that would be an invalid transformation if one of the values is NaN. Ian
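A small self-contained C example of the NaN point (the values here are made up for illustration): for integers, branching when !(a > b) is the same as branching when a <= b, but for floats the two differ as soon as one operand is NaN, because every ordered comparison involving NaN is false.

    #include <math.h>
    #include <stdio.h>

    int main (void)
    {
      float a = NAN, b = 1.0f;

      printf ("a > b   : %d\n", a > b);     /* 0: ordered comparison with NaN */
      printf ("a <= b  : %d\n", a <= b);    /* 0: also false, so LE != !GT    */
      printf ("!(a > b): %d\n", !(a > b));  /* 1: the true reverse of GT      */
      return 0;
    }

This is why do_compare_rtx_and_jump may flip an integer comparison to branch on the fall-through condition, but must leave a float GT alone (or use an unordered variant such as UNLE).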
Re: loop hoisting fails
"Paulo J. Matos" writes: > But then this is combined by cse into: > > (set (mem/s:QI (reg:QI 41)) (const_int 0)) > > and bammm, same problem. No loop hoisting. What's the best way to > handle this? Any suggestions? You need to set TARGET_RTX_COSTS to indicate that this operation is relatively expensive. That should stop combine from generating it. Ian
Re: loop hoisting fails
On 09/02/11 15:07, Ian Lance Taylor wrote: You need to set TARGET_RTX_COSTS to indicate that this operation is relatively expensive. That should stop combine from generating it. Thanks, I will give it a try. One of the things that is not as polished as it should be is the rtx costs. I am writing costs for optimize_size. The way it works is slightly confusing to me. I have added a debug printf in it so I can see the costs and when they are called, and I get a list like this: == RTXCOST[pass]: outer_code rtx => cost rec/done== And it starts with: == RTXCOST[]: UnKnown (const_int 0 [0x0]) => 0 [done]== == RTXCOST[]: set (plus:QI (reg:QI 18 [0]) (reg:QI 18 [0])) => 4 [rec]== == RTXCOST[]: set (neg:QI (reg:QI 18 [0])) => 4 [rec]== ... == RTXCOST[]: set (set (reg:QI 21) (subreg:QI (reg:SI 14 virtual-stack-vars) 3)) => 4 [rec]== The first one is the only one that is called with an unknown outer_code. All the others seem to have an outer_code set, which should be the context in which the rtx shows up, but then you get things like the last line: outer_code is a set but the rtx itself is _also_ a set. How is this possible? Now, what's the best way to write the costs? For example, what's the cost for a constant? If the rtx is a (const_int 0), and the outer_code is unknown, I would guess the cost is 0 since there's no cost in using it. However, if the outer_code is set, then the cost depends on where I am putting the zero. If the outer_code is plus, should the cost be 0? (since x + 0 = x and the operation will probably be discarded?) Any tips on this? Cheers, Paulo Matos
Re: loop hoisting fails
"Paulo J. Matos" writes: > It is slightly confusing for me the way it works. I have added a debug > printf in it so I can debug the costs and when they are called and I > get a list like this: > > == RTXCOST[pass]: outer_code rtx => cost rec/done== > > And it starts with: > == RTXCOST[]: UnKnown (const_int 0 [0x0]) => 0 [done]== > == RTXCOST[]: set (plus:QI (reg:QI 18 [0]) > (reg:QI 18 [0])) => 4 [rec]== > == RTXCOST[]: set (neg:QI (reg:QI 18 [0])) => 4 [rec]== > ... > == RTXCOST[]: set (set (reg:QI 21) > (subreg:QI (reg:SI 14 virtual-stack-vars) 3)) => 4 [rec]== > > The first one is the only one that is called with an unknown > outer_code. All the other seems to have an outer_code set which should > be the context in which it shows up but then you get things like the > last line. outer_code is a set but the rtx itself is _also_ a set. How > is this possible? SET is more or less a default outer_code, you can just ignore it. > Now, what's the best way to write the costs? For example, what's the > cost for a constant? If the rtx is a (const_int 0), and the outer is > unknown I would guess the cost is 0 since there's no cost in using > it. However, if the outer_code is set, then the cost depends on where > I am putting the zero into. If the outer_code is plus, should the cost > be 0? (since x + 0 = x and the operation will probably be discarded?) For your processor it sounds like you should make a constant more expensive than a register for an outer code of SET. You're right that the cost should really depend on the destination of the set but unfortunately I don't know if you will see that. I agree that costs are unfortunately not very well documented and the middle-end does not use them in the most effective manner. It's still normally the right mechanism to use to control what combine does. Ian
Fwd: AIX vs long double
Cf. http://gcc.gnu.org/ml/gcc/2011-02/msg00109.html and http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47032 Seems as if one needs to add link-time tests to libgfortran for some of the C99 functions - the compile checks exist and succeed. Tobias Original Message Subject: AIX vs long double Date: Mon, 7 Feb 2011 14:01:44 -0500 From: David Edelsohn To: GCC Development AIX provides two versions of long double and declares all of the C99 long double symbols in the math.h header file. One implementation aliases long double to the IEEE double precision type and the other implementation aliases long double to IBM's paired double format. All of the C99 symbols for IEEE double precision are implemented in libm, but not all of the C99 symbols for the IBM long double format are implemented. IBM's proprietary XL compiler (cc, xlc) defaults to IEEE double precision and provides a special invocation to default to IBM long double (cc128, xlc128). GNU/Linux and GCC on GNU/Linux default to IEEE quad precision long double. Because the long double choice is an ABI change, GCC on AIX switched to GNU/Linux compatibility with support for AIX 6.1, before the incomplete implementation was noticed. This mostly worked until libgfortran started using additional C99 functions, causing GCC Bugzilla [target/47032] "libgfortran references complex long double functions missing on AIX". libstdc++-v3 on AIX builds with AIX long double support, although user programs that try to access the missing functions (copysign, nextafter, etc.) will experience link-time failures. libgfortran on AIX fails to build with AIX long double support because it accesses the missing functions in its implementation. I would like to solicit feedback about how to proceed: Should GCC on AIX revert to a 64-bit long double size, allowing all libraries to build and work, but breaking the GCC/G++ ABI on AIX? Or should it continue to use a 128-bit long double, building a libstdc++-v3 that works as long as the user program does not use the missing symbols, but failing to build libgfortran? Thanks, David
Re: RTL Expand pass - Difference in float and int datatypes
On 9 February 2011 20:34, Ian Lance Taylor wrote: >> I would like to know what prompts gcc to decide if "le" can be used in >> the expand pass rather than "gt" operator. Or more precisely why it >> differs incase of float. > > The choice of LE when using int is just a happenstance of the way that > gcc generates code. When gcc comes to RTL generation it happens to be > looking at an if statement that falls through on true and branches on > else. So it flips the condition to branch on true (in > do_compare_rtx_and_jump). The use of volatile doesn't affect this: it > will still only load the variables precisely as described in the > program. > > The condition can not be flipped for float because that would be an > invalid transformation if one of the values is NaN. > OK, I get it. I would like to understand some more about reverse-conditional branches in the case of float. For the target I am working on (AVR32), there is a floating point unit which follows IEEE 754. The port is relatively stable. We are now incrementally building support for these FPU instructions (like fadd, fmul, fcmp). Now for the same test case with float data type, after the expand pass, in one of the passes (outof_cfglayout) the direct-conditional branches are turned into reverse-conditional branches. Direct-conditional branch > (jump_insn 9 8 34 3 gt.c:4 (set (pc) > (if_then_else (gt:CC (cc0) > (const_int 0 [0x0])) > (label_ref 12) > (pc))) -1 (nil)) Reverse-conditional Branch > (jump_insn 9 8 34 3 gt.c:4 (set (pc) > (if_then_else (gt:CC (cc0) > (const_int 0 [0x0])) > (pc))) -1 (nil)) > (label_ref 14) (Sorry that I don't have access to the installation right now, so I just manually modified the reverse-conditional branch. The idea is to illustrate that label_ref and pc are interchanged). The latter pattern is supposed to emit assembly which tests for the reverse condition. For instance, if the former pattern emits an assembly instruction like "brgt <label>", then the latter pattern is supposed to emit an instruction like "brle <label>". I am a little confused here. I think this kind of condition reversal should not happen for floating-point comparisons (IEEE 754). Am I still on the right track? If yes, how do I prevent this kind of jump optimization? I see some REVERSIBLE_CC_MODE which is not defined in our case. Also not defined is REVERSE_CONDITION... Moreover, I see that this kind of interdependence of patterns in jumps has been removed when the "cond-optab" branch got merged. So this condition-reversal branch may not be seen in newer versions of gcc? (like 4.5.1) I am not sure if I am overloading this conversation with too many questions... I am trying to present the problem(s) clearly. Thanks, Ian, for the hint on do_compare_rtx_and_jump()! That was exactly what I was looking for. Anitha
Re: RTL Expand pass - Difference in float and int datatypes
On Wed, Feb 9, 2011 at 5:55 PM, Anitha Boyapati wrote: > On 9 February 2011 20:34, Ian Lance Taylor wrote: > >>> I would like to know what prompts gcc to decide if "le" can be used in >>> the expand pass rather than "gt" operator. Or more precisely why it >>> differs incase of float. >> >> The choice of LE when using int is just a happenstance of the way that >> gcc generates code. When gcc comes to RTL generation it happens to be >> looking at an if statement that falls through on true and branches on >> else. So it flips the condition to branch on true (in >> do_compare_rtx_and_jump). The use of volatile doesn't affect this: it >> will still only load the variables precisely as described in the >> program. >> >> The condition can not be flipped for float because that would be an >> invalid transformation if one of the values is NaN. >> > > ok. I get it. > > I would like to understand some more of reverse-conditional branches > in case of float. For the target I am working on (AVR32), there is a > floating point unit which follows IEEE 754. The port is relatively > stable. We are now incrementally building support for these FPU > instructions (like fadd, fmul, fcmp). > > Now for the same test case with float data type, after expand pass, in > one of the passes (outof_cfglayout) the direct-conditional branches > are turned to reverse-conditional branches. > > > Direct-conditional branch > >> (jump_insn 9 8 34 3 gt.c:4 (set (pc) >> (if_then_else (gt:CC (cc0) >> (const_int 0 [0x0])) >> (label_ref 12) >> (pc))) -1 (nil)) > > Reverse-conditional Branch > >> (jump_insn 9 8 34 3 gt.c:4 (set (pc) >> (if_then_else (gt:CC (cc0) >> (const_int 0 [0x0])) >> (pc))) -1 (nil)) >> (label_ref 14) > > (Sorry that I don't have access to the installation right now, so I > just manually modified the reverse-conditional branch. The idea is to > illustrate that label_ref and pc are interchanged). > > The latter pattern is supposed to emit assembly which tests for the > reverse-condition. For instance if the former pattern emits assembly > instruction like "brgt " then the latter pattern is supposed to > emit instruction like "brle " That's the misunderstanding. The above should emit a brngt (branch if !gt). Richard.
Re: RTL Expand pass - Difference in float and int datatypes
On 9 February 2011 22:28, Richard Guenther wrote: > On Wed, Feb 9, 2011 at 5:55 PM, Anitha Boyapati > wrote: >> On 9 February 2011 20:34, Ian Lance Taylor wrote: >> I would like to know what prompts gcc to decide if "le" can be used in the expand pass rather than "gt" operator. Or more precisely why it differs incase of float. >>> >>> The choice of LE when using int is just a happenstance of the way that >>> gcc generates code. When gcc comes to RTL generation it happens to be >>> looking at an if statement that falls through on true and branches on >>> else. So it flips the condition to branch on true (in >>> do_compare_rtx_and_jump). The use of volatile doesn't affect this: it >>> will still only load the variables precisely as described in the >>> program. >>> >>> The condition can not be flipped for float because that would be an >>> invalid transformation if one of the values is NaN. >>> >> >> ok. I get it. >> >> I would like to understand some more of reverse-conditional branches >> in case of float. For the target I am working on (AVR32), there is a >> floating point unit which follows IEEE 754. The port is relatively >> stable. We are now incrementally building support for these FPU >> instructions (like fadd, fmul, fcmp). >> >> Now for the same test case with float data type, after expand pass, in >> one of the passes (outof_cfglayout) the direct-conditional branches >> are turned to reverse-conditional branches. >> >> >> Direct-conditional branch >> >>> (jump_insn 9 8 34 3 gt.c:4 (set (pc) >>> (if_then_else (gt:CC (cc0) >>> (const_int 0 [0x0])) >>> (label_ref 12) >>> (pc))) -1 (nil)) >> >> Reverse-conditional Branch >> >>> (jump_insn 9 8 34 3 gt.c:4 (set (pc) >>> (if_then_else (gt:CC (cc0) >>> (const_int 0 [0x0])) >>> (pc))) -1 (nil)) >>> (label_ref 14) >> >> (Sorry that I don't have access to the installation right now, so I >> just manually modified the reverse-conditional branch. The idea is to >> illustrate that label_ref and pc are interchanged). >> >> The latter pattern is supposed to emit assembly which tests for the >> reverse-condition. For instance if the former pattern emits assembly >> instruction like "brgt " then the latter pattern is supposed to >> emit instruction like "brle " > > That's the misunderstanding. The above should emit a brngt > (branch if !gt). > OK. We are using the same branch pattern for INT as well as all modes. For int, generating "brle" should be fine I think where as float this is wrong. So that leaves us to define a separate branch pattern for floating mode. It would be very interesting to know why a direct-conditional branch is turned to reverse-conditional branch at all... I have been skimming through jump.c and cfglayout.c for some clues. Anitha
g++.dg/tree-ssa/inline-3.C
This testcase is ill-formed (and breaks in C++0x mode), because it declares puts to have the wrong return type. I note that changing it to have the right return type causes it to no longer be inlined. What do you suggest we do about this? Jason
Re: g++.dg/tree-ssa/inline-3.C
On Wed, Feb 9, 2011 at 11:39 PM, Jason Merrill wrote: > This testcase is ill-formed (and breaks in C++0x mode), because it declares > puts to have the wrong return type. I note that changing it to have the > right return type causes it to no longer be inlined. What do you suggest we > do about this? I think this is PR47663 - something I stumbled over today by accident ;) I suggest to either XFAIL the testcase or change it to use std::free (or some other function with void return type). Richard. > Jason >
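For reference, the declarations involved look roughly like the following (this is only a guess at the shape of the testcase, which is not shown here; the signatures themselves are the standard C library ones):

    /* <stdio.h> declares puts as returning int: */
    extern int puts (const char *);
    /* Declaring it with a void return type instead is what makes the
       testcase ill-formed.  A callee whose real return type is void,
       such as free (std::free in the testcase's C++ context), avoids
       the mismatch entirely: */
    extern void free (void *);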
math-68881.h vs -ffast-math
Hello. The file gcc/config/m68k/math-68881.h is distributed with GCC. It provides inline versions of the libm functions using FPU instructions on m68k targets. But -ffast-math seems to serve the same purpose, even better. My question: is math-68881.h still useful for some purpose? BTW: that file does not support C99; I have just posted a patch to fix that. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47672 -- Vincent Rivière
Re: GCC 4.6 performance regressions
On Wed, Feb 9, 2011 at 2:42 AM, Jonathan Wakely wrote: > On 9 February 2011 08:34, Sebastian Pop wrote: >> >> For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so >> building this benchmark with CFLAGS="-O2" would have no effect. > > Why not? > > Ignoring the fact -O3 is the highest level for GCC, the manual says: > "If you use multiple -O options, with or without level numbers, the > last such option is the one that is effective." > http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html > > And CFLAGS="-fno-fast-math -O2" would cancel the effects of -ffast-math too. > Because the makefile can override CFLAGS in the environment (or in a make variable) at any time, and GCC wouldn't even see it. $ cat makefile t1: @echo passed in VENV=$$VENV VMAK=$(VMAK) t2: @echo env oride; VENV=v1 make t1 @echo mak oride; make VMAK=v2 t1 $ make passed in VENV= VMAK= $ VENV=env_from_cli make VMAK=make_from_cli passed in VENV=env_from_cli VMAK=make_from_cli $ $ make t2 env oride make[1]: Entering directory `/tmp' passed in VENV=v1 VMAK= make[1]: Leaving directory `/tmp' mak oride make[1]: Entering directory `/tmp' passed in VENV= VMAK=v2 make[1]: Leaving directory `/tmp' $ $ VENV=env_from_cli make VMAK=make_from_cli t2 env oride make[1]: Entering directory `/tmp' passed in VENV=v1 VMAK=make_from_cli make[1]: Leaving directory `/tmp' mak oride make[1]: Entering directory `/tmp' passed in VENV=env_from_cli VMAK=v2 make[1]: Leaving directory `/tmp'
ICE in get_constraint_for_component_ref
Hi all, I am trying to port a private target to GCC 4.5.1. The following are the properties of the target: #define BITS_PER_UNIT 32 #define BITS_PER_WORD 32 #define UNITS_PER_WORD 1 #define CHAR_TYPE_SIZE 32 #define SHORT_TYPE_SIZE 32 #define INT_TYPE_SIZE 32 #define LONG_TYPE_SIZE 32 #define LONG_LONG_TYPE_SIZE 32 I am getting an ICE: internal compiler error: in get_constraint_for_component_ref, at tree-ssa-structalias.c:3031 for the following testcase: struct fb_cmap { int start; int len; int *green; }; extern struct fb_cmap fb_cmap; void directcolor_update_cmap(void) { fb_cmap.green[0] = 34; } The following is the output of debug_tree for the argument that's given to the function get_constraint_for_component_ref: unit size align 32 symtab 0 alias set -1 canonical type 0x2b6a4554a498 precision 32 min max pointer_to_this > unsigned PQI size unit size align 32 symtab 0 alias set -1 canonical type 0x2b6a45559930> arg 0 unit size align 32 symtab 0 alias set -1 canonical type 0x2b6a45602888 fields context chain > used public external common BLK file pr28675.c line 7 col 23 size unit size align 32 chain public static QI file pr28675.c line 9 col 6 align 32 initial result (mem:QI (symbol_ref:PQI ("directcolor_update_cmap") [flags 0x3] ) [0 S1 A32]) struct-function 0x2b6a455453f0>> arg 1 unsigned PQI file pr28675.c line 4 col 7 size unit size align 32 offset_align 32 offset bit offset context > pr28675.c:11:10> I was wondering if this ICE is due to the fact that this is a 32-bit char target? Can somebody help me with pointers to debug this issue? Regards, Shafi