Re: Compile time of Expression Templates in C++
On 10/10/06, Jochen Haerdtlein <[EMAIL PROTECTED]> wrote: Hello, I am a PhD student working on the extended use of expression templates for solving partial differential equations. Thus, I did a lot of studying of expression templates papers, articles and expression templates implementations. Actually, I had some ideas for improving them for high performance platforms. Since compile time of huge expression template programs is a severe problem, I am interested in the mechanisms within g++ that are necessary to resolve the nested template constructs. Unfortunately, it is pretty hard (for me) to find any hints on that. Is there any documentation on the policies within gcc for how the templates are resolved, instantiated, etc.? Where is the time spent during compiling? I already tried the -ftime-report option and got some information. According to that, about one third of the time is needed by the parser. But I would like to understand it in more detail, if possible. First of all, I would be happy about any reaction from you, just telling me whether my interests are complete nonsense or unsolvable. Maybe you can give me some hints on where I can find some answers, or somebody I can annoy. You may want to look at PR29433. Richard.
Re: Getting type used in THROW_EXPR
On 10/14/06, Brendon Costa <[EMAIL PROTECTED]> wrote: Hi all, I have yet another question that has arisen as I have started testing my code. Basically I am trying to get the type that is being used in throwing an exception. Is there a simple macro I can use to get the type of an exception from a THROW_EXPR? I think this is a matter of getting the TREE_TYPE for the value passed into the function: except.c: build_throw(tree exp) If you look at cp/cp-tree.def you will see /* A throw expression. operand 0 is the expression, if there was one, else it is NULL_TREE. */ DEFTREECODE (THROW_EXPR, "throw_expr", tcc_expression, 1) which means that TREE_OPERAND (t, 0) is the expression thrown. Depending on whether that is already a reference or not, you need to create a reference on your own using build1 (ADDR_EXPR, ...) with a properly constructed reference type (I guess there's some helper for that in the C++ frontend). Richard.
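A minimal sketch of that lookup as it might appear in a GCC tree pass, using only the standard tree accessors mentioned above (TREE_CODE, TREE_OPERAND, TREE_TYPE); the helper name is invented for illustration:

    /* Hypothetical helper, assuming GCC's internal headers (tree.h and
       cp/cp-tree.h for THROW_EXPR) are already included.  Returns the
       type of the thrown expression, or NULL_TREE for a bare re-throw.  */
    static tree
    thrown_expr_type (tree t)
    {
      if (TREE_CODE (t) != THROW_EXPR)
        return NULL_TREE;
      if (TREE_OPERAND (t, 0) == NULL_TREE)
        return NULL_TREE;   /* "throw;" carries no operand.  */
      return TREE_TYPE (TREE_OPERAND (t, 0));
    }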
Re: is this a good time to commit a patch on builtin_function?
On 10/23/06, Rafael Espíndola <[EMAIL PROTECTED]> wrote: I have an approved patch that factors code that is common to all builtin_function implementations (http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00195.html, http://gcc.gnu.org/ml/gcc-patches/2006-06/msg01499.html). I have just updated and tested it. Is this a good time to commit? Usually such changes disturb branches and are more suitable for late stage2. But in this case I don't see why this should be the case. Richard.
Re: memory benchmark of tuples branch
On 10/27/06, Aldy Hernandez <[EMAIL PROTECTED]> wrote: > My vote is to merge into mainline sooner rather than later. However, it > is a big patch and affects just about every module in the compiler, so I > wouldn't want to barge in without getting some consensus first. I agree with you and Mark. What I'd like to do next is: 1. Merge mainline into the branch to make sure there are no snafus. 2. Post the patch so folks can look at it while I do 3 & 4: 3. Attack the other front ends. So far I have only dealt with C/C++, but the FE only changes are usually minimal. 4. Adjust the 10 or so back ends that make MODIFY_EXPR nodes. 5. Merge into mainline. And then... start working on the other tuples. How does this sound to y'all? Does the tuples branch include the CALL_EXPR reworking from the LTO branch? Richard.
Re: compiling very large functions.
On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote: I think that it is time that we in the GCC community took some time to address the problem of compiling very large functions in a somewhat systematic manner. GCC has two competing interests here: it needs to be able to provide state of the art optimization for modest sized functions and it needs to be able to properly process very large machine generated functions using reasonable resources. I believe that the default behavior for the compiler should be that certain non essential passes be skipped if a very large function is encountered. There are two problems here: 1) defining the set of optimizations that need to be skipped. 2) defining the set of functions that trigger the special processing. For (1) I would propose that three measures be made of each function. These measures should be made before inlining occurs. The three measures are the number of variables, the number of statements, and the number of basic blocks. Why before inlining? These three numbers can change quite significantly as a function passes through the pass pipeline. So we should try to keep them up-to-date to have an accurate measurement. Otherwise the proposal sounds reasonable but we should make sure the limits we impose allow reproducible compilations for N x M cross configurations and native compilation on different sized machines. Richard.
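As a rough sketch of how the statement and basic-block measures could be gathered on the tree CFG of that era, assuming the GCC 4.x FOR_EACH_BB and block-statement-iterator interfaces from basic-block.h and tree-flow.h (the function itself is invented, not part of the proposal):

    /* Hypothetical size measurement over the current function's CFG:
       count basic blocks and the statements they contain.  */
    static void
    measure_function_size (int *n_bbs, int *n_stmts)
    {
      basic_block bb;
      block_stmt_iterator bsi;

      *n_bbs = 0;
      *n_stmts = 0;
      FOR_EACH_BB (bb)
        {
          ++*n_bbs;
          for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi))
            ++*n_stmts;
        }
    }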
Re: compiling very large functions.
On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote: Richard Guenther wrote: > On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote: >> I think that it is time that we in the GCC community took some time to >> address the problem of compiling very large functions in a somewhat >> systematic manner. >> >> GCC has two competing interests here: it needs to be able to provide >> state of the art optimization for modest sized functions and it needs to >> be able to properly process very large machine generated functions using >> reasonable resources. >> >> I believe that the default behavior for the compiler should be that >> certain non essential passes be skipped if a very large function is >> encountered. >> >> There are two problems here: >> >> 1) defining the set of optimizations that need to be skipped. >> 2) defining the set of functions that trigger the special processing. >> >> >> For (1) I would propose that three measures be made of each function. >> These measures should be made before inlining occurs. The three measures >> are the number of variables, the number of statements, and the number of >> basic blocks. > > Why before inlining? These three numbers can change quite significantly > as a function passes through the pass pipeline. So we should try to keep > them up-to-date to have an accurate measurement. > I am flexible here. We may want inlining to be able to update the numbers. However, I think that we should drive the inlining aggression based on these numbers. Well, for example jump threading and tail duplication can cause these numbers to change significantly. Also CFG instrumentation and PRE can increase the BB count. So we need to deal with cases where an optimization produces an overly large number of basic blocks or instructions (for example by throttling those passes properly). Richard.
Re: compiling very large functions.
On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote: Richard Guenther wrote: > On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote: >> Richard Guenther wrote: >> > On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote: >> >> I think that it is time that we in the GCC community took some >> time to >> >> address the problem of compiling very large functions in a somewhat >> >> systematic manner. >> >> >> >> GCC has two competing interests here: it needs to be able to provide >> >> state of the art optimization for modest sized functions and it >> needs to >> >> be able to properly process very large machine generated functions >> using >> >> reasonable resources. >> >> >> >> I believe that the default behavior for the compiler should be that >> >> certain non essential passes be skipped if a very large function is >> >> encountered. >> >> >> >> There are two problems here: >> >> >> >> 1) defining the set of optimizations that need to be skipped. >> >> 2) defining the set of functions that trigger the special processing. >> >> >> >> >> >> For (1) I would propose that three measures be made of each function. >> >> These measures should be made before inlining occurs. The three >> measures >> >> are the number of variables, the number of statements, and the >> number of >> >> basic blocks. >> > >> > Why before inlining? These three numbers can change quite >> significantly >> > as a function passes through the pass pipeline. So we should try >> to keep >> > them up-to-date to have an accurate measurement. >> > >> I am flexible here. We may want inlining to be able to update the >> numbers. However, I think that we should drive the inlining aggression >> based on these numbers. > > Well, for example jump threading and tail duplication can cause these > numbers to change significantly. Also CFG instrumentation and PRE > can increase the BB count. So we need to deal with cases where an > optimization produces an overly large number of basic blocks or > instructions > (for example by throttling those passes properly). > I lean towards leaving the numbers static even if they do increase as time goes by. Otherwise you get two effects: the first optimizations get to be run more, and you get weird non-linear step functions where small changes in some upstream function affect the downstream ones. Ok, I guess we can easily flag each function as having - many BBs - big BBs - complex CFG (many edges) and set these flags at CFG construction time during the lowering phase (which is after the early inlining pass I believe). The number of basic blocks is kept up-to-date during optimization; the other numbers would need to be re-generated if we want to keep them up-to-date. But with just using three (or even only one?) flag, we can easily fit this information in struct function. I also like the idea of a "hot" function flag to be able to dynamically switch between optimize_size and !optimize_size in the tree optimizers. Profile based inlining already tries to follow that route. Richard.
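A sketch of what such per-function flags might look like; the names, fields, and policy below are invented for illustration and are not existing GCC code:

    /* Hypothetical size flags, small enough to live in struct function.  */
    struct function_size_flags
    {
      unsigned many_bbs : 1;     /* CFG has more than some N basic blocks.  */
      unsigned big_bbs : 1;      /* some block holds more than M statements.  */
      unsigned complex_cfg : 1;  /* edge count is large relative to blocks.  */
      unsigned hot : 1;          /* profile marks the function as hot.  */
    };

    /* Example policy check an expensive, non-essential pass could consult.  */
    static inline int
    skip_expensive_pass_p (const struct function_size_flags *f)
    {
      return f->many_bbs || f->complex_cfg;
    }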
Re: strict aliasing question
On 11/10/06, Howard Chu <[EMAIL PROTECTED]> wrote: I see a lot of APIs (e.g. Cyrus SASL) that have accessor functions returning values through a void ** argument. As far as I can tell, this doesn't actually cause any problems, but gcc 4.1 with -Wstrict-aliasing will complain. For example, take these two separate source files:

alias1.c:

    #include <stdio.h>

    extern void getit( void **arg );

    main() {
        int *foo;
        getit( (void **)&foo );
        printf("foo: %x\n", *foo);
    }

alias2.c:

    static short x[] = {16,16};

    void getit( void **arg ) {
        *arg = x;
    }

    gcc -O3 -fstrict-aliasing -Wstrict-aliasing *.c -o alias

The program prints the expected result with both strict-aliasing and no-strict-aliasing on my x86_64 box. As such, when/why would I need to worry about this warning?

If you compile with -O3 -combine *.c -o alias it will break.

Richard.
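For what it's worth, a variant of alias1.c that avoids the int ** to void ** cast entirely (my own sketch, not from the thread) is to pass a genuine void * object and convert afterwards; note that reading the short array through an int * in alias2.c is a separate aliasing question that this does not address:

    #include <stdio.h>

    extern void getit( void **arg );

    int main(void) {
        void *tmp;
        int *foo;

        getit( &tmp );          /* a real void **, no pointer type punning */
        foo = tmp;              /* void * converts implicitly in C */
        printf("foo: %x\n", *foo);
        return 0;
    }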
Re: Polyhedron performance regression
On 11/11/06, FX Coudert <[EMAIL PROTECTED]> wrote: > Just wanted to note to the list that Tobias spotted a performance > regression on Polyhedron ac. > > http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html Hum, the performance change on ac is significant. Any way we can get the revision numbers before and after the jump (and before the last jump to zero)? What patches have been committed in that time that could affect this? I can't see anything on the fortran patches... If I had to guess I would say it was the forwprop merge. But I didn't investigate. Richard.
Re: Polyhedron performance regression
On Sat, 11 Nov 2006, FX Coudert wrote: > >Just wanted to note to the list that Tobias spotted a performance regression > >on Polyhedron ac. > > > >http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html > > Hum, the performance change on ac is significant. Any way we can get the > revision numbers before and after the jump (and before the last jump to > zero)? What patches have been committed in that time that could affect this? I > can't see anything on the fortran patches... It must have been between r118372 and r118615. Richard.
Re: -funsafe-math-optimizations and -fno-rounding-math
On 11/11/06, Revital1 Eres <[EMAIL PROTECTED]> wrote: Hello, -fno-rounding-math enables the transformation of (-(X - Y)) -> (Y - X) in simplify-rtx.c, which seems to be the same transformation that is enabled by -funsafe-math-optimizations in fold-const.c. If I understand correctly, -frounding-math means that the rounding mode is important. In that case, should there be a correlation between -funsafe-math-optimizations and -fno-rounding-math (which currently does not exist)? I think the simplify-rtx.c code is partly wrong, as it changes behavior with signed zeros. I don't know off-hand if -(X - Y) and Y - X behave the same in rounding if the rounding mode is round to nearest, but certainly for round to +Inf it will differ. So HONOR_SIGNED_ZEROS (mode) && !flag_rounding_math might be the correct predicate here (and in the fold-const.c case). But floating point rounding scares me ;) Richard.
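A minimal illustration of the signed-zero point (my own example, not from the thread): for X == Y the two forms produce zeros of opposite sign, and under a directed rounding mode an inexact subtraction can additionally round differently between the two forms.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        volatile double x = 1.0, y = 1.0;  /* volatile keeps the operations live */
        double a = -(x - y);               /* -(+0.0) == -0.0 */
        double b = y - x;                  /* +0.0 */
        printf("-(x-y) = %g (signbit %d), y-x = %g (signbit %d)\n",
               a, (int) signbit(a), b, (int) signbit(b));
        return 0;
    }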
Re: Polyhedron performance regression
On Sat, 11 Nov 2006, Paul Thomas wrote: > Steven and Jerry, > > > > > > If someone can confirm that this patch causes the drop, I can help > > trying to find a fix. > > > amd64/Cygwin_NT > > $ /irun/bin/gfortran -O3 -funroll-loops -ffast-math -march=opteron ac.f90 > > 118372 20.2s > 118475 20.4s Bonzini's patch > 118704 16.2s > > I believe that the improvement is FX's and my patch for MOD. Note that the suse x86_64 tester has 18.6s for the last run before Bonzini's patch and 30.8s for the first run after it, so it regressed quite badly. It also has -ftree-vectorize as an additional option here. > > Notice that this is a single core machine and that there is a PR out on the vectorizer. Could this be the problem, since the suse tests are done on a two core machine, if I understood correctly? Yes, this is a dual-socket machine. But I don't see how this could be an issue here? Richard. -- Richard Guenther <[EMAIL PROTECTED]> Novell / SUSE Labs
Re: Volatile operations and PRE
On 08 Nov 2006 08:07:50 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: Andrew Haley <[EMAIL PROTECTED]> writes: > > 2006-11-07 Paolo Bonzini <[EMAIL PROTECTED]> > > > > * gimplify.c (fold_indirect_ref_rhs): Use > > STRIP_USELESS_TYPE_CONVERSION rather than STRIP_NOPS. > > Regtested x86-64-gnu-linux. The only interesting failure was > mayalias-2.c, but that also fails before the patch. This is OK for active branches. I have committed this to mainline and will continue with the branches after testing. Richard.
Re: Zdenek Dvorak and Daniel Berlin appointed loop optimizer maintainers
On 11/16/06, Andrew Haley <[EMAIL PROTECTED]> wrote: Zdenek Dvorak writes: > Hello, > > >I am pleased to announce that the GCC Steering Committee has > > appointed Zdenek Dvorak and Daniel Berlin as non-algorithmic maintainers > > of the RTL and Tree loop optimizer infrastructure in GCC. > > > >Please join me in congratulating Zdenek and Daniel on their new > > role. Zdenek and Daniel, please update your listings in the MAINTAINERS > > file. > > done. I am not sure whether it would be useful to indicate > non-algorithmicness somehow in the MAINTAINERS file? "non-algorithmicity", I suspect. :-) I would rather open a new section if this idiom is supposed to spread more ;-) Richard.
Re: Volatile operations and PRE
On 11/15/06, Richard Guenther <[EMAIL PROTECTED]> wrote: On 08 Nov 2006 08:07:50 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: > Andrew Haley <[EMAIL PROTECTED]> writes: > > > > 2006-11-07 Paolo Bonzini <[EMAIL PROTECTED]> > > > > > > * gimplify.c (fold_indirect_ref_rhs): Use > > > STRIP_USELESS_TYPE_CONVERSION rather than STRIP_NOPS. > > > > Regtested x86-64-gnu-linux. The only interesting failure was > > mayalias-2.c, but that also fails before the patch. > > This is OK for active branches. I have committed this to mainline and will continue with the branches after testing. Done. Richard.
Re: SPEC CFP2000 and polyhedron runtime scores dropped from 13. november onwards
On 12/1/06, Uros Bizjak <[EMAIL PROTECTED]> wrote: Hello! At least on x86_64 and i686 the SPEC score [1] and polyhedron [2] scores dropped noticeably. For the SPEC benchmarks, the mgrid, galgel, ammp and sixtrack tests are affected, and for polyhedron, ac (second regression in the peak) and protein (?) regressed in that time frame. [1] http://www.suse.de/~aj/SPEC/amd64/CFP/summary-britten/recent.html [2] http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html Does anybody have any idea what is going on there? It correlates with the PPRE introduction (enabled at -O3 only), which might increase register pressure, but also improves Polyhedron rnflow a lot. Richard.
Re: [PATCH]: Require MPFR 2.2.1
On 12/3/06, Kaveh R. GHAZI <[EMAIL PROTECTED]> wrote: This patch updates configure to require MPFR 2.2.1 as promised here: http://gcc.gnu.org/ml/gcc/2006-12/msg00054.html Tested on sparc-sun-solaris2.10 using mpfr-2.2.1, mpfr-2.2.0 and an older mpfr included with gmp-4.1.4. Only 2.2.1 passed (as expected). I'd like to give everyone enough time to update their personal installations and regression testers before installing this. Does one week sound okay? If there are no objections, that's what I'd like to do. Please don't. It'll be a hassle for us again and will cause automatic testers to again miss some days or weeks during stage1 (given that the Christmas holiday season is near). Rather, please defer it to the start of stage3. Thanks, Richard.
Re: Gfortran and using C99 cbrt for X ** (1./3.)
On 12/3/06, Toon Moene <[EMAIL PROTECTED]> wrote: Richard, Somewhere, in a mail lost in my large e-mail clash with my ISP (verizon), you said that gfortran couldn't profit from the pow(x, 1./3.) -> cbrt(x) conversion because gfortran didn't "know" of cbrt. Could you be more explicit about this - I'd like to repair this deficiency, if at all possible. Thanks in advance ! It's a matter of making the cbrt builtin available - I have a patch for this, but wondered if the fortran frontend can rely on the cbrt library call being available? Or available in a fast variant, not a fallback implementation in libgfortran which does pow (x, 1./3.) which will then of course pessimize pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp expansion. Richard.
Re: Gfortran and using C99 cbrt for X ** (1./3.)
On 12/4/06, Howard Hinnant <[EMAIL PROTECTED]> wrote: On Dec 4, 2006, at 11:27 AM, Richard Guenther wrote: > On 12/3/06, Toon Moene <[EMAIL PROTECTED]> wrote: >> Richard, >> >> Somewhere, in a mail lost in my large e-mail clash with my ISP >> (verizon), you said that gfortran couldn't profit from the pow(x, >> 1./3.) >> -> cbrt(x) conversion because gfortran didn't "know" of cbrt. >> >> Could you be more explicit about this - I'd like to repair this >> deficiency, if at all possible. >> >> Thanks in advance ! > > It's a matter of making the cbrt builtin available - I have a patch > for this, > but wondered if the fortran frontend can rely on the cbrt library > call being > available? Or available in a fast variant, not a fallback > implementation in > libgfortran which does pow (x, 1./3.) which will then of course > pessimize > pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp expansion. Is pow(x, 1./3.) == cbrt(x) ? My handheld calculator says (imagining a 3 decimal digit machine): pow(64.0, .333) == 3.99 In other words, can pow assume that if it sees .333, that the client actually meant the non-representable 1/3? Or must pow assume that . 333 means .333? It certainly will _not_ recognize 0.333 as 1/3 (or 0.33 as used in Polyhedron aermod). Instead it will require the exponent to be exactly equal to the result of 1./3. in the precision of the exponent, with correct rounding. My inclination is that if pow(x, 1./3.) (computed correctly to the last bit) ever differs from cbrt(x) (computed correctly to the last bit) then this substitution should not be done. C99 7.12.7.1 says "The cbrt functions compute the real cube root of x." and "The cbrt functions return x**1/3". So it looks to me that cbrt is required to return the same as pow(x, 1/3). Richard.
Re: Richard Guenther appointed middle-end maintainer
On 12/4/06, David Edelsohn <[EMAIL PROTECTED]> wrote: I am pleased to announce that the GCC Steering Committee has appointed Richard Guenther as non-algorithmic middle-end maintainer. Please join me in congratulating Richi on his new role. Richi, please update your listings in the MAINTAINERS file. Thanks! Any objections to adding a new section in the MAINTAINERS file like below? Richard.

2006-12-04  Richard Guenther  <[EMAIL PROTECTED]>

        * MAINTAINERS (Non-Algorithmic Maintainers): New section.
        (Non-Algorithmic Maintainers): Move over non-algorithmic loop
        optimizer maintainers, add myself as a non-algorithmic middle-end
        maintainer.
Re: Richard Guenther appointed middle-end maintainer
On 12/4/06, Richard Guenther <[EMAIL PROTECTED]> wrote: On 12/4/06, David Edelsohn <[EMAIL PROTECTED]> wrote: > I am pleased to announce that the GCC Steering Committee has > appointed Richard Guenther as non-algorithmic middle-end maintainer. > > Please join me in congratulating Richi on his new role. Richi, > please update your listings in the MAINTAINERS file. Thanks! Any objections to add a new section in the MAINTAINERS file like below? Committed after "ok" on IRC. Richard. 2006-12-04 Richard Guenther <[EMAIL PROTECTED]> * MAINTAINERS (Non-Algorithmic Maintainers): New section. (Non-Algorithmic Maintainers): Move over non-algorithmic loop optimizer maintainers, add myself as a non-algorithmic middle-end maintainer.
Re: Gfortran and using C99 cbrt for X ** (1./3.)
On 12/4/06, Howard Hinnant <[EMAIL PROTECTED]> wrote: On Dec 4, 2006, at 4:57 PM, Richard Guenther wrote: > >> My inclination is that if pow(x, 1./3.) (computed >> correctly to the last bit) ever differs from cbrt(x) (computed >> correctly to the last bit) then this substitution should not be done. > > C99 7.12.7.1 says "The cbrt functions compute the real cube root > of x." and "The cbrt functions return x**1/3". So it looks to me > that cbrt is required to return the same as pow(x, 1/3). For me, this:

    #include <stdio.h>
    #include <math.h>

    int main() {
        printf("pow(100., 1./3.) = %a\ncbrt(100.) = %a\n",
               pow(100., 1./3.), cbrt(100.));
    }

prints out:

    pow(100., 1./3.) = 0x1.8fffep+6
    cbrt(100.) = 0x1.9p+6

I suspect that both are correct, rounded to the nearest least significant bit. Admittedly I haven't checked the computation by hand for pow(100., 1./3.). But I did the computation using gcc 4.0 on both PPC and Intel Mac hardware, and on PPC using CodeWarrior math libs, and got the same results on all three platforms. The pow function is not raising 1,000,000 to the power of 1/3. It is raising 1,000,000 to some power which is very close to, but not equal to, 1/3. Perhaps I've misunderstood and you have some way of exactly representing the fraction 1/3 in Gfortran. In C and C++ we have no way to exactly represent that fraction except implicitly using cbrt (or with a user-defined rational type).

1./3. is represented (round-to-nearest) as 0x1.5p-2, and pow (x, 1./3.) is of course not the same as the cube root of x with exact arithmetic. cbrt as defined by C99 suggests that an approximation by pow (x, 1./3.) fulfils the requirements. The question is whether a correctly rounded "exact" cbrt differs from the pow replacement by more than 1 ulp - it looks like this is not the case. For cbrt (100.) I get the same result as from pow (100., nextafter (1./3., 1)), for example. So, instead of only recognizing 1./3. rounded to nearest, we should recognize both representable numbers that are nearest to 1./3.

Richard.
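The "both representable numbers" in question are the double just below and the double just above the exact fraction 1/3; a small illustration (my own, using the standard nextafter) prints them in hex so the one-ulp difference is visible:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double lo = 1.0 / 3.0;           /* 1/3 rounded to nearest (just below 1/3) */
        double hi = nextafter(lo, 1.0);  /* the next representable double above it */
        printf("below 1/3: %a\nabove 1/3: %a\n", lo, hi);
        return 0;
    }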
Re: Gfortran and using C99 cbrt for X ** (1./3.)
On 12/5/06, Joseph S. Myers <[EMAIL PROTECTED]> wrote: On Tue, 5 Dec 2006, Richard Guenther wrote: > is whether a correctly rounded "exact" cbrt differs from the pow > replacement by more than 1 ulp - it looks like this is not the case. They fairly obviously differ for negative arguments, which are valid for cbrt but not for pow (raising to a fraction with even denominator). (The optimization from pow to cbrt is valid if you don't care about no longer getting a NaN from a negative argument. Converting the other way (cbrt to pow) is only OK if you don't care about negative arguments to cbrt at all.) True, F.9.4.4 says "pow (x, y) returns a NaN and raises the invalid floating-point exception for finite x < 0 and finite non-integer y." I'll adjust the expander to cover these cases by conditionalizing on tree_nonnegative_p or HONOR_NANS. I will probably also require flag_unsafe_math_optimizations, as otherwise we will optimize cbrt (x) - pow (x, 1./3.) to zero. Or even pow (x, 1./3.) - pow (x, nextafter (1./3, 1)). Richard.
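A quick illustration of the negative-argument point (my own example, not from the thread): cbrt is defined for x < 0, while pow with a non-integer exponent is not and returns NaN there.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        printf("cbrt(-8.0)       = %g\n", cbrt(-8.0));          /* -2 */
        printf("pow(-8.0, 1./3.) = %g\n", pow(-8.0, 1.0/3.0));  /* nan */
        return 0;
    }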
Re: Announce: MPFR 2.2.1 is released
On 05 Dec 2006 07:16:04 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: Paul Brook <[EMAIL PROTECTED]> writes: > > This all may just be a shakedown problem with MPFR, and maybe it will > > stabilize shortly. But it's disturbing that after one undistributed > > version became a requirement, we then very shortly stepped up to a new > > undistributed version. I think it should be obvious that if we > > require an external library which is not in the tree, we should not be > > at the bleeding edge of that library. > > I thought we were going from an unreleased version to the subsequent release. As far as I know both versions are released. What I said was "undistributed," by which I mean: the required version of MPFR is not on my relatively up to date Fedora system. It also missed the openSUSE 10.2 schedule (which has the old version with all patches). So I don't like rejecting the old version at any point. Richard.
Re: Gfortran and using C99 cbrt for X ** (1./3.)
On 12/5/06, Toon Moene <[EMAIL PROTECTED]> wrote: Richard Guenther wrote: > It's a matter of making the cbrt builtin available - I have a patch for > this, but wondered if the fortran frontend can rely on the cbrt library call > being available? Or available in a fast variant, not a fallback > implementation in libgfortran which does pow (x, 1./3.) which will then of course pessimize > pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp expansion. Couldn't libgfortran just simply borrow, errr, include the glibc version ? That one seems simple and fast enough. Does libgfortran have a cbrt now? The frontend could make the builtin available conditional on TARGET_C99_FUNCTIONS, this way we won't need a fallback. OTOH, I somehow assumed this expansion was protected by -funsafe-math-optimizations, but further testing showed me that it is not. Yes, that was a mistake. I posted a patch for this: http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00320.html This wouldn't scare me one bit (I'm used to the phrase "a processor dependent approximation to the value of", which is used *a lot* in the Fortran Standard), but some people might think this to be too valiant. Yes, I guess so ;) Richard.
Re: Gfortran and using C99 cbrt for X ** (1./3.)
On 12/5/06, Toon Moene <[EMAIL PROTECTED]> wrote: Toon Moene wrote: > Toon Moene wrote: > >> Richard Guenther wrote: >> >>> It's a matter of making the cbrt builtin available - I have a patch >>> for this, > > Oh, BTW, my own version of this patch (plus your work in the area of > sqrt) had the following effect on a profile breakdown The speed up is around 5 %. Is this because of cbrt or a combined effect? Can you measure the cbrt effect in isolation? Thanks, Richard.
Re: compile time testsuite [Re: alias slowdown?]
On 12/3/06, Gerald Pfeifer <[EMAIL PROTECTED]> wrote: Hi Richie, On Tue, 21 Nov 2006, Richard Guenther wrote: > Public monitoring would be more useful. If you have working single-file > testcases that you want be monitored for compile-time and memory-usage > just contact me and I can add them to the daily tester > (http://www.suse.de/~gcctest/c++bench/). that's a neat one. Would you mind adding this to the list of performance testers at http://gcc.gnu.org/benchmarks/ ? That would be nice. Done with the attached patch. Richard.
Re: Serious SPEC CPU 2006 FP performance regressions on IA32
On 12/13/06, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote: Meissner, Michael wrote: >>> 437.leslie3d -26% > it was felt that the PPRE patches that were added on November 13th were > the cause of the slowdown: > http://gcc.gnu.org/ml/gcc/2006-12/msg00023.html > > Has anybody tried doing a run with just ppre disabled? > Right. PPRE appears to be the reason for the slowdown. -fno-tree-pre gets the performance of cpu2006/437.leslie3d back to normal. This is the worst case. And it will take much longer to verify the whole set of cpu2006 benchmarks. It would be sooo nice to have a (small) testcase that shows why we are regressing. Thanks! Richard.
Re: Serious SPEC CPU 2006 FP performance regressions on IA32
On 12/13/06, Menezes, Evandro <[EMAIL PROTECTED]> wrote: > Meissner, Michael wrote: > >>> 437.leslie3d -26% > > it was felt that the PPRE patches that were added on > November 13th were > > the cause of the slowdown: > > http://gcc.gnu.org/ml/gcc/2006-12/msg00023.html > > > > Has anybody tried doing a run with just ppre disabled? > > Right. PPRE appears to be the reason of slowdown. > > -fno-tree-pre gets performance of cpu2006/437.leslie3d back to normal. > This is the worst case. And that will take much longer to > verify whole > set of cpu2006 benchmarks. If -fno-tree-pre disables PPRE, then it doesn't change much (4.3 relative to 4.2 with -O2):

    CPU2006          -O2     -O2 -fno-tree-pre
    410.bwaves       -6%     -8%
    416.gamess
    433.milc
    434.zeusmp
    435.gromacs
    436.cactusADM
    437.leslie3d     -26%    -27%
    444.namd
    447.dealII
    450.soplex
    453.povray
    454.calculix
    459.GemsFDTD     -12%    -12%
    465.tonto
    470.lbm
    481.wrf
    482.sphinx3

Is PPRE enabled at -O2 at all? I couldn't confirm that from the original patches, which enabled PPRE only at -O3.

PPRE is only enabled at -O3.

Richard.
Re: libjvm.la and problems with libtool relink mode
On 12/14/06, Mark Shinwell <[EMAIL PROTECTED]> wrote: I am currently involved in building GCC toolchains by configuring them with the prefix that is to be used on the target system (somewhere in /opt) and then installing them in a separate "working installation" directory (say somewhere in my scratch space). The installation step into this "working installation" directory is performed by invoking a command of the form "make prefix=/path/to/working/dir/install install". I think you should use DESTDIR instead of prefix here. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/29/06, Robert Dewar <[EMAIL PROTECTED]> wrote: Daniel Berlin wrote: > I'm sure no matter what argument i come up with, you'll just explain it away. > The reality is the majority of our users seem to care more about > whether they have to write "typename" in front of certain declarations > than they do about signed integer overflow. I have no idea how you know this, to me ten reports seems a lot for something like this. Not compared to the number of type-based aliasing "bugs" reported. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/29/06, Paul Eggert <[EMAIL PROTECTED]> wrote: Ian Lance Taylor <[EMAIL PROTECTED]> writes: > Does anybody think that Paul's proposed patch to autoconf would be > better than changing VRP? I don't. I haven't noticed anyone else addressing this question, which I think is a good one. I don't think doing either is a good idea. Authors of the affected programs should adjust their makefiles instead - after all, the much more often reported problems are with -fstrict-aliasing, and that one also doesn't get any special treatment by autoconf, even though -fno-strict-aliasing -fwrapv would be a valid, more forgiving default. Also, as ever, -O2 is what gets the most testing, so you are more likely to run into compiler bugs with -fwrapv. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/29/06, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote: On Fri, Dec 29, 2006 at 10:44:02PM +0100, Florian Weimer wrote: > (BTW, I would be somewhat disappointed if this had to be papered over > on the autoconf side. If the GNU project needs -fwrapv for its own > software by default, this should be reflected in the compiler's > defaults. I absolutely agree. My impression is that the current situation is a disagreement between (some of, at least) the GCC developers, and someone who can commit to autoconf; but I think it would be a very bad choice for the GNU project to work around itself. If we can't come to an agreement on the list, please ask the Steering Committee. This is a textbook example of what they're for. But first, please produce some data. From the transition from 4.0 to 4.1, compiling all of SUSE Linux 10.1, I only remember one case where the program was at fault and not gcc (barring those we didn't recognize, of course). Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/30/06, Paul Eggert <[EMAIL PROTECTED]> wrote: For example, GCC itself assumes wrapv semantics internally, but according to the -fwrapv opponents GCC is obviously "at fault" here and should be fixed, so that shouldn't count, right? (If that's the way the data will be collected, I think I know how things will turn out. :-) Where does GCC assume wrapv semantics? GCC assumes two's complement arithmetic in the backend - but that's different as it's not ISO C it is generating but target machine assembly code. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 30 Dec 2006 23:55:46 +0100, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: [EMAIL PROTECTED] (Richard Kenner) writes: | > Here's an example from the intprops module of gnulib. | | These are interesting cases. | | Note that all the computations are constant-folded. | | And I think this points to the fact that we can "have our cake and eat it too" | in many cases. Everything we're seeing points to the fact that the cases | where assuming undefined behavior for overflow will help optimizations are | cases where few if any sane programmers would write the code depending on | wrap semantics (e.g., loop variables). And nearly all the cases where wrap | semantics are expected (e.g., the above) are such that there's no reasonable | optimization benefit in assuming they're undefined. | | Take constant folding: if we were pedantic about it, we could say, "this | folded expression overflows and so is undefined, so let's set its value to be | whatever constant would yield the most efficient code in that case". | | Such behavior would be standard-compliant, but as unfriendly as possible | because it wouldn't "optimize" any real code, just break common idioms. | I doubt anybody would suggest not implementing wrapping semantics in | constant folding. As I'm looking into the VRP and CHREC codes to implement the -Wundefined warning, I came across this:

    /* Wrapper around int_const_binop.  If the operation overflows and we
       are not using wrapping arithmetic, then adjust the result to be
       -INF or +INF depending on CODE, VAL1 and VAL2.  */

    static inline tree
    vrp_int_const_binop (enum tree_code code, tree val1, tree val2)
    {
      /* ... */
      else if (TREE_OVERFLOW (res)
               && !TREE_OVERFLOW (val1)
               && !TREE_OVERFLOW (val2))
        {
          /* If the operation overflowed but neither VAL1 nor VAL2 are
             overflown, return -INF or +INF depending on the operation
             and the combination of signs of the operands.  */
          int sgn1 = tree_int_cst_sgn (val1);
          int sgn2 = tree_int_cst_sgn (val2);

          /* Notice that we only need to handle the restricted set of
             operations handled by extract_range_from_binary_expr.
             Among them, only multiplication, addition and subtraction
             can yield overflow without overflown operands because we
             are working with integral types only... except in the case
             VAL1 = -INF and VAL2 = -1 which overflows to +INF for
             division too.  */

          /* For multiplication, the sign of the overflow is given
             by the comparison of the signs of the operands.  */
          if ((code == MULT_EXPR && sgn1 == sgn2)
              /* For addition, the operands must be of the same sign
                 to yield an overflow.  Its sign is therefore that
                 of one of the operands, for example the first.  */
              || (code == PLUS_EXPR && sgn1 > 0)
              /* For subtraction, the operands must be of different
                 signs to yield an overflow.  Its sign is therefore
                 that of the first operand or the opposite of that
                 of the second operand.  A first operand of 0 counts
                 as positive here, for the corner case 0 - (-INF),
                 which overflows, but must yield +INF.  */
              || (code == MINUS_EXPR && sgn1 >= 0)
              /* For division, the only case is -INF / -1 = +INF.  */
              || code == TRUNC_DIV_EXPR
              || code == FLOOR_DIV_EXPR
              || code == CEIL_DIV_EXPR
              || code == EXACT_DIV_EXPR
              || code == ROUND_DIV_EXPR)
            return TYPE_MAX_VALUE (TREE_TYPE (res));
          else
            return TYPE_MIN_VALUE (TREE_TYPE (res));
        }
      /* ... */
    }

What would you suggest this function do, based on your comments?

The function should be looked at in the context of its few callers - this is really one of the more ugly and tricky parts of VRP.

Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 31 Dec 2006 00:10:23 +0100, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: "Richard Guenther" <[EMAIL PROTECTED]> writes: | On 30 Dec 2006 23:55:46 +0100, Gabriel Dos Reis | <[EMAIL PROTECTED]> wrote: | >/* Wrapper around int_const_binop. If the operation overflows and we | > are not using wrapping arithmetic, then adjust the result to be | > -INF or +INF depending on CODE, VAL1 and VAL2. */ | > | >static inline tree | >vrp_int_const_binop (enum tree_code code, tree val1, tree val2) [...] | > What would you suggest this function to do, based on your comments? | | The function should be looked at in the context of the few callers - this | is really one of the more ugly and tricky parts of VRP. I've done that; I do not see an obvious way to make everybody happy -- except issueing a warning (which I've done). That is why I was asking since you raised that particular point. Maybe VRP experts may have opinions... The heavy (and sole) user of vrp_int_const_binop() is extract_range_from_binary_expr(). Yes. I don't see a way to issue a warning there without 99% false positives there. The only thing we can really do is avoid false positives reliably if we have a + b and known ranges for a and b so we can see it will _not_ overflow. But issuing a warning only if we are sure it _will_ overflow will likely cause in no warning at all - the interesting cases would be those that will have many false positives. Note the interesting places in VRP where it assumes undefined signed overflow is in compare_values -- we use the undefinedness to fold comparisons. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 31 Dec 2006 00:40:39 +0100, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: "Richard Guenther" <[EMAIL PROTECTED]> writes: | On 31 Dec 2006 00:10:23 +0100, Gabriel Dos Reis | <[EMAIL PROTECTED]> wrote: | > "Richard Guenther" <[EMAIL PROTECTED]> writes: | > | > | On 30 Dec 2006 23:55:46 +0100, Gabriel Dos Reis | > | <[EMAIL PROTECTED]> wrote: | > | >/* Wrapper around int_const_binop. If the operation overflows and we | > | > are not using wrapping arithmetic, then adjust the result to be | > | > -INF or +INF depending on CODE, VAL1 and VAL2. */ | > | > | > | >static inline tree | > | >vrp_int_const_binop (enum tree_code code, tree val1, tree val2) | > | > [...] | > | > | > What would you suggest this function to do, based on your comments? | > | | > | The function should be looked at in the context of the few callers - this | > | is really one of the more ugly and tricky parts of VRP. | > | > I've done that; I do not see an obvious way to make everybody happy -- | > except issueing a warning (which I've done). That is why I was asking | > since you raised that particular point. Maybe VRP experts may have | > opinions... | > | > The heavy (and sole) user of vrp_int_const_binop() is | > extract_range_from_binary_expr(). | | Yes. I don't see a way to issue a warning there without 99% false | positives there. The only thing we can really do is avoid false | positives reliably if we have a + b and known ranges for a and b | so we can see it will _not_ overflow. But issuing a warning only | if we are sure it _will_ overflow will likely cause in no warning at | all - the interesting cases would be those that will have many | false positives. for this specific function (vrp_int_const_binop), I'm issuing a warning inside the else-if branch that tests for the overflowed result. I'm unclear why that is a false positive since the result is known to overflow. Could you elaborate? Well, we use that function to do arithmetic on value ranges like for example the ranges involving the expression a + b [50, INT_MAX] + [50, 100] now you will get a warning as we use vrp_int_const_binop to add INT_MAX and 100 (to yield INT_MAX in the signed case). Of course adding a + b will not always overflow here (it might never as the INT_MAX bound might be just due to VRP deficiencies), for example 50 + 50 will not overflow. So using vrp_int_const_binop to generate the warning will yield very many false positives (also due to the fact that if we only know the lower or upper bound we have lots of INT_MAX and INT_MIN in value ranges). | Note the interesting places in VRP where it assumes undefined | signed overflow is in compare_values -- we use the undefinedness | to fold comparisons. I considered compare_values(), but I concluded that issueing a warning from there will yield too many false positive, and probably many duplicates. Is that assumption correct? That is correct - it's basically the same problem as if we were warning from inside fold. I have been looking into infer_loop_bounds_from_signedness() called from infer_loop_bounds_from_undefined(). At some places, nowrap_type_p() is used but this function operates only on types, so there will be too many false positive there; yet we will miss warning through that case. I don't know that area too well, but I think we are already issuing a warning if we use -funsafe-loop-optimizations, so it might be possible to do the same if we use signed wrapping undefinedness. Zdenek should know more. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/31/06, Richard Kenner <[EMAIL PROTECTED]> wrote: > What would you suggest this function to do, based on your comments? I'm not familiar enough with VRP to answer at that level, but at a higher level, what I'd advocate is that the *generator* of information would track things both ways, assuming wrapping and non-wrapping semantics (of course, if -fwrapv or -fno-wrapv was specified, it would only do that one). Then the *user* of the information would decide which one to use by applying heuristics based both on the likelihood that the programmer would be relying on wrapping and the importance from an optimization standpoint of not doing so. For the VRP case I'd like to rework vrp_int_const_binop to behave like int_const_binop (that wraps) and return if the operation overflowed. It's much more readable to have the handling (or not handling) of overflow at the callers site extract_range_from_binary_expression. Using TREE_OVERFLOW as present is just wasting memory as an extra return value will also do. So in the end it's probably time to re-work the core implementation of int_const_binop to be more like tree int_const_binop (enum tree_code code, tree type, tree arg1, tree arg2, int notrunc, bool *overflow); as it has all the magic to detect overflow already. So, for example, when making decisions involving loop optimizations, it would assume that bivs and givs don't overflow. But it if saw a comparison against a result that might overflow, it would assume the more conservative choice on the grounds that it's more likely that the test was intended to do something than to always be true or false. Here is where you also have a chance of issuing a warning. In the former of these cases, however, I don't think it would make sense to issue one since it'd almost always be a false positive and in the latter there is nothing to warn about since the conservative choice is being made. But in some "middle of the road" cases, assuming undefined overflow and warning might be the most appropriate: indeed, the ability to issue the warning could be a justification towards making the less conservative choice.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/31/06, Duncan Sands <[EMAIL PROTECTED]> wrote: > > for this specific function (vrp_int_const_binop), I'm issuing a > > warning inside the else-if branch that tests for the overflowed > > result. I'm unclear why that is a false positive since the result is > > known to overflow. Could you elaborate? > > Well, we use that function to do arithmetic on value ranges like > for example the ranges involving the expression a + b > > [50, INT_MAX] + [50, 100] > > now you will get a warning as we use vrp_int_const_binop to add > INT_MAX and 100 (to yield INT_MAX in the signed case). Of > course adding a + b will not always overflow here (it might never > as the INT_MAX bound might be just due to VRP deficiencies), > for example 50 + 50 will not overflow. > > So using vrp_int_const_binop to generate the warning will yield > very many false positives (also due to the fact that if we only know > the lower or upper bound we have lots of INT_MAX and INT_MIN > in value ranges). You could emit a warning if the entire range overflows (i.e. both lower and upper bound calculations overflow), since that means that the calculation of a+b necessarily overflows. Yes, we can do that, but this won't detect the cases people care about. In fact I doubt it will trigger on real code at all - you can just make artificial testcases that exercise this warning, like if (a > INT_MAX/2 && b > INT_MAX/2) return a + b; I doubt this will be very useful. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 31 Dec 2006 12:42:57 +0100, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: "Richard Guenther" <[EMAIL PROTECTED]> writes: | On 12/31/06, Richard Kenner <[EMAIL PROTECTED]> wrote: | > > What would you suggest this function to do, based on your comments? | > | > I'm not familiar enough with VRP to answer at that level, but at a higher | > level, what I'd advocate is that the *generator* of information would track | > things both ways, assuming wrapping and non-wrapping semantics (of course, if | > -fwrapv or -fno-wrapv was specified, it would only do that one). Then the | > *user* of the information would decide which one to use by applying | > heuristics based both on the likelihood that the programmer would be relying | > on wrapping and the importance from an optimization standpoint of not doing so. | | For the VRP case I'd like to rework vrp_int_const_binop to behave like | int_const_binop (that wraps) and return if the operation overflowed. It's | much more readable to have the handling (or not handling) of overflow | at the callers site extract_range_from_binary_expression. Using | TREE_OVERFLOW as present is just wasting memory as an extra return | value will also do. | | So in the end it's probably time to re-work the core implementation of | int_const_binop to be more like | | tree int_const_binop (enum tree_code code, tree type, tree arg1, tree arg2, | int notrunc, bool *overflow); | | as it has all the magic to detect overflow already. Can I interpret that as you volunteering to do it, or at least help? Yes, I have some patches in the queue to clean this up (and add some more stuff to VRP). Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/31/06, Daniel Berlin <[EMAIL PROTECTED]> wrote: On 12/31/06, Paul Eggert <[EMAIL PROTECTED]> wrote: > "Steven Bosscher" <[EMAIL PROTECTED]> writes: > > > On 12/31/06, Paul Eggert <[EMAIL PROTECTED]> wrote: > >> Also, as I understand it this change shouldn't affect gcc's > >> SPEC benchmark scores, since they're typically done with -O3 > >> or better. > > > > It's not all about benchmark scores. > > But so far, benchmark scores are the only scores given by the people > who oppose having -O2 imply -fwrapv. If the benchmarks use -O3 they > wouldn't be affected by such a change -- and if so, we have zero hard > evidence of any real harm being caused by having -O2 imply -fwrapv. > > > I think most users compile at -O2 > > Yes, which is why there's so much argument about what -O2 should do > > > You say you doubt it affects performance. Based on what? Facts > > please, not guesses and hand-waiving... > > The burden of proof ought to be on the guys proposing -O2 > optimizations that break longstanding code, not on the skeptics. The burden ought to be (and IMHO is) on those who propose we change optimizer behavior in order to support something non-standard. Why do you believe otherwise? > > That being said, I just compiled GNU coreutils CVS on a Debian stable > x86 (2.4 GHz Pentium 4) using GCC 4.1.1. With -O0, "sha512sum" on the > coreutils tar.gz file took 0.94 user CPU seconds (measured by "time > src/sha512sum coreutils-6.7-dirty.tar.gz"). With -O2 -fwrapv, 0.87 > seconds. With plain -O2, 0.86 seconds. > > I also tried gzip 1.3.10, compressing its own tar file with a -9 > compression option. With -O0, 0.30 user CPU seconds. With -O2 > -fwrapv, 0.24 seconds. With -O2, 0.24 seconds. > > In all these cases I've averaged several results. The difference > between -O2 and -O2 -fwrapv is pretty much in the noise here. > > Admittedly it's only two small tests, and it's with 4.1.1. But that's > two more tests than the -fwrapv naysayers have done, on > bread-and-butter applications like coreutils or gzip or Emacs (or GCC > itself, for that matter). These are not performance needing applications. I'll happily grant you that adding -fwrapv will make no difference at all on any application that does not demand performance in integer or floating point calculations. I added -fwrapv to the Dec30 run of SPEC at http://www.suse.de/~gcctest/SPEC/CFP/sb-vangelis-head-64/recent.html and http://www.suse.de/~gcctest/SPEC/CINT/sb-vangelis-head-64/recent.html Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/31/06, Robert Dewar <[EMAIL PROTECTED]> wrote: Paul Eggert wrote: > The question is not whether GCC should support wrapv > semantics; it already does, if you specify -fwrapv. > The question is merely whether wrapv should be the default > with optimization levels -O0 through -O2. That oversimplifies, because it presents things as though there are only two possibilities: 1. Allow "undefined" optimizations without restriction. 2. Forbid all such optimizations by specifying -fwrapv. Quite a few of us (including certainly me, and I think Richard) argue for a middle ground where we don't insist on full fwrapv semantics (because we think it will hurt code quality, particularly in the loop invariant case), and we are judicious in applying the optimization in other cases in a pragmatic attempt to keep "traditional" C code working in practice. I think this is a fragile and not very practical approach. How do you define these "traditional" cases? I guess you would keep the other two as well, so have the middle-ground default, -fno-wrapv do what we have now, and -fwrapv disable all the optimizations. I'd rather enable -fwrapv at -O1 and keep -O2 as is. This is what we also do for strict-aliasing: it's enabled at -O2 and beyond only (for C, that is). Of course with -O1 the VRP pass is not run, so it might be that the reported problems do not show up with -O1 - and as they are reportedly not in performance-critical code, they should maybe just use -O1 as the autoconf default. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/31/06, Richard Kenner <[EMAIL PROTECTED]> wrote: > I think this is a fragile and not very practical approach. How do > you define these "traditional" cases? You don't need to define the "cases" in advance. Rather, you look at each place where you'd be making an optimization based on the non-existance of overflow and use knowlege of the importance of that optimization and the types of code likely to trigger it to choose the default for whether to make it or not. It seems quite practical to me. It also doesn't seem fragile: if you guess wrong on one particular default, it's easy to change it. Are you volunteering to audit the present cases and argue whether they fall in the "traditional" cases? Note that -fwrapv also _enables_ some transformations on signed integers that are disabled otherwise. We for example constant fold -CST for -fwrapv while we do not if signed overflow is undefined. Would you change those? Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/31/06, Richard Kenner <[EMAIL PROTECTED]> wrote: > Are you volunteering to audit the present cases and argue whether they > fall in the "traditional" cases? I'm certainly willing to *help*, but I'm sure there will be some cases that will require discussion to get a consensus. > Note that -fwrapv also _enables_ some transformations on signed > integers that are disabled otherwise. We for example constant fold > -CST for -fwrapv while we do not if signed overflow is undefined. > Would you change those? I don't understand the rationale for not wrapping constant folding when signed overflow is undefined: what's the harm in "defining" it as wrapping for that purpose? If it's undefined, then why does it matter what we fold it to? So we might as well fold it to what traditional code expects. The reason is PR27116 (and others, see the difficulties in fixing PR27132). We cannot both assume wrapping and assume undefined behavior during folding at the same time - this leads to wrong code. Quoting from an earlier message of mine: "Other than that I'm a bit nervous if we both introduce signed overflow because it is undefined and at the same time pretend it doesn't happen because it is undefined. Like given a - INT_MIN < a -> a + INT_MIN < a -> INT_MIN < 0, which is true, even for a == INT_MIN, for which the original expression didn't contain an overflow. I.e. the following aborts

    #include <limits.h>

    extern void abort(void);

    int foo(int a) { return a - INT_MIN < a; }

    int main() {
        if (foo(INT_MIN))
            abort();
        return 0;
    }

because we fold the comparison to 1." This was while trying to implement the a - -1 to a + 1 folding. The problematic folding that existed at that point was that negate_expr_p said it would happily negate INT_MIN (to INT_MIN with the overflow flag set), which is wrong in this context. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 1/1/07, Richard Kenner <[EMAIL PROTECTED]> wrote: > the seemingly prevalent attitude "but it is undefined; but it is not > C" is the opinion of the majority of middle-end maintainers. Does anybody DISAGREE with that "attitude"? It isn't valid C to assume that signed overflow wraps. I've heard nobody argue that it is. The question is how far we go in supporting existing code that's broken in this way. I don't disagree with that attitude, I even strongly agree. We support broken code by options like -fno-strict-aliasing and -fwrapv. I see this discussion as a way to prioritize work we need to do anyway: annotate operations with their overflow behavior (like creating new tree codes such as WRAPPING_PLUS_EXPR), clean up existing code to make it more obvious where we rely on which semantics, add more testcases for corner cases, and document existing (standard-conformant) behavior more explicitly. Note that we had/have a similar discussion (what's a valid/useful optimization from the user's perspective) on the IEEE math front - see the huge thread about -ffast-math, -funsafe-math-optimizations and the proposal to split it into -fassociative-math and -freciprocal-math. We also have infrastructure work to do there, like laying the groundwork to implement proper contraction support. Richard.
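Purely to illustrate the tree-code idea mentioned above (this is not an existing GCC tree code), such an annotation could follow the same DEFTREECODE pattern quoted earlier in this digest, e.g. in tree.def:

    /* Hypothetical: a PLUS_EXPR whose signed operands are defined to wrap
       on overflow, mirroring the existing binary arithmetic tree codes.  */
    DEFTREECODE (WRAPPING_PLUS_EXPR, "wrapping_plus_expr", tcc_binary, 2)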
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 1/1/07, Geert Bosch <[EMAIL PROTECTED]> wrote: On Dec 31, 2006, at 19:13, Daniel Berlin wrote: > Note the distinct drop in performance across almost all the benchmarks > on Dec 30, including popular programs like bzip2 and gzip. Not so. To my eyes, the specint 2000 mean went UP by about 1% for the base -O3 compilation. The peak enabled more unrolling, which is helped by additional range information provided by absence of -frwapv. So, I'd say this run would suggest enabling -fwrapv for at least -O1 and -O2. Also, note that we never have focussed on performance with -fwrapv, and it is quite likely there is quite some improvement possible. I'd really like using -fwrapv by default for -O, -O[s12]. The benefit of many programs moving from "undefined semantics" to "implementation-defined semantics, overflow wraps like in old compilers" far outweighs even an average performance loss of 2% as seen in specfp. I would support the proposal to enable -fwrapv for -O[01], but not for -O2 as that is supposed to be "optimize for speed" and as -O3 is not widely used to optimize for speed (in fact it may make code slower). I'm undecided for -Os but care less about it. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 1/2/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: Daniel Berlin wrote: >> Richard Guenther added -fwrapv to the December 30 run of SPEC at >> <http://www.suse.de/~gcctest/SPEC/CFP/sb-vangelis-head-64/recent.html> >> and >> <http://www.suse.de/~gcctest/SPEC/CINT/sb-vangelis-head-64/recent.html>. >> Daniel Berlin and Geert Bosch disagreed about how to interpret >> these results; see <http://gcc.gnu.org/ml/gcc/2007-01/msg00034.html>. Thank you for pointing that out. I apologize for having missed it previously. As others have noted, one disturbing aspect of that data is that it shows that there is sometimes an inverse correlation between the base and peak flags. On the FP benchmarks, the results are mostly negative for both base and peak (with 168.wupwise the notable exception); on the integer benchmarks it's more mixed. It would be nice to have data for some other architectures: anyone have data for ARM/Itanium/MIPS/PowerPC? So, my feeling is similar to what Daniel expresses below, and what I think Ian has also said: let's disable the assumption about signed overflow not wrapping for VRP, but leave it in place for loop analysis. Especially given: >> We don't have an exhaustive survey, but of the few samples I've >> sent in, most of the code is in explicit overflow tests. However, this >> could be an artifact of the way I searched for wrapv-dependence >> (basically, I grep for "overflow" in the source code). The >> remaining code depended on -INT_MIN evaluating to INT_MIN. The >> troublesome case that started this thread was an explicit overflow >> test that also acted as a loop bound (which is partly what caused >> the problem). it sounds like that would eliminate most of the problem. Certainly, making -INT_MIN evaluate to INT_MIN, when expressed like that, is an easy thing to do; that's just a guarantee about constant folding. There's no reason for us not to document that signed arithmetic wraps when folding constants, since we're going to fold the constant to *something*, and we may as well pick that answer. I don't even necessarily think we need to change our user documentation. We can just choose to make the compiler not make this assumption for VRP, and to implement folding as two's-complement arithmetic, and go on with life. In practice, we probably won't "miscompile" many non-conforming programs, and we probably won't miss too many useful optimization opportunities. Perhaps Richard G. would be so kind as to turn this off in VRP, and rerun SPEC with that change? I can do this. What I also will do is improve VRP to still fold comparisons of the form a - 10 > 20 when it knows there is no overflow due to available range information for a (it doesn't do that right now). That might eliminate most of the bad effects of turning on -fwrapv for VRP. The question is whether we then want to assume wrapping semantics for signed ints in VRP, or whether we just don't want to assume anything if signed ints happen to wrap. So, given 'a + 10' with a value range for a being [0, +INF], what should be the resulting value range? At the moment with -fno-wrapv we get [10, +INF], with wrapping semantics we could get ~[-INF+10, 9] (which includes the range we get with -fno-wrapv). (I'll do this when I return from yet another short vacation - maybe someone can beat me to producing the SPEC numbers; a s/flag_wrapv/1/ in VRP should do it for all places that do not simply call into fold). Richard.
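As a concrete illustration of the kind of comparison folding at stake (a sketch of my own, not an example from the thread): with signed overflow treated as undefined, a test like the following can be folded away entirely, while with -fwrapv it has to stay.

  /* Sketch: intended as an overflow check.  With undefined signed overflow,
     fold/VRP may simplify a + 10 > a to 1 and drop the test; with -fwrapv
     the comparison must survive, because a + 10 can wrap to a negative
     value when a is close to INT_MAX.  */
  int
  add_would_overflow (int a)
  {
    return !(a + 10 > a);
  }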
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 1/2/07, Robert Dewar <[EMAIL PROTECTED]> wrote: Richard Guenther wrote: > On 1/1/07, Geert Bosch <[EMAIL PROTECTED]> wrote: specfp. > > I would support the proposal to enable -fwrapv for -O[01], but > not for -O2 as that is supposed to be "optimize for speed" and > as -O3 is not widely used to optimize for speed (in fact it may > make code slower). I'm undecided for -Os but care less about it. I think it is a bad idea for the optimization levels to deal with anything other than optimization. -fwrapv is not about optimization, it is about changing the language semantics. So this proposal would be tantamount to implementing a different language at -O1 and -O2, and having -O2 change the formal semantic interpretation of the program. That seems a very bad idea to me. We do that with -fstrict-aliasing, which also changes language semantics. -fstrict-aliasing is disabled for -O0 and -O1 and enabled for -O[23s]. It is one thing to have different optimization levels do different amounts of optimization that in practice may have more or less effect on non-standard programs. It is quite another to guarantee at a formal semantic level wrapping at -O1 and not -O2. If we decide to avoid some optimizations at -O1 in this area, that's fine, but it should not be done by enabling -fwrapv as one of the (presumably documented) flags included in -O1. Instead I would just do this silently without the guarantee. And I continue to favor the compromise approach where loop optimization can use undefinedness of overflow in dealing with loop invariants, but we don't by default take advantage of undefinedness elsewhere. I fear this is not as easy as it sounds, as loop optimizers (and basically every optimizer) call back into fold to do simplifications to, for example, predicates they try to prove. Unless you change this by either duplicating these parts of fold or putting logic into fold that checks current_pass == loop_optimizer you will not catch those cases. Also all of VRP, the loop optimizers and fold do comparison simplification, which benefits from signed overflow undefinedness the most (note that this is another area of cleanup I'm likely to touch in the not too distant future). Then we have two switches: -fstandard which allows all optimizations (name can be changed, I don't care about the name) -fwrapv which changes the semantics to require wrapping in all cases (including loops) How do these switches implement your proposal? "All optimizations" sounds like the -fno-wrapv case we have now. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 1/2/07, Richard Kenner <[EMAIL PROTECTED]> wrote: > We do that with -fstrict-aliasing, which also changes language semantics. Well, yes, but not quite in the same way. Indeed it's rather hard to describe in what way it changes the language semantics but easier to describe the effect it has on optimization. I think -fwrapv is the other way around. Well, while the effect of -fstrict-aliasing is hard to describe (TBAA _is_ a complex part of the standard), -fno-strict-aliasing rules are simple. All loads and stores alias each other if they cannot be proven not to alias by points-to analysis. Richard.
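For a concrete picture of code that is only reliable under the -fno-strict-aliasing rule just described, here is a sketch of my own (not part of the original mail):

  #include <string.h>

  /* Sketch: type punning a float through an unsigned int pointer.  Under the
     C aliasing rules (-fstrict-aliasing) the compiler may assume the float
     and unsigned int accesses do not alias; with -fno-strict-aliasing all
     loads and stores are assumed to possibly alias.  Copying the bytes with
     memcpy is the well-defined alternative (assuming equal sizes).  */
  unsigned int
  bits_of_float (const float *f)
  {
    unsigned int u;
    memcpy (&u, f, sizeof u);         /* always OK */
    return u;
  }

  unsigned int
  bits_of_float_punned (const float *f)
  {
    return *(const unsigned int *) f; /* only reliable with -fno-strict-aliasing */
  }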
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 1/4/07, Richard Sandiford <[EMAIL PROTECTED]> wrote: Paul Eggert <[EMAIL PROTECTED]> writes: > Mark Mitchell <[EMAIL PROTECTED]> writes: >> it sounds like that would eliminate most of the problem. Certainly, >> making -INT_MIN evaluate to INT_MIN, when expressed like that, is an >> easy thing to do; that's just a guarantee about constant folding. > > Well, no, just to clarify: the GCC code in question actually computed > "- x", and relied on the fact that the result was INT_MIN if x (an > unknown integer) happened to be INT_MIN. Also, now that I'm thinking > about it, some the Unix v7 atoi() implementation relied on "x + 8" > evaluating to INT_MIN when x happened to be (INT_MAX - 7). These are > the usual kind of assumptions in this area. I don't know if you're implicitly only looking for certain types of signed overflow, or if this has been mentioned elsewhere (I admit I had to skim-read some of the thread) but the assumption that signed overflow is defined is _very_ pervasive in gcc at the rtl level. The operand to a CONST_INT is a signed HOST_WIDE_INT, and its accessor macro -- INTVAL -- returns a value of that type. Most arithmetic related to CONST_INTs is therefore done on signed HOST_WIDE_INTs. This means that many parts of gcc would produce wrong code if signed arithmetic saturated, for example. (FWIW, this is why I suggested adding a UINTVAL, which Stuart has since done -- thanks. However, most of gcc still uses INTVAL.) I thought all ints are unsigned in the RTL world as there is I believe no way to express "signedness" of a mode. This would have to change of course if we ever support non two's-complement arithmetic. Richard.
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 1/4/07, Richard Sandiford <[EMAIL PROTECTED]> wrote: "Richard Guenther" <[EMAIL PROTECTED]> writes: > On 1/4/07, Richard Sandiford <[EMAIL PROTECTED]> wrote: >> Paul Eggert <[EMAIL PROTECTED]> writes: >> > Mark Mitchell <[EMAIL PROTECTED]> writes: >> >> it sounds like that would eliminate most of the problem. Certainly, >> >> making -INT_MIN evaluate to INT_MIN, when expressed like that, is an >> >> easy thing to do; that's just a guarantee about constant folding. >> > >> > Well, no, just to clarify: the GCC code in question actually computed >> > "- x", and relied on the fact that the result was INT_MIN if x (an >> > unknown integer) happened to be INT_MIN. Also, now that I'm thinking >> > about it, some the Unix v7 atoi() implementation relied on "x + 8" >> > evaluating to INT_MIN when x happened to be (INT_MAX - 7). These are >> > the usual kind of assumptions in this area. >> >> I don't know if you're implicitly only looking for certain types of >> signed overflow, or if this has been mentioned elsewhere (I admit I had >> to skim-read some of the thread) but the assumption that signed overflow >> is defined is _very_ pervasive in gcc at the rtl level. The operand to >> a CONST_INT is a signed HOST_WIDE_INT, and its accessor macro -- INTVAL >> -- returns a value of that type. Most arithmetic related to CONST_INTs >> is therefore done on signed HOST_WIDE_INTs. This means that many parts >> of gcc would produce wrong code if signed arithmetic saturated, for >> example. (FWIW, this is why I suggested adding a UINTVAL, which Stuart >> has since done -- thanks. However, most of gcc still uses INTVAL.) > > I thought all ints are unsigned in the RTL world as there is I believe no way > to express "signedness" of a mode. This would have to change of course > if we ever support non two's-complement arithmetic. I'm not sure what you mean. Yes, "all ints are unsigned in the RTL world" in the sense that we must use two's complement arithmetic for them -- we have no way of distinguishing what was originally signed from what was originally unsigned. But my point was that gcc _stores_ the integers as _signed_ HOST_WIDE_INTs, and operates on them as such, even though these signed HOST_WIDE_INTs may actually represent unsigned integers. Thus a lot of the arithmetic that gcc does at the rtl level would be wrong for certain inputs if the _compiler used to build gcc_ assumed that signed overflow didn't wrap. In other words, it sounds like you took my message to mean that gcc's rtl code treated signed overflow _in the input files_ as undefined. I didn't mean that. I meant that gcc's own code relies on signed overflow being defined. I think it was instances of the latter that Paul was trying to find. Yes, I confused that. It seems that RTL should store integer consts as unsigned HOST_WIDE_INT instead - the current situation could lead to problems as you rightfully said. Richard.
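To make that point about host-side arithmetic concrete, a small sketch of my own (HOST_WIDE_INT stood in by long long here; this is not code from GCC itself):

  /* Sketch: INTVAL-style arithmetic on the signed host type.  Adding two
     large CONST_INT values, or negating the most negative one, overflows
     the signed host integer; the result is only the expected
     two's-complement value if the compiler building GCC wraps signed
     overflow instead of exploiting its undefinedness.  */
  typedef long long HOST_WIDE_INT;    /* stand-in for GCC's real typedef */

  HOST_WIDE_INT
  const_int_plus (HOST_WIDE_INT a, HOST_WIDE_INT b)
  {
    return a + b;                     /* may overflow for large constants */
  }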
Re: mpfr issues when Installing gcc 3.4 on fedora core
On 1/4/07, drizzle drizzle <[EMAIL PROTECTED]> wrote: Still no luck so far .. I got the gcc3.4 from the gcc archive. Any way I can make gcc 3.4 not use these libraries ? 3.4 doesn't use gmp or mpfr, gfortran introduces this dependency but it appears with 4.0 or newer only. Richard.
Re: mpfr issues when Installing gcc 3.4 on fedora core
On 1/4/07, drizzle drizzle <[EMAIL PROTECTED]> wrote: I configure with --enable-languages=c,c++ . Shouldn't that disable gfortran? You are not configuring gcc 3.4. Richard. thanks dz On 1/4/07, Richard Guenther <[EMAIL PROTECTED]> wrote: > On 1/4/07, drizzle drizzle <[EMAIL PROTECTED]> wrote: > > Still no luck so far .. I got the gcc3.4 from the gcc archive. Any way > > I can make gcc 3.4 not use these libraries ? > > 3.4 doesn't use gmp or mpfr, gfortran introduces this dependency but it > appears with 4.0 or newer only. > > Richard. >
Re: Build snapshots according to a more regular schedule
On 1/5/07, Joe Buck <[EMAIL PROTECTED]> wrote: On Fri, Jan 05, 2007 at 07:26:27AM -0800, Ian Lance Taylor wrote: > David Edelsohn <[EMAIL PROTECTED]> writes: > > > Are 4.0 snapshots still necessary? I suspect they should be > > discontinued. > > 4.0 still seems to be regarded as an active branch. > > I don't mind closing it, myself. Does anybody think we should have a > 4.0.4 release? I'd like to see it closed. We have some bugs that are only open because they are targeted for 4.0.4 (fixed on all branches but 4_0). If Gaby's interested in doing one final release first, I'm fine with that. But I'd like to see it go away. I'd like to see it closed, too; all Linux/BSD vendors I know of are either still using 3.x or have switched to 4.1 already. Richard.
Re: Build snapshots according to a more regular schedule
On 1/5/07, David Fang <[EMAIL PROTECTED]> wrote: > > > > Are 4.0 snapshots still necessary? I suspect they should be > > > > discontinued. > > > > > > 4.0 still seems to be regarded as an active branch. > > > > > > I don't mind closing it, myself. Does anybody think we should have a > > > 4.0.4 release? > > > > I'd like to see it closed. We have some bugs that are only open because > > they are targeted for 4.0.4 (fixed on all branches but 4_0). > > I'd like to see it closed, too; all Linux/BSD vendors I know of are either > still using 3.x or have switched to 4.1 already. Hi, User chiming in: before retiring 4.0, one would be more easily convinced to make a transition to 4.1+ if the regressions from 4.0 to 4.1 numbered fewer. In the database, I see only 79 (P3+) regressions in 4.1 that are not in 4.0 (using only summary matching). Will these get a bit more attention for the upcoming 4.1.2 release? Well, I certainly see more attention on 4.1 regressions than on 4.0 regressions, but as 4.0 gets almost no attention, there is no hint that retiring 4.0 will magically free resources to tackle more 4.1 problems. Richard. http://gcc.gnu.org/bugzilla/query.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=4.1&known_to_fail_type=allwordssubstr&known_to_work_type=allwordssubstr&long_desc_type=allwordssubstr&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&gcchost_type=allwordssubstr&gcchost=&gcctarget_type=allwordssubstr&gcctarget=&gccbuild_type=allwordssubstr&gccbuild=&keywords_type=allwords&keywords=&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=SUSPENDED&bug_status=WAITING&bug_status=REOPENED&priority=P1&priority=P2&priority=P3&emailtype1=substring&email1=&emailtype2=substring&email2=&bugidtype=include&bug_id=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&query_based_on=4.1%20%5C%204.0%20regressions&negate0=1&field0-0-0=short_desc&type0-0-0=substring&value0-0-0=4.0&field0-1-0=noop&type0-1-0=noop&value0-1-0= Fang
Re: Build problem with gcc 4.3.0 20070108 (experimental)
On 1/8/07, George R Goffe <[EMAIL PROTECTED]> wrote: /tools/gcc/obj-i686-pc-linux-gnu/./prev-gcc/xgcc -B/tools/gcc/obj-i686-pc-linux-gnu/./prev-gcc/ -B/usr/lsd/Linux/x86_64-unknown-linux-gnu/bin/ -c -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition -Wmissing-format-attribute -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libdecnumber -I../libdecnumber ../../gcc/gcc/tree-vectorizer.c -o tree-vectorizer.o cc1: warnings being treated as errors ../../gcc/gcc/tree-vectorizer.c:2267: warning: initialization from incompatible pointer type make[3]: *** [tree-vectorizer.o] Error 1 make[3]: *** Waiting for unfinished jobs rm gcjh.pod gcj-dbtool.pod grmiregistry.pod fsf-funding.pod jcf-dump.pod jv-convert.pod grmic.pod gcov.pod gcj.pod gfdl.pod jv-scan.pod cpp.pod gjnih.pod gij.pod gpl.pod gfortran.pod gcc.pod make[3]: Leaving directory `/rb.exphome/tools/gcc/obj-i686-pc-linux-gnu/gcc' make[2]: *** [all-stage2-gcc] Error 2 make[2]: Leaving directory `/rb.exphome/tools/gcc/obj-i686-pc-linux-gnu' make[1]: *** [stage2-bubble] Error 2 make[1]: Leaving directory `/rb.exphome/tools/gcc/obj-i686-pc-linux-gnu' make: *** [bootstrap] Error 2 That's honza's patch - but bootstrap doesn't abort for me at that point. Richard.
Re: CSE not combining equivalent expressions.
On 1/15/07, pranav bhandarkar <[EMAIL PROTECTED]> wrote: Hello Everyone, I have the following source code static int i; static char a; char foo_gen(int); void foo_assert(char); void foo () { int *x = &i; a = foo_gen(0); a |= 1; /* 1 */ if (*x) goto end; a |= 1; /* 2 */ foo_assert(a); end: return; } Now I expect the CSE pass to realise that 1 and 2 are equal and eliminate 2. However, the RTL code before the first CSE pass is as follows (insn 11 9 12 0 (set (reg:SI 1 $c1) (const_int 0 [0x0])) 43 {*movsi} (nil) (nil)) (call_insn 12 11 13 0 (parallel [ (set (reg:SI 1 $c1) (call (mem:SI (symbol_ref:SI ("gen_T") [flags 0x41] ) [0 S4 A32]) (const_int 0 [0x0]))) (use (const_int 0 [0x0])) (clobber (reg:SI 31 $link)) ]) 39 {*call_value_direct} (nil) (nil) (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1)) (nil))) (insn 13 12 14 0 (set (reg:SI 137) (reg:SI 1 $c1)) 43 {*movsi} (nil) (nil)) (insn 14 13 16 0 (set (reg:SI 135 [ D.1217 ]) (reg:SI 137)) 43 {*movsi} (nil) (nil)) (insn 16 14 17 0 (set (reg:SI 138) (ior:SI (reg:SI 135 [ D.1217 ]) (const_int 1 [0x1]))) 63 {iorsi3} (nil) (nil)) (insn 17 16 18 0 (set (reg:SI 134 [ D.1219 ]) (zero_extend:SI (subreg:QI (reg:SI 138) 0))) 84 {zero_extendqisi2} (nil) (nil)) (insn 18 17 19 0 (set (reg/f:SI 139) (symbol_ref:SI ("a") [flags 0x2] )) 43 {*movsi} (nil) (nil)) (insn 19 18 21 0 (set (mem/c/i:QI (reg/f:SI 139) [0 a+0 S1 A8]) (subreg/s/u:QI (reg:SI 134 [ D.1219 ]) 0)) 56 {*movqi} (nil) (nil)) < expansion of the if condition ;; End of basic block 0, registers live: (nil) ;; Start of basic block 1, registers live: (nil) (note 25 23 27 1 [bb 1] NOTE_INSN_BASIC_BLOCK) (insn 27 25 28 1 (set (reg:SI 142) (ior:SI (reg:SI 134 [ D.1219 ]) (const_int 1 [0x1]))) 63 {iorsi3} (nil) (nil)) (insn 28 27 29 1 (set (reg:SI 133 [ temp.28 ]) (zero_extend:SI (subreg:QI (reg:SI 142) 0))) 84 {zero_extendqisi2} (nil) (nil)) (insn 29 28 30 1 (set (reg/f:SI 143) (symbol_ref:SI ("a") [flags 0x2] )) 43 {*movsi} (nil) (nil)) (insn 30 29 32 1 (set (mem/c/i:QI (reg/f:SI 143) [0 a+0 S1 A8]) (subreg/s/u:QI (reg:SI 133 [ temp.28 ]) 0)) 56 {*movqi} (nil) (nil)) Now the problem is that the CSE pass doesn't identify that the source of the set in insn 27 is equivalent to the source of the set in insn 16. This, it seems, happens because of the zero_extend in insn 17. I am using a 4.1 toolchain. However with a 3.4.6 toolchain no zero_extend gets generated and the result of the ior operation is immediately copied into memory. I am compiling this case with -O3. Can anybody please tell me how this problem can be overcome? CSE/FRE or VRP do not track bit operations and CSE of bits. To overcome this you would need to implement such support. Richard.
Re: -Wconversion versus libstdc++
On 17 Jan 2007 16:36:04 -0600, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: Paolo Carlini <[EMAIL PROTECTED]> writes: | Joe Buck wrote: | | >In the case of the containers, we are asserting/relying on the fact that | >the pointer difference is zero or positive. But this has become a | >widespread idiom: people write their own code in the STL style. If STL | >code now has to be fixed to silence warnings, so will a lot of user code. | > | Good point. About it, we should also take into account the recent | messages from Martin, pointing out that many C++ front-ends do not | warn for signed -> unsigned. I just built firefox (CVS) with GCC mainline. The compiler spat avalanches of nonsensical warnings that conversions signed -> unsigned may alter values, when in fact the compiler knows that such things cannot happen. First, let's recall that GCC supports only 2s complement targets. Second, a conversion from T to U may alter the value if a round trip is not the identity function. That is, there exists a value t in T such that the assertion assert (T(U(t)) == t) fails. I think it warns if U(t) != t in a mathematical sense (without promoting to the same type for the comparison), so it warns as (unsigned)-1 is not "-1". I agree this warning is of questionable use and should not be enabled with -Wall. Richard.
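A small sketch of my own of the kind of conversion behind such a warning:

  /* Sketch: -1 converted to unsigned int yields UINT_MAX, so the round trip
     is not the identity in the mathematical sense and -Wconversion may warn,
     even though the conversion is well defined and usually intentional.  */
  unsigned int
  all_bits_set (void)
  {
    return -1;   /* may warn: conversion to unsigned may alter its value */
  }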
Re: gcc compile time support for assumptions
On 18 Jan 2007 07:51:51 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: Andrew Haley <[EMAIL PROTECTED]> writes: > Ian Lance Taylor writes: > > Abramo Bagnara <[EMAIL PROTECTED]> writes: > > > > > I'd like to know if gcc has implemented some generic way to help > > > optimizer job by allowing programmers to specify assumptions (or > > > constraints). > > > > The answer is no, there is nothing quite like you describe. > > > > But I think it would be a good idea. > > Something like this would greatly improve the code generation quality > of gcj. There are a great many assertions that I could pass to VRP > and the optimizers: this is invariant, this is less than that, and so > on. Well, internally, we do have ASSERT_EXPR. It would probably take a little work to permit the frontends to generate it, but the optimizers should understand it. Providing a __builtin_assert () function is still one thing on my TODO list; we can derive proper ASSERT_EXPRs from it in VRP even in the -DNDEBUG case. Of course, if the asserts still end up in the source, VRP can already derive information from the IL of the assert code. Richard.
Re: gcc compile time support for assumptions
On 1/18/07, Robert Dewar <[EMAIL PROTECTED]> wrote: Andrew Haley wrote: > Ian Lance Taylor writes: > > Abramo Bagnara <[EMAIL PROTECTED]> writes: > > > > > I'd like to know if gcc has implemented some generic way to help > > > optimizer job by allowing programmers to specify assumptions (or > > > constraints). > > > > The answer is no, there is nothing quite like you describe. > > > > But I think it would be a good idea. > > Something like this would greatly improve the code generation quality > of gcj. There are a great many assertions that I could pass to VRP > and the optimizers: this is invariant, this is less than that, and so > on. Note that such assertions also can function as predicates to be discharged in a proof engine. See work on SPARK (www.www.praxis-his.com). One thing to consider here is whether to implement just a simple assume as proposed, or a complete mechanism for pre and post assertions (in particular, allowing you to talk about old values), then it can serve as a) a mechanism for programming by contract b) a mechanism for interacting with proof tools c) a mechanism to improve generated code as suggested in this thread all at the same time Eiffel should be examined for inspiration on such assertions. Sure. I only was thinking about the interface to the middle-end where new builtin functions are the easiest way to provide assumptions. If you have further ideas they are certainly appreciated (without me being required to research about Eiffel ...). Thanks, Richard.
Re: gcc compile time support for assumptions
On 18 Jan 2007 10:19:37 -0600, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: Paolo Carlini <[EMAIL PROTECTED]> writes: | Richard Guenther wrote: | | > Providing a __builtin_assert () function is still one thing on my | > TODO list; we can | > derive proper ASSERT_EXPRs from it in VRP even in the -DNDEBUG case. | | Great! Certainly could be profitably used in libstdc++. Indeed! We would just have to make sure that people don't think there is a relation between __builtin_assert() and assert(), as there usually is between __builtin_xxx() and xxx(). Ok, so better name it __builtin_assume () then. Richard.
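For illustration, a sketch of how such a builtin might look in use. Note that __builtin_assume is only the name proposed in this exchange, not an existing GCC builtin, so the code below is purely hypothetical:

  /* Hypothetical sketch: tell VRP that idx is in range even with -DNDEBUG,
     so the range check below could be folded away.  */
  extern int table[16];

  int
  lookup (int idx)
  {
    __builtin_assume (idx >= 0 && idx < 16);  /* proposed builtin, not real */
    if (idx < 0 || idx >= 16)                 /* VRP could remove this test */
      return -1;
    return table[idx];
  }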
Re: Miscompilation of remainder expressions
On 1/19/07, Joe Buck <[EMAIL PROTECTED]> wrote: On Thu, Jan 18, 2007 at 05:36:23PM -0500, Robert Dewar wrote: > Morten Welinder wrote: > >>For sure a/b is undefined > > > >In C, it is. In assembler it is perfectly well defined, i.e., it > >traps. But how is the > >trap handler supposed to know the source of a given instruction? > > > >M. > > Sure it traps, but what happens when that trap occurs is of course > O/S dependent, there is no guarantee of the effect. We're going around in circles. Everything has been said. What if we put together a Wiki page with the problem and possible solutions, as in a) do nothing, no big deal. This is what happens if no one contributes code in any case. b) add a flag to generate code to compute something like divisor == -1 ? 0 : rem(dividend,divisor) or divisor == rem(dividend,abs(divisor)) in an efficient way (cmov to avoid busting the pipeline) Note that cmov is not the right thing to use here (at least on ia32 and x86_64) as it is more expensive than compare-and-jump unless the probability of taking either way is the same (which I would not expect here, as divisor == -1 should nearly never be the case). That said, a transformation on the tree level would make it possible for VRP to optimize away the special casing. Richard.
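A minimal source-level sketch of my own of what option (b) amounts to:

  /* Sketch: INT_MIN % -1 overflows and traps on x86 even though the
     mathematical result is 0, so guard the one problematic divisor.  If VRP
     can later prove divisor != -1, the guard can be optimized away again.  */
  int
  safe_rem (int dividend, int divisor)
  {
    if (divisor == -1)
      return 0;
    return dividend % divisor;
  }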
Re: [RFC] Our release cycles are getting longer
On 1/24/07, David Carlton <[EMAIL PROTECTED]> wrote: On Tue, 23 Jan 2007 17:54:10 -0500, Diego Novillo <[EMAIL PROTECTED]> said: > So, I was doing some archeology on past releases and we seem to be > getting into longer release cycles. Interesting. I'm a GCC observer, not a participant, but here are some thoughts: As far as I can tell, it looks to me like there's a vicious cycle going on. Picking an arbitrary starting point: 1) Because lots of bugs are introduced during stage 1 (and stage 2), stage 3 takes a long time. 2) Because stage 3 takes a long time, development branches are long-lived. (After all, development branches are the only way to do work during stage 3.) 3) Because development branches are long-lived, the stage 1 merges involve a lot of code. 4) Because the stage 1 merges involve a lot of code, lots of bugs are introduced during stage 1. (After all, code changes come with bugs, and large code changes come with lots of bugs.) 1) Because lots of bugs are introduced during stage 1, stage 3 takes a long time. Now, the good news is that this cycle can be a virtuous cycle rather than a vicious cycle: if you can lower one of these measurements (length of stage 3, size of branches, size of patches, number of bugs), then the other measurements will start going down. "All" you have to do is find a way to mute one of the links somehow, focus on the measurement at the end of that link, and then things will start getting better. Indeed this is a good observation. If you look at how linux-kernel development changed to face and fix this problem and map this to gcc development, we would have something like: 1. a two-week stage1 for merging, 2. six weeks for bugfixing, 3. a release, 1. ... Note that we would not have maintained FSF release branches, but if the time is right vendors would pick up a release and keep it maintained (semi-officially). This would allow focusing on "interesting" releases and avoid piling up too much development on branches; at the same time we can enforce more strict quality rules on merges, because missing one merge window does not delay the merge for one year (as now), but only two months. Note that the above model would basically force all development on non-minor stuff to happen on branches, "working" on the mainline is not possible in this scheme. (This of course fits the distributed development model of the linux kernel better.) Of course the two-week and six-week numbers are not fixed, but keeping both numbers low encourages good quality work (and keeping the first number low is a requirement anyway). We'd have a lot of releases of course - but that's only a naming problem. Richard.
Re: [RFC] Our release cycles are getting longer
On 1/25/07, Steven Bosscher <[EMAIL PROTECTED]> wrote: On 1/25/07, H. J. Lu <[EMAIL PROTECTED]> wrote: > > >Gcc 4.2 has a serious FP performace issue: > > > > > >http://gcc.gnu.org/ml/gcc/2007-01/msg00408.html > > > > > >on both ia32 and x86-64. If there will be a 4.2.0 release, I hope it > > >will be addressed. > > > > As always, the best way to ensure that it is addressed if it is > > important to you is to address it yourself, or pay someone to do so :-) > > The fix is in mainline. The question is if it should be backported to > 4.2. ISTR Dan already made it clear more than once that the answer to that question is a loud NO. I thought it was more like "if you really want it I can do it". And I think without it 4.2 sucks. Richard.
Re: GCC-4.0.4 release status
On 25 Jan 2007 10:29:27 -0600, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: Hi, There were over 250 PRs open against GCC-4.0.4. Almost all of them are "benign" in the sense that we can live without fixing them in GCC-4.0.4 -- many are already fixed in more recent versions. I'm now giving attention only to those PRs marked as blocker or critical. I've identified three: tree-optimization/29605 * Wrong code generation when optimization is on. Andrew Pinski believes it is "Out of SSA" doing it. This touches the middle-end, and will be left unfixed -- unless the fix is really really trivial. tree-optimization/28778 * not to be fixed in GCC-4.0.x middle-end/28683 * has a "simple" patch applied to GCC-4.2.x and GCC-4.1.x. I'm considering applying it to GCC-4.0.4. You might want to consider middle-end/28651 given the recent integer overflow discussions. I can do the backport work if you like. Richard.
Re: GCC-4.0.4 release status
On 1/25/07, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: On Thu, 25 Jan 2007, Richard Guenther wrote: | You might want to consider middle-end/28651 given the recent integer overflow | discussions. Good suggestion! | I can do the backport work if you like. If you could work out a backport by tomorrow morning (CST), that would be great. Done. Bootstrapped and tested on x86_64-unknown-linux-gnu. Richard.
Re: GCC-4.0.4 release status
On 1/25/07, Volker Reichelt <[EMAIL PROTECTED]> wrote: Hi, > | Well, there's another serious (wrong-code) bug which should be fixed IMHO: > | > | PR c++/29106 is a C++ front-end issue. > | It has a one-line fix (plus comment) on the 4.1 branch. > | Well, actually one should also backport the fix for PR c++/28284 then, > | which is a two-liner. > I was primarily looking at the PRs that are marked in the bugzilla > database as blocker or critical. As there were over 256 PRs open, and > the idea is to get GCC-4.0.4 out of the door as soon as possible, I'm > not trying to fix everything; just those that are critical or > blockers. This is based on the fact that most distros have moved to > GCC-4.1.x or higher. GCC-4.0.x has been known since GCC-4.0.0 to contain > major shortcomings. Well, the severity status of the bugs is not very well maintained. Mark e.g. only sets the priority field (P1 - P5) of the bugs. And PR 29106 is one of the 37 P1 bugs, and one of three wrong-code P1 bugs. So this is not just some simple error-recovery problem. In addition this is a regression from GCC 4.0.2, i.e. a regression on the 4.0 branch. Which makes this bug even worse, IMHO. (This information seems to be missing in bugzilla, though.) Considering how much dispute there is on the mailing list about how to handle undefined behaviour correctly ;-), it bothers me more that we ignore one-line fixes for wrong-code bugs. I think regressions on the branch are worth a fix. Though I agree that the primary goal should be to get rid of the 4.0 branch ;) Richard.
Re: Which optimization levels affect gimple?
On 1/26/07, Diego Novillo <[EMAIL PROTECTED]> wrote: Paulo J. Matos wrote on 01/26/07 06:52: > Is the output of -fdump-tree-optimized a subset of GIMPLE? > Yes. The output is an incomplete textual representation of the GIMPLE form of the program. It's after doing TER, so the statements are no longer valid GIMPLE statements. Richard.
Re: Signed int overflow behavior in the security context
On 1/27/07, Paul Schlie <[EMAIL PROTECTED]> wrote: >> On Fri, Jan 26, 2007 at 06:57:43PM -0500, Paul Schlie wrote: > Likewise, if the program has an uninitialized variable, the behavior > will differ depending on details of optimization and how variables are > assigned to memory. Heap allocated for the first time might always be > zero (as the OS delivers it that way), turning on optimization might then > result in a nonzero initial value because of reuse of a register. - I would argue that in this circumstance although the resulting value may differ, the results are actually equivalent; as in both circumstances the value returned is the value associated with its storage location; and as the values of all storage locations are arbitrary unless otherwise well specified, the result may change from run to run regardless of any applied optimizations. If you read from an uninitialized variable twice you may well get a different result each time. This is exactly the same issue as with signed overflow and observable behavior - though as somebody notes later - the uninitialized variable case doesn't stir up too many people's minds. Richard.
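A tiny sketch of my own of that point:

  /* Sketch: x is never initialized, so each read may be satisfied from a
     different register or stack slot once the optimizers have run; neither
     two reads in one execution nor two runs of the program are guaranteed
     to observe the same value.  */
  int
  reads_uninitialized (void)
  {
    int x;
    int first = x;
    int second = x;
    return first == second;   /* not guaranteed to be 1 */
  }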
Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!
On 1/28/07, Steven Bosscher <[EMAIL PROTECTED]> wrote: Hello rth, Can you explain what went through your mind when you picked the tree_exp.complexity field for something implemented new... :-( Not much... http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01117.html "... but my brain isn't working anymore." ;) Richard.
Re: remarks about g++ 4.3 and some comparison to msvc & icc on ia32
On 1/28/07, tbp <[EMAIL PROTECTED]> wrote: Let it be clear from the start this is a potshot and while those trends aren't exactly new or specific to my code, i haven't tried to provide anything but specific data from one of my apps, on win32/cygwin. Primo, gcc getting much better wrt inlining exacerbates the fact that it's not as good as other compilers at shrinking the stack frame size, and perhaps as was suggested by Uros when discussing that point a pass to address that would make sense. As i'm too lazy to properly measure cruft across multiple compilers, i'll use my rtrt app where i mostly control large scale inlining by hand. objdump -wdrfC --no-show-raw-insn $1|perl -pe 's/^\s+\w+:\s+//'|perl -ne 'printf "%4d\n", hex($1) if /sub\s+\$(0x\w+),%esp/'|sort -r| head -n 10 msvc: 2196 2100 1772 1692 1688 1444 1428 1312 1308 1160 icc: 2412 2280 2172 2044 1928 1848 1820 1588 1428 1396 gcc: 2604 2596 2412 2076 2028 1932 1900 1756 1720 1132 It would have been nice to tell us what the particular columns in this table mean - now we have to decrypt objdump params and perl postprocessing ourselves. (If you are interested in stack size related to inlining you may want to tune --param large-stack-frame and --param large-stack-frame-growth). Richard.
Re: gcc 4.1.1: char *p = "str" puts "str" into rodata
On 1/28/07, Denis Vlasenko <[EMAIL PROTECTED]> wrote: char *p; int main() { p = ""; return 0; } Don't you think that "" should end up in rw data? Why? It's not writable after all. Richard.
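To spell out the distinction (a sketch of my own, not code from the thread):

  /* Sketch: a string literal itself is not writable, so GCC may place it in
     .rodata regardless of what it is assigned to.  If writable storage is
     wanted, use an array, which gets its own (writable) copy.  */
  const char *ro = "str";    /* the literal "str" may live in .rodata */
  char rw[] = "str";         /* rw is a writable array initialized from it */

  int
  main (void)
  {
    rw[0] = 'S';             /* fine */
    /* ((char *) ro)[0] = 'S';   would be undefined behavior */
    return 0;
  }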
Re: remarks about g++ 4.3 and some comparison to msvc & icc on ia32
On 1/28/07, Jan Hubicka <[EMAIL PROTECTED]> wrote: > tbp wrote: > > > Secundo, while i very much appreciate the brand new string ops, it > > seems that on ia32 some array initialization cases were left out, > > hence i still see oodles of 'movl $0x0' when generating code for k8. > > Also those zeroings get coalesced at the top of functions on ia32, and > > i have a function where there's 3 pages of those right after prologue. > > See the attached 'grep 'movl $0x0' dump. > > It looks like Jan and Richard have answered some of your questions about > inlining (or are in the process of doing so), but I haven't seen a > response to this point. > > Certainly, if we're generating zillions of zero-initializations to > contiguous memory, rather than using memset, or an inline loop, that > seems unfortunate. Would you please file a bug report? I thought the comment was more referring to the fact that we will happily generate movl $0x0, place1 movl $0x0, place2 ... movl $0x0, placeMillion rather than the shorter xor %eax, %eax movl %eax, ... but indeed both of those issues should be addressed (and it would be interesting to know where we fail to synthesize memset in real scenarios). With the repeated mov issue unfortunately I don't know what would be the best place: we obviously don't want to constrain register allocation too much, and after regalloc I guess a machine-dependent pass is the only hope, which is pretty ugly (but not that difficult to code, at least at the local level). One source of these patterns is SRA decomposing structures and their initialization. But the structure size we do that for is limited (I also believe we already have bugreports about this, but cannot find them right now). Richard.
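A sketch of my own (not the reporter's code) of the kind of aggregate zero-initialization that can end up as a run of individual zero stores rather than a block clear:

  /* Sketch: for small enough aggregates the initializer may be decomposed
     into one scalar zero store per element; only past some size limit does
     GCC emit a memset/block clear instead.  The exact cutoff depends on SRA
     and string-op parameters and differs between releases.  */
  struct vec { float v[8]; };

  void
  clear (struct vec *out)
  {
    struct vec tmp = { { 0 } };  /* may expand to eight scalar zero stores */
    *out = tmp;
  }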
Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!
On 1/29/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: Steven Bosscher wrote: > On 1/29/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: >> Email is a tricky thing. I've learned -- the hard way -- that it's best >> to put a smiley on jokes, because otherwise people can't always tell >> that they're jokes. > > I did use a smiley. > > Maybe I should put the smiley smiling then, instead of a sad looking > smiley. To me, they do mean very different things. The sad smiley didn't say "joke"; it said "boo!". I think that a happy smiley would help. Well, a "joke" smiley on the first sentence "Can you explain what went through your mind when you picked the tree_exp.complexity field for something implemented new... :-(" would have been inappropriate. There is no joke in this sentence, there is disappointment in it. Like I say, I've had exactly the same problem with my own humor-impaired recipients. So, I think it's best just to live in constant fear that people don't think things are funny. :-) There's a later ;) smiley in the mail and maybe you missed one after the second paragraph. (certainly you did) I don't read that mail as anywhere insulting or inappropriate, maybe too "informal" for this list (!?). I appreciate Steven's contributions and willingness to fix things in GCC that apparently nobody else wants to tackle. (I and others owe you beer for that!) Thanks, Richard.
Re: G++ OpenMP implementation uses TREE_COMPLEXITY?!?!
On 1/29/07, Eric Botcazou <[EMAIL PROTECTED]> wrote: > There's a later ;) smiley in the mail and maybe you missed one after > the second paragraph. (certainly you did) Then I guess the question is: what is the scope of a smiley? Does it retroactively cover all the preceding sentences, including the subject? Good point. Personally I tend to annotate sentences with (or without) a smiley ;) If I want a smiley to apply to the whole mail it would be on a separate paragraph. Annotating a complete paragraph I don't know how to do. ;) (whole-mail-smiley) Richard.
Re: Does anyone recognize this --enable-checking=all bootstrap failure?
On 1/30/07, Brooks Moses <[EMAIL PROTECTED]> wrote: I've been trying to track down a build failure that I was pretty sure came about from some patches I've been trying to merge, but I've now reduced things to a bare unmodified tree and it's still failing. I could have sworn that it wasn't doing that until I started adding things, though, so I'm posting about it here before I make a bug report so y'all can tell me if I did something dumb. Essentially, what's happening is that I've got a build tree configured with --enable-checking=all, and the first time it tries to use xgcc to compile something, it dies with an ICE. Here are the gory details: This is fold-checking tripping over bugs - this is also well known; just don't use --enable-checking=all (use =yes instead). Richard.
Re: Interesting build failure on trunk
On 2/1/07, Ismail Dönmez <[EMAIL PROTECTED]> wrote: On Wednesday 31 January 2007 11:26:38 Ismail Dönmez wrote: > On Tuesday 30 January 2007 18:43:52 Ian Lance Taylor wrote: > > Ismail Dönmez <[EMAIL PROTECTED]> writes: > > > I am getting this when I try to compile gcc trunk: > > > > > > ../../libcpp/../include -I../../libcpp/include -march=i686 -O2 -pipe > > > -fomit-frame-pointer -U_FORTIFY_SOURCE -fprofile-use -W -Wall > > > -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes > > > -Wold-style-definition -Wmissing-format-attribute -pedantic > > > -Wno-long-long -Werror -I../../libcpp -I. -I../../libcpp/../include > > > -I../../libcpp/include -c -o files.o -MT files.o -MMD -MP -MF > > > .deps/files.Po ../../libcpp/files.c ../../libcpp/files.c: In function > > > 'read_name_map': > > > ../../libcpp/files.c:1238: internal compiler error: Floating point > > > exception Please submit a full bug report, > > > with preprocessed source if appropriate. > > > See <http://gcc.gnu.org/bugs.html> for instructions. > > > > libcpp/files.c:1238 seems to be a call to memcpy. I don't see > > anyplace a floating point exception might come from. I've certainly > > never seen anything like that. > > I think this is a hardware error Ok, it's not; I tried to build on an AMD64 3500+ with 1GB RAM (unlike my Centrino laptop, totally different hardware) and it crashes in exactly the same way. Now my guess is the host compiler is somehow hosed - bad news for gcc 4.2 I guess. This is probably PR30650 (just don't use profiledbootstrap). Richard.
Re: build failure? (libgfortran)
On 2/5/07, Dorit Nuzman <[EMAIL PROTECTED]> wrote: Grigory Zagorodnev <[EMAIL PROTECTED]> wrote on 05/02/2007 08:18:34: > Dorit Nuzman wrote: > > I'm seeing this bootstrap failure on i686-pc-linux-gnu (revision 121579) - > > something I'm doing wrong, or is anyone else seeing this? > > Yes. I see the same at x86_64-redhat-linux. > Thanks. Turns out I see the same problem on ppc64-yellowdog-linux This is because we now fixinclude sysmacros.h and libgfortran is built with -std=gnu99. Caused by: 2007-02-03 Bruce Korb <[EMAIL PROTECTED]> * inclhack.def (glibc_c99_inline_4): replace "extern" only if surrounded by space characters. Richard.
Re: GCC 4.1.2 Status Report
On Sun, 4 Feb 2007, Mark Mitchell wrote: > [Danny, Richard G., please see below.] > > Thanks to all who have helped test GCC 4.1.2 RC1 over the last week. > > I've reviewed the list traffic and Bugzilla. Sadly, there are a fair > number of bugs. Fortunately, most seem not to be new in 4.1.2, and > therefore I don't consider them showstoppers. > > The following issues seem to be the 4.1.1 regressions: > > http://gcc.gnu.org/wiki/GCC_4.1.2_Status > > PR 28743 is only an ICE-on-invalid, so I'm not terribly concerned. > > Daniel, 30088 is another aliasing problem. IIRC, you've in the past > said that these were (a) hard to fix, and (b) uncommon. Is this the > same problem? If so, do you still feel that (b) is true? I'm > suspicious, and I am afraid that we need to look for a conservative hack. PR30708 popped up as well and may be related (and easier to analyze as it's C-only). Neither seems to be a regression on the branch, though. > Richard, 30370 has a patch, but David is concerned that we test it on > older GNU/Linux distributions, and suggested SLES9. Would you be able > to test that? The patch bootstrapped and tested ok on SLES9-ppc. > Richard, 29487 is an issue raised on HP-UX 10.10, but I'm concerned that > it may reflect a bad decision about optimization of C++ functions that > don't throw exceptions. Would you please comment? I believe reverting this patch on the branch is the only option (short of fixing the HP-UX assembler). I don't know of any real-life issue it fixes. Richard. -- Richard Guenther <[EMAIL PROTECTED]> Novell / SUSE Labs
Scheduling an early complete loop unrolling pass?
Hi, currently with -ftree-vectorize we generate for for (i=0; i<3; ++i) # SFT.4346_507 = VDEF # SFT.4347_508 = VDEF # SFT.4348_509 = VDEF d[i] = 0.0; for (j=0; j:; vect_cst_.4501_723 = { 0.0, 0.0 }; vect_p.4506_724 = (vector double *) &D.76822; vect_p.4502_725 = vect_p.4506_724; # ivtmp.4508_728 = PHI <0(6), ivtmp.4508_729(11)> # ivtmp.4507_726 = PHI # ivtmp.4461_601 = PHI <3(6), ivtmp.4461_485(11)> # SFT.4348_612 = PHI # SFT.4347_611 = PHI # SFT.4346_610 = PHI # i_582 = PHI <0(6), i_118(11)> :; # SFT.4346_507 = VDEF # SFT.4347_508 = VDEF # SFT.4348_509 = VDEF *ivtmp.4507_726 = vect_cst_.4501_723; i_118 = i_582 + 1; ivtmp.4461_485 = ivtmp.4461_601 - 1; ivtmp.4507_727 = ivtmp.4507_726 + 16B; ivtmp.4508_729 = ivtmp.4508_728 + 1; if (ivtmp.4508_729 < 1) goto ; else goto ; # i_722 = PHI # ivtmp.4461_717 = PHI :; # ivtmp.4461_706 = PHI # SFT.4348_707 = PHI # SFT.4347_708 = PHI # SFT.4346_709 = PHI # i_710 = PHI :; # SFT.4346_711 = VDEF # SFT.4347_712 = VDEF # SFT.4348_713 = VDEF D.76822.D.44378.values[i_710] = 0.0; i_714 = i_710 + 1; ivtmp.4461_715 = ivtmp.4461_706 - 1; if (ivtmp.4461_715 != 0) goto ; else goto ; ... and we are later not able to do constant propagation to the second loop which we can do if we first unroll such small loops. As we also only vectorize innermost loops I believe doing a complete unrolling pass early will help in general (I pushed for this some time ago). Thoughts? Thanks, Richard. -- Richard Guenther <[EMAIL PROTECTED]> Novell / SUSE Labs
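To restate the point in source form (a sketch of my own, independent of the dump above):

  /* Sketch: after completely unrolling the 3-iteration loop the stores
     become d[0] = d[1] = d[2] = 0.0, and later passes can constant
     propagate the zeros into the second loop; the vectorized form above
     hides that opportunity behind a lot of cruft.  */
  double d[3];

  void
  init_and_use (double *out, int n)
  {
    int i, j;
    for (i = 0; i < 3; ++i)        /* candidate for early complete unrolling */
      d[i] = 0.0;
    for (j = 0; j < n; ++j)
      out[j] += d[j % 3];          /* simplifies if d[] is known to be zero */
  }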
Re: Scheduling an early complete loop unrolling pass?
On Mon, 5 Feb 2007, Paolo Bonzini wrote: > > > As we also only vectorize innermost loops I believe doing a > > complete unrolling pass early will help in general (I pushed > > for this some time ago). > > > > Thoughts? > > It might also hurt, though, since we don't have a basic block vectorizer. > IIUC the vectorizer is able to turn > > for (i = 0; i < 4; i++) > v[i] = 0.0; > > into > > *(vector double *)v = (vector double){0.0, 0.0, 0.0, 0.0}; That's true. But we can not do constant propagation out of this (and the vectorizer leaves us with a lot of cruft which is only removed much later). The above case would also ask for an early vectorization pass if the loop was wrapped into another. Finding a good heuristic for which loops to completely unroll early is not easy, though for odd small numbers of iterations it is probably always profitable. Richard. -- Richard Guenther <[EMAIL PROTECTED]> Novell / SUSE Labs
Re: Scheduling an early complete loop unrolling pass?
On Mon, 5 Feb 2007, Jan Hubicka wrote: > > > > Hi, > > > > currently with -ftree-vectorize we generate for > > > > for (i=0; i<3; ++i) > > # SFT.4346_507 = VDEF > > # SFT.4347_508 = VDEF > > # SFT.4348_509 = VDEF > > d[i] = 0.0; > > Also Tomas' patch is supposed to catch this special case and convert it > into a memset that should be subsequently optimized into an assignment, which > should be good enough (which reminds me that I forgot to merge the > memset part of stringop optimizations). > > Perhaps this can be made a bit more generic and construct INIT_EXPRs for > small arrays directly from Tomas's pass (going from memset to assignment > works only in special cases). Tomas, what is the status of your patch? It would certainly be interesting to make constant propagation work in this case (though after vectorization aliasing is in an "interesting" state). > Did you run some benchmarks? Not yet - I'm looking at the C++ SPEC 2006 benchmarks at the moment and using vectorization there seems to do a lot of collateral damage (maybe not measurable though). Richard. -- Richard Guenther <[EMAIL PROTECTED]> Novell / SUSE Labs
Re: GCC 4.1.2 Status Report
On 2/5/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: Daniel Berlin wrote: >> Daniel, 30088 is another aliasing problem. IIRC, you've in the past >> said that these were (a) hard to fix, and (b) uncommon. Is this the >> same problem? If so, do you still feel that (b) is true? I'm >> suspicious, and I am afraid that we need to look for a conservative hack. > > It's certainly true that people will discover more and more aliasing > bugs the harder they work 4.1 :) Do you think that PR 30088 is another instance of the same problem, and that therefore turning off the pruning will fix it? Disabling pruning will also increase memory usage and compile time. > There is always the possibility of turning off the pruning, which will > drop our performance, but will hide most of the latent bugs we later > fixed through rewrites well enough that they can't be triggered (the > 4.1 optimizers aren't aggressive enough). Is it convenient for you (or Richard?) to measure that on SPEC? (Richard, thank you very much for stepping up to help with the various issues that I've raised for 4.1.2!) Or, have we already done so, and I've just forgotten? I'm very mindful of the import of performance, but if we think that these aliasing bugs are going to affect reasonably large amounts of code (which I'm starting to think), then shipping the compiler as is seems like a bad idea. (Yes, there's a slippery slope argument whereby we turn off all optimization, since all optimization passes may have bugs. But, if I understand correctly, the aliasing algorithm in 4.1 has relatively fundamental problems, which is rather different.) I don't think we need to go this way - there is a workaround available (use -fno-strict-aliasing) and there are not enough problems to warrant this. Richard.
Re: Scheduling an early complete loop unrolling pass?
On Tue, 6 Feb 2007, Dorit Nuzman wrote: > > Hi Richard, > > > > > ... > > However..., > > > > I have seen cases in which complete unrolling before vectorization > enabled > > constant propagation, which in turn enabled significant simplification of > > the code, thereby, in fact making a previously unvectorizable loop (at > > least on some targets, due to the presence of divisions, unsupported in > the > > vector unit), into a loop (in which the divisions were replaced with > > constants), that can be vectorized. > > > > Also, given that we are working on "SLP" kind of technology (straight > line > > code vectorization), which would make vectorization less sensitive to > > unrolling, I think maybe it's not such a bad idea after all... One option > > is to increase the default value of --param min-vect-loop-bound for now, > > and when SLP is incorporated, go ahead and schedule early complete > > unrolling. However, since SLP implementation may take some time > (hopefully > > within the time frame of 4.3 though) - we could just go ahead and > schedule > > early complete unrolling right now. (I can't believe I'm in favor of this > > idea, but that loop I was talking about before - improved by a factor > over > > 20x when early complete unrolling + subsequent vectorization were > > applied...) > > > > After sleeping on it, it actually makes a lot of sense to me to schedule > complete loop unrolling before vectorization - I think it would either > simplify loops (sometimes creating more opportunities for vectorization), > or prevent vectorization of loops we probably don't want to vectorize > anyhow, and even that - only temporarily - until we have straight-line-code > vectorization in place. So I'm all for it. Ok, I'll dig out the patches I had for this and will do some SPEC runs. As soon as I have time for this (no hurry necessary here I think). Thanks! Richard. -- Richard Guenther <[EMAIL PROTECTED]> Novell / SUSE Labs
Re: Scheduling an early complete loop unrolling pass?
On Tue, 6 Feb 2007, Dorit Nuzman wrote: > Ira Rosen/Haifa/IBM wrote on 06/02/2007 11:49:17: > > > Dorit Nuzman/Haifa/IBM wrote on 05/02/2007 21:13:40: > > > > > Richard Guenther <[EMAIL PROTECTED]> wrote on 05/02/2007 17:59:00: > > > > ... > > > > > > That's going to change once this project goes in: "(3.2) Straight-line > > > code vectorization" from http://gcc.gnu.org/wiki/AutovectBranchOptimizations. > > > In fact, I think in autovect-branch, if you unroll the above loop it should > > > get vectorized already. Ira - is that really the case? > > > > The completely unrolled loop will not get vectorized because the > > code will not be inside any loop (and our SLP implementation will > > focus, at least as a first step, on loops). > > Ah, right... I wonder if we can keep the loop structure in place, even > after completely unrolling the loop - I mean the 'struct loop' in > 'current_loops' (not the actual CFG), so that the "SLP in loops" would have > a chance to at least consider vectorizing this "loop". Zdenek - what do you > say? Well, usually if it's not inside another loop it can't be performance critical ;) At least not if there is a setup cost for the vectorized variant. I don't think we need to worry about this case until a real testcase for this comes along. Richard. -- Richard Guenther <[EMAIL PROTECTED]> Novell / SUSE Labs
Re: 40% performance regression SPEC2006/leslie3d on gcc-4_2-branch
On 2/18/07, Daniel Berlin <[EMAIL PROTECTED]> wrote: On 2/17/07, H. J. Lu <[EMAIL PROTECTED]> wrote: > On Sat, Feb 17, 2007 at 01:35:28PM +0300, Vladimir Sysoev wrote: > > Hello, Daniel > > > > It looks like your changeset listed below makes a performance > > regression of ~40% on SPEC2006/leslie3d. I will try to create a minimal > > test for this issue this week and update you in any case. > > > > That is a known issue: > > http://gcc.gnu.org/ml/gcc/2007-01/msg00408.html Yes, it is something we sadly cannot do anything about without doing a very large number of backports. There were some seriously broken things in 4.2's aliasing that got fixed properly in 4.3. The price of fixing them in 4.2 was a serious performance drop. There's the option of un-fixing them to get back to the state of 4.1, declaring them fixed in 4.3 at the earliest. Richard.
Re: GCC 4.2.0 Status Report (2007-02-19)
On 2/20/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: So, my feeling is that the best course of action is to set a relatively low threshold for GCC 4.2.0 and target 4.2.0 RC1 soon: say, March 10th. Then, we'll have a 4.2.0 release by (worst case, and allowing for lameness on my part) March 31. Feedback and alternative suggestions are welcome, of course. I'd vote for reverting the aliasing fixes on the 4.2 branch to get back the speed and the 4.1 bugs and release that. I expect that 4.2 will be made available in some form to openSUSE users because of gfortran and OpenMP improvements. Richard.
Re: GCC 4.2.0 Status Report (2007-02-19)
On 2/20/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote: As for the proposal to revert the aliasing fixes on the 4.2 branch, IMHO aliasing bugs are pretty nasty; it is hard to find an option to work around them because alias info is used in many optimizations. All bugs we are talking about can be worked around by using -fno-strict-aliasing. Richard.
Re: GCC 4.2.0 Status Report (2007-02-19)
On 2/20/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote: Richard Guenther wrote: > On 2/20/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote: > >> As for the proposal to revert the aliasing fixes on the 4.2 branch, IMHO aliasing >> bugs are pretty nasty; it is hard to find an option to work around them because >> alias info is used in many optimizations. > > > All bugs we are talking about can be worked around by using > -fno-strict-aliasing. > It is too conservative and I think seriously decreases performance too. Although it would be interesting to see what degradation we would have with usage of the option. Maybe it is not so bad. In that case, your proposal would be reasonable because it is much less work. Well, if you need to decide between wrong code and decreased performance, I'd choose decreased performance ;) I only wanted to make the point that reverting the patches in question will not introduce wrong-code bugs that cannot be worked around. Whether the workaround is convenient or whether the wrong-code bugs are easy to spot is another question, but in theory we have all these issues with 4.1 as well. So to clarify, I didn't suggest that we or the casual user should use -fno-strict-aliasing by default. Richard.
Re: 40% performance regression SPEC2006/leslie3d on gcc-4_2-branch
On 2/20/07, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote: Mark Mitchell wrote: >> FP performance regressions of the recent GCC 4.2 (revision 120817) >> compiler against September GCC 4.2 (revision 116799) > What does that translate to in terms of overall score? > Hi, This is a 4.7% drop of the SPECfp_base2006 ratio (geomean of individual FP ratios). Here is the full set of changes in cpu2k6/fp performance of the GCC 4.2 compiler between r116799 and r120817, measured on Intel Core2 Duo at the -O2 optimization level. Do you happen to have a 4.1.x baseline to compare those against? Richard.
410.bwaves       -6.3%
416.gamess       -0.7%
433.milc         -7.0%
434.zeusmp        0.0%
435.gromacs      -0.4%
436.cactusADM    -1.1%
437.leslie3d    -25.4%
444.namd          0.0%
447.dealII       -1.4%
450.soplex       -3.9%
453.povray        0.0%
454.calculix     -0.3%
459.GemsFDTD    -18.3%
465.tonto        -2.5%
470.lbm          -4.1%
481.wrf          -2.7%
482.sphinx3      -1.2%
SPECfp_base2006  -4.7%
- Grigory
Re: GCC 4.2.0 Status Report (2007-02-19)
On 2/21/07, Benjamin Kosnik <[EMAIL PROTECTED]> wrote: > 4.0 branched with critical flaws that were not noticed until 4.2.0, which > is why we end up with the missed optimization regression in the first place. > > So the question is do we want to correct the regressions or not, because > right now we sound like we don't. Which regression is more important? > Wrong code or missed optimization? I still say wrong code. I know other > people will disagree with me but guess what, they can disagree with me all > they want, they will never win. This is the same thing with rejecting > legal code and compile time regressions (I remember a patch Apple complained > about because it caused a compile time regression but it fixed a reject-legal, > even recently with some ICE and -g and a compile time regression though it > does not affect C++). Sure. This is one argument: we want correctness. Yipeee!!! There's no real arguing this point. It looks like 4.2.x and 4.1.x as they stand now can be made to be about the same when it comes to both correctness and (perhaps) codegen. However, you keep dropping the compile-time regression issue: is this on purpose? It's more than just missed optimizations here. If 4.2.0 is 10% slower than 4.1 and 4.3, then what? (In fact, I'm seeing 4.2.x as 22% slower for C++ vs. mainline when running the libstdc++ testsuite. Fixing this without merging Diego's and Andrew's patches for mem-ssa seems impossible. Vlad has also posted compile-time results that are quite poor.) GCC release procedures have historically been toothless when it comes to compile-time issues, which rank less than even SPEC scores. However, this is one of the things that GCC users evaluate in production settings, and the current evaluation of 4.2.x in this area is going to be pretty poor. (IMHO unusably poor.) It's not only compile time, it's also memory usage, unfortunately. I think that another issue, which Vlad and Paolo pick up and I now echo, is setting a good break point for major new infrastructure. Historically, GCC development has prioritized time-based schedules (which then slip) instead of feature-based schedules (which might actually motivate the principals to do the heavy lifting required to get to release shape: see the C++ ABI in 3.0.x-3.2.x and 3.4.x, fortran and SSA in 4.0, and say java in 4.1.x). It seems to me that df (and LTO?) could be this for what-might-be 4.3.0. It's worth considering this, and experimenting with "release theory and practice." As Kaveh said, all compilers ship with a mix of bugs and features. There are some compelling new features in 4.2.0, and some issues. After evaluating all the above, I think re-branching may be the best bet for an overall higher-quality 4.2.x release series than 4.1.x. I believe re-branching will not make 4.2.x better but will only use up scarce resources. We have piled up so much new stuff for 4.3 already that it will be hard to stabilize it. So either go for 4.2.0 as it is now, with all its regressions but possibly critically more correctness, or get back the non-correctness of 4.1.x and fix some of the regressions. Re-branching is the worst thing we can do, I believe. Obviously, others disagree. Indeed ;) Richard.
Re: GCC 4.2.0 Status Report (2007-02-19)
On 2/21/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: To be honest, my instinct for the FSF is to take the 4% hit and get rid of this nasty class of bugs. Users measure compiler quality by more than just floating-point benchmarks; FP code is a relatively small (albeit important, and substantial) set of all code. That's why I asked Danny for the patches in the first place. Of course the speed of a compiler is measured on testcases where speed matters - and this is usually FP code. Now, based on this reasoning, we could (as CodeSourcery probably did) enable -fno-strict-aliasing by default, which fixes the class of 4.1.x bugs we are talking about, while leaving the possibility for the user to specify -fstrict-aliasing and get back the speed for FP code, with the risk of getting wrong code. Now, the realistic choices for 4.2.0 as I see them are, in order of my personal preference: 1) ship 4.2.0 as is; 2) ship 4.2.0 with the aliasing fixes reverted; 3) no more reasonable option - maybe don't ship 4.2.0 at all. So I don't see backporting more patches or even re-branching as a real option. Richard.
Re: "Error: suffix or operands invalid for `push'" on 64bit boxes
On 2/22/07, Alok kataria <[EMAIL PROTECTED]> wrote: Hi, I was trying some inline assembly with gcc. Running the assembly program on a 64-bit system (with a 64-bit gcc) gives errors of the form /tmp/cc28kO9v.s: Assembler messages: /tmp/cc28kO9v.s:57: Error: suffix or operands invalid for `push' This is off topic here, please use [EMAIL PROTECTED] for this kind of question. Your assembly statements are not valid 64-bit asm but only work on 32-bit. Richard.
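For reference, a minimal illustration (not the poster's actual code) of why 32-bit push/pop idioms in inline asm stop assembling on x86_64, together with a size-correct variant; the operands are made up for the example.

#include <cstdio>

int main() {
  long in = 42, out = 0;
  // On i386, pushing/popping a 32-bit register with "pushl"/"popl" is fine.
  // Assembling those same 32-bit forms on x86_64 is what produces
  // "suffix or operands invalid for `push'": 64-bit code must push/pop
  // full 64-bit registers.
#if defined(__x86_64__)
  asm volatile("pushq %1\n\t"
               "popq  %0"
               : "=r"(out)
               : "r"(in));
#elif defined(__i386__)
  asm volatile("pushl %1\n\t"
               "popl  %0"
               : "=r"(out)
               : "r"(in));
#endif
  std::printf("%ld\n", out);
  return 0;
}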
Fold and integer types with sub-ranges
Currently, for example in fold_sign_changed_comparison, we produce integer constants that are not inside the range of their type's values denoted by [TYPE_MIN_VALUE (t), TYPE_MAX_VALUE (t)]. For example, consider a type with range [10, 20] and the comparison created by the Ada frontend: if ((signed char)t == -128) with t being of that type [10, 20] with TYPE_PRECISION 8, like the constant -128. So fold_sign_changed_comparison comes along and decides to strip the conversion and convert the constant to the type of t, which looks like unit size user align 8 symtab 0 alias set 4 canonical type 0x2b8156099c00 precision 8 min max RM size > readonly sizes-gimplified public unsigned QI size unit size user align 8 symtab 0 alias set 4 canonical type 0x2b8156099f00 precision 8 min max RM size constant invariant 5>> (note it's unsigned!). So the new constant gets produced using force_fit_type_double with the above (unsigned) type, the comparison now prints as if (t == 128), and the new constant 128 is now out of range of its type: constant invariant 128> (see the min/max values of that type above). What do we want to do about that? Do we want to do anything about it? If we don't want to do anything about it, why care about an exact TREE_TYPE of integer constants if the only thing that matters is signedness and type precision? Thanks for any hints, Richard.
Re: Fold and integer types with sub-ranges
On Fri, 23 Feb 2007, Duncan Sands wrote: > > Currently, for example in fold_sign_changed_comparison, we produce > > integer constants that are not inside the range of their type's values > > denoted by [TYPE_MIN_VALUE (t), TYPE_MAX_VALUE (t)]. For example, > > consider a type with range [10, 20] and the comparison created by > > the Ada frontend: > > > > if ((signed char)t == -128) > > > > with t being of that type [10, 20] with TYPE_PRECISION 8, like the constant > > -128. So fold_sign_changed_comparison comes along and decides to strip > > the conversion and convert the constant to the type of t, which looks like > ... > > What do we want to do about that? Do we want to do anything about it? > > If we don't want to do anything about it, why care about an exact > > TREE_TYPE of integer constants if the only thing that matters is > > signedness and type precision? > > I don't think gcc should be converting anything to a type like t's unless > it can prove that the thing it's converting is in the range of t's type. So > it presumably should try to prove: (1) that -128 is not in the range of > t's type; if it's not, then fold the comparison to false; otherwise (2) try > to prove that -128 is in the range of t's type; if so, convert it. Otherwise > do nothing. > > That said, this whole thing is a can of worms. Suppose the compiler wants to > calculate t+1. Of course you do something like this: > > int_const_binop (PLUS_EXPR, t, build_int_cst (TREE_TYPE (t), 1), 0); > > But if 1 is not in the type of t, you just created an invalid value! Eeek ;) True. Another reason not to force us to create 1s and 0s in such cases but to simply allow int_const_binop_int (PLUS_EXPR, t, 1). Of course, while int_const_binop will truncate the result to its TYPE_PRECISION, one still needs to make sure the result fits in t. That is, all the fancy TREE_OVERFLOW stuff only cares about TYPE_PRECISION and never about the type's 'true' range via min/max values. > Personally I think the right thing to do is to eliminate these types > altogether somewhere early on, replacing them with their base types > (which don't have funky ranges), inserting appropriate ASSERT_EXPRs > instead. Probably types like t should never be seen outside the Ada > f-e at all. Heh - nice long-term goal ;) It might raise the usual won't-work-for-proper-debug-info complaint, though. Richard.
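Duncan's three-way rule can be modelled outside the compiler. The sketch below is a standalone illustration with made-up names (Range, classify), not actual fold code; it only shows the decision procedure: provably out of range folds the equality to false, provably in range makes converting the constant safe, and anything unprovable is left alone.

#include <cstdio>

enum FoldResult { FOLD_TO_FALSE, CONVERT_CONSTANT, DO_NOTHING };

struct Range {
  bool known;     // whether [min, max] is known at compile time
  long min, max;
};

// Decision for "(basetype)t == cst" where t has the sub-range type r.
static FoldResult classify(const Range &r, long cst) {
  if (!r.known)
    return DO_NOTHING;       // cannot prove anything, leave the expression alone
  if (cst < r.min || cst > r.max)
    return FOLD_TO_FALSE;    // constant provably outside the sub-range
  return CONVERT_CONSTANT;   // in range, conversion preserves the value
}

int main() {
  Range t = { true, 10, 20 };                          // the Ada sub-range from the example
  std::printf("%d\n", (int) classify(t, -128));        // outside: fold to false
  std::printf("%d\n", (int) classify(t, 15));          // inside: converting is safe
  Range unknown = { false, 0, 0 };
  std::printf("%d\n", (int) classify(unknown, -128));  // unknown: do nothing
  return 0;
}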
Re: Fold and integer types with sub-ranges
On Fri, 23 Feb 2007, Richard Kenner wrote: > > That said, this whole thing is a can of worms. Suppose the compiler wants > > to calculate t+1. Of course you do something like this: > > > > int_const_binop (PLUS_EXPR, t, build_int_cst (TREE_TYPE (t), 1), 0); > > > > But if 1 is not in the type of t, you just created an invalid value! > > Yes, but why not do all arithmetic in the base type, just like Ada itself > does? Then you use 1 in that base type. Sure - I wonder if there is a reliable way of testing whether we face a non-base type in the middle-end. I suppose TREE_TYPE (type) != NULL won't work in all cases... (?) > > Personally I think the right thing to do is to eliminate these types > > altogether somewhere early on, replacing them with their base types > > (which don't have funky ranges), inserting appropriate ASSERT_EXPRs > > instead. Probably types like t should never be seen outside the Ada > > f-e at all. > > Not clear. There is the debug issue plus VRP can use the range information > for some things (we have to be very precise as to what). If we keep the model > of doing all arithmetic in the base type, there should be no problem. I agree. But apparently fold does not care about base vs. non-base types and happily strips conversions between them, doing arithmetic in the non-base type. Now the question was whether we want to (try to) fix that, for example by forcing gcc_assert (!TREE_TYPE (type)) on integer types in int_const_binop, fold_binary and other places? Richard.
Re: spec2k comparison of gcc 4.1 and 4.2 on AMD K8
On 2/27/07, Menezes, Evandro <[EMAIL PROTECTED]> wrote: Honza, > Well, rather than unstable, they seem to be more memory layout > sensitive, I would say. (The differences are more or less reproducible, > not completely random, but independent of the binary itself. I can't > think of much else than memory layout to cause it.) I always wondered > if things like page coloring have a chance to reduce this noise, but I > never actually got around to trying it. You didn't mention the processors in your systems, but I wonder if they are dual-core. If so, perhaps it's got to do with the fact that each K8 core has its own L2, whereas C2 chips have a shared L2. Then, try preceding "runspec" with "taskset 0x02" to keep the process from hopping between cores and finding cold caches (though the kernel strives to stick a process to a single core, it's not perfect). Well, both britten and haydn are single-core, two-processor systems. For SPEC2k6 runs the problem is that the 2GB of RAM of the machine are distributed over both NUMA nodes, so with the memory requirements of SPEC2k6 we always get inter-node memory traffic. Vangelis is a single-processor, single-core system (and the most stable one). Any idea on how to force a process to use only local memory? Richard.
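For the local-memory question: externally, numactl (for example "numactl --cpunodebind=0 --membind=0 runspec ...") binds both CPU and memory to one node; from inside a process roughly the same effect can be had with libnuma. A minimal sketch, assuming libnuma is installed and the program is linked with -lnuma; node 0 is an arbitrary choice.

#include <numa.h>
#include <cstdio>

int main() {
  if (numa_available() < 0) {
    std::fprintf(stderr, "NUMA is not available on this system\n");
    return 1;
  }
  // Run only on node 0 and let allocations come from the local node,
  // so the benchmark avoids inter-node memory traffic.
  numa_run_on_node(0);
  numa_set_localalloc();
  std::printf("pinned to node 0 with local allocation policy\n");
  // ... run the actual workload here ...
  return 0;
}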
Re: Massive SPEC failures on trunk
On 03 Mar 2007 11:00:11 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: Grigory Zagorodnev <[EMAIL PROTECTED]> writes: > Trunk GCC shows massive (2 compile-time and 6 run-time) failures on > SPEC CPU2000 and CPU2006 at i386 and x86_64 on -O2 optimization > level. Regression introduced somewhere between revision 122487 and > 122478. > > There are three checkins, candidates for the root of regression: > http://gcc.gnu.org/viewcvs?view=rev&revision=122487 > http://gcc.gnu.org/viewcvs?view=rev&revision=122484 > http://gcc.gnu.org/viewcvs?view=rev&revision=122479 Can you boil one of these problems down into a test case which you can share? There are segfaults on Polyhedron rnflow, linpk and test_fpu as well. Richard.
Re: Hi all, just an info about how to use gcc tree
On 3/3/07, Fabio Giovagnini <[EMAIL PROTECTED]> wrote: Thanks for the answer. I'll go deeper into my thoughts so that maybe you can give me more info about how I have to investigate gcc/gdb. We are developers of embedded software on boards designed and built by us; generally we use 16-bit CISC and 32-bit RISC CPUs (Renesas H8/H8S; SH-2/SH-3). We write applications mainly for the automotive environment. We write the code using C/C++ and we compile using gcc. Because we do not have much memory (RAM and flash) we develop in the following way: each piece of software we write has the same communication protocol running on RS232 or over TCP/IP, and we have a simple monitoring program able to show the contents of memory addresses in some format (char, int, long, signed, unsigned, strings, etc.). Generally we declare a global variable, we write a debug value into it, and at run time we read the content of such a variable at the right moment. Good. To avoid reading by hand the .map file produced by ld, I developed a simple flex/bison analyzer able to extract the address of a symbol from the .map file. So into my tool I load the .map file, I write the name of the variable, and I can read its content. This way of working becomes very hard if I use structs and unions in C and classes in C++; I would need to know the offset of each field of the struct so that, adding it to the base address known from the .map file, for each instance of such a struct I could write "mystruct.myfield" into my tool, and, calculating the right address, my protocol could ask the target to read/write the content at that address. I prefer to avoid the -g option because my memory is never enough; but a good compromise could be if I could compile the debug info into sections that I could remove after gdb has used them to give me the information about the offset of each field. Is it possible? Just use the DWARF info produced by the compiler with -g and strip the binary. Of course I would debug embedded stuff using a simulator... Richard.
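For the struct-field problem specifically, a small generator built on offsetof can emit the field offsets at build time, which the monitoring tool can then add to the base addresses taken from the .map file. This is only a sketch: the EngineStatus struct and the output format are made up for the example, the real types would come from the project.

#include <cstddef>
#include <cstdio>

// Hypothetical example struct standing in for a real project type.
struct EngineStatus {
  unsigned short rpm;
  unsigned char  temperature;
  unsigned long  odometer;
};

// Prints "type.field offset size" lines that a host-side tool could load
// alongside the base addresses extracted from the .map file.
#define DUMP_FIELD(type, field) \
  std::printf("%s.%s %lu %lu\n", #type, #field, \
              (unsigned long) offsetof(type, field), \
              (unsigned long) sizeof(((type *) 0)->field))

int main() {
  DUMP_FIELD(EngineStatus, rpm);
  DUMP_FIELD(EngineStatus, temperature);
  DUMP_FIELD(EngineStatus, odometer);
  return 0;
}

Note that the generator has to be built with the same cross compiler and target flags as the firmware, and run on a simulator or target, so the offsets match the target ABI rather than the host's.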
Re: Massive SPEC failures on trunk
On 3/3/07, Grigory Zagorodnev <[EMAIL PROTECTED]> wrote: Hi, Trunk GCC shows massive (2 compile-time and 6 run-time) failures on SPEC CPU2000 and CPU2006 at i386 and x86_64 on -O2 optimization level. Regression introduced somewhere between revision 122487 and 122478. There are three checkins, candidates for the root of regression: http://gcc.gnu.org/viewcvs?view=rev&revision=122487 http://gcc.gnu.org/viewcvs?view=rev&revision=122484 http://gcc.gnu.org/viewcvs?view=rev&revision=122479 Here is the list of failures: CPU2006/464.h264ref: GCC ICE image.c: In function 'UnifiedOneForthPix': image.c:1407: internal compiler error: in set_value_range, at tree-vrp.c:267 CPU2006/447.dealII: GCC ICE fe_tools.cc: In static member function 'static void FETools::get_projection_matrix(const FiniteElement&, const FiniteElement&, FullMatrix&) [with int dim = 3, number = double]': fe_tools.cc:322: error: definition in block 47 does not dominate use in block 49 for SSA_NAME: NMT.4199_484 in statement: NMT.4199_645 = PHI PHI argument NMT.4199_484 for PHI node NMT.4199_645 = PHI fe_tools.cc:322: internal compiler error: verify_ssa failed Most of the problems are fixed, dealII remains with: /gcc/spec/sb-balakirew-head-64-2006/x86_64/install-hack/bin/g++ -c -o quadrature.o -DSPEC_CPU -DNDEBUG -Iinclude -DBOOST_DISABLE_THREADS -Ddeal_II_dimension=3 -O2 -DSPEC_CPU_LP64 quadrature.cc quadrature.cc: In constructor 'Quadrature::Quadrature(const std::vector, std::allocator > >&)': quadrature.cc:64: error: 'atof' is not a member of 'std' I guess that's a fallout from Paolo's include streamlining in libstdc++. Looks like the dealII sources need a fix for that. bzip2 fails with: compress.c: In function 'BZ2_compressBlock': compress.c:711: internal compiler error: in cse_find_path, at cse.c:5930 Please submit a full bug report, with preprocessed source if appropriate. See <http://gcc.gnu.org/bugs.html> for instructions. specmake: *** [compress.o] Error 1 A known problem. (x86_64, -O3 -funroll-loops -fpeel-loops -ftree-vectorize) Richard.
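If it is indeed the include streamlining, the usual fix on the source side is to include <cstdlib> explicitly instead of relying on other headers to drag it in. A minimal illustration of the error and the fix (not the actual dealII source):

#include <cstdlib>   // declares std::atof; without this line the build breaks
                     // as soon as no other header pulls it in indirectly
#include <cstdio>

int main() {
  const char *s = "3.14";
  double d = std::atof(s);   // "'atof' is not a member of 'std'" when <cstdlib> is missing
  std::printf("%f\n", d);
  return 0;
}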