Re: LTO and the inlining of functions only called once.
On Tue, Oct 13, 2009 at 8:31 PM, Toon Moene wrote: > Jeff Law wrote: > >> On 10/10/09 09:17, Daniel Jacobowitz wrote: > >>> On Sat, Oct 10, 2009 at 02:31:25PM +0200, Jan Hubicka wrote: >>> My solution would probably be to pass the -fdump-ipa-inline parameter to the lto compilation and read the log. It lists the inlining decisions and if something is not inlined, you get a dump of the reason why. > > OK, I did just that (of course, because I'm only interested in inlining > during Link-Time-Optimization, I only passed the compiler option to the link > phase of the build). > > Now where does the resulting dump end up - and how is it named ? > > I.e., in case of: > > gfortran -o exe -O3 -flto -fwhole-program -fdump-ipa-inline a.f lib.a > > ? It'll be in /tmp and named after the first object file, in your case it will be ccGGS24.o.047i.inline (because the first object file will be a tempfile). A minor inconvenience that maybe is going to be fixed. Richard. > > -- > Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > At home: http://moene.org/~toon/ > Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html >
Re: [4.4] Strange performance regression?
Quoting Mark Tall : Joern Rennecke wrote: But at any rate, the subject does not agree with the content of the original post. When we talk about a 'regression' in a particular gcc version, we generally mean that this version is in some way worse than a previous version of gcc. Didn't the original poster indicate that gcc 4.3 was faster than 4.4 ? In my book that is a regression. He also said that it was a different machine, Core 2 Q6600 vs some kind of Xeon Core 2 system with a total of eight cores. As different memory subsystems are likely to affect the code, it is not an established regression till he can reproduce a performance drop going from an older to a current compiler on the same or sufficiently similar machines, under comparable load conditions - which generally means that the machine must be idle apart from the benchmark.
Re: LTO and the inlining of functions only called once.
Richard Guenther wrote: On Tue, Oct 13, 2009 at 8:31 PM, Toon Moene wrote: gfortran -o exe -O3 -flto -fwhole-program -fdump-ipa-inline a.f lib.a ? It'll be in /tmp and named after the first object file, in your case it will be ccGGS24.o.047i.inline (because the first object file will be a tempfile). A minor inconvenience that maybe is going to be fixed. Found it. That surely is counter-intuitive, though. Thanks!
Re: LTO and the inlining of functions only called once.
Richard Guenther wrote: It'll be in /tmp and named after the first object file, in your case it will be ccGGS24.o.047i.inline (because the first object file will be a tempfile). A minor inconvenience that maybe is going to be fixed.

Now that Richard has pointed out to me where the info is, I can post it here. These are the inlining decisions on my mini-example (just 5 subroutines and a "main"):

gemini.f hlprog.f main.f phcall.f phtask.f sl2tim.f

Reclaiming functions:
Deciding on inlining. Starting with size 45477.

Inlining always_inline functions:

Deciding on smaller functions:
Considering inline candidate phcall_.clone.3.
Inlining failed: --param max-inline-insns-auto limit reached
Considering inline candidate phtask_.clone.2.
Inlining failed: --param max-inline-insns-auto limit reached
Considering inline candidate gemini_.clone.1.
Inlining failed: --param max-inline-insns-auto limit reached
Considering inline candidate sl2tim_.clone.0.
Inlining failed: --param max-inline-insns-auto limit reached
Considering inline candidate hlprog.
Inlining failed: --param max-inline-insns-auto limit reached

Deciding on functions called once:

Considering gemini_.clone.1 size 11443.
Called once from hlprog 462 insns.
Inlined into hlprog which now has 10728 size for a net change of -12620 size.

Considering hlprog size 10728.
Called once from main 7 insns.
Inline limit reached, not inlined.

Inlined 1 calls, eliminated 1 functions, size 45477 turned to 32857 size.

The mistake made here is that *all* the above functions are "called once", but only GEMINI is considered for some reason (probably simply because it's the first one ?).

Jan, if you're interested, I can send you the mini-example so that you can see for yourself.

HLPROG calls GEMINI, which calls SL2TIM, which calls PHCALL, which calls PHTASK (all "only-once calls").
Re: [4.4] Strange performance regression?
Joern Rennecke wrote: Quoting Mark Tall : Joern Rennecke wrote: But at any rate, the subject does not agree with the content of the original post. When we talk about a 'regression' in a particular gcc version, we generally mean that this version is in some way worse than a previous version of gcc. Didn't the original poster indicate that gcc 4.3 was faster than 4.4 ? In my book that is a regression. He also said that it was a different machine, Core 2 Q6600 vs some kind of Xeon Core 2 system with a total of eight cores. As different memory subsystems are likely to affect the code, it is not an established regression till he can reproduce a performance drop going from an older to a current compiler on the same or sufficiently similar machines, under comparable load conditions - which generally means that the machine must be idle apart from the benchmark. Ian's judgment in diverting to gcc-help was borne out when it developed that -funroll-loops was wanted. This appeared to confirm his suggestion that it might have had to do with loop alignments. As long as everyone is editorializing, I'll venture to say this case raises the suspicion that gcc might benefit from better default loop alignments, at least for that particular CPU. However, I've played a lot of games on Core i7 with varying unrolling etc. I find the behavior of current gcc entirely satisfactory, aside from the verbosity of the options required.
Re: i370 port - constructing compile script
> Huh. I've never seen this before. Is this with your patches to > generate a "single executable" or without? My patches are applied, but shouldn't be activated, because I haven't defined SINGLE_EXECUTABLE. I could try taking it back to raw 3.4.6 though and see if that has the same problem. Might be interesting ... Things are never that simple. :-) My target isn't in raw 3.4.6, so I had to use a different target (dignus), which worked! But dignus no longer worked with my changes. So had to get dignus working again before I could compare. I tried other shortcuts, but wasn't successful. After getting dignus working again I was able to start narrowing it down. For some reason gdb doesn't seem to be working as expected, so had to do without it. In the end, one line, long forgotten, in my target config file: #define pwait(a,b,c) (0) was what was responsible. :-) Most of my Posix replacement functions are in a separate unixio.h, which would normally be ignored in a configure/real unix environment. Not sure why this one ended up there. Anyway, after that interlude, I can finally move on to the original challenge! BFN. Paul.
Re: LTO and the inlining of functions only called once.
> Richard Guenther wrote:
>
> >It'll be in /tmp and named after the first object file, in your case it
> >will be ccGGS24.o.047i.inline (because the first object file will be a
> >tempfile). A minor inconvenience that maybe is going to be fixed.
>
> Now that Richard has pointed out to me where the info is, I can post it
> here. These are the inlining decisions on my mini-example (just 5
> subroutines and a "main"):
>
> gemini.f hlprog.f main.f phcall.f phtask.f sl2tim.f
>
> Reclaiming functions:
> Deciding on inlining. Starting with size 45477.
>
> Inlining always_inline functions:
>
> Deciding on smaller functions:
> Considering inline candidate phcall_.clone.3.
> Inlining failed: --param max-inline-insns-auto limit reached
> Considering inline candidate phtask_.clone.2.
> Inlining failed: --param max-inline-insns-auto limit reached
> Considering inline candidate gemini_.clone.1.
> Inlining failed: --param max-inline-insns-auto limit reached
> Considering inline candidate sl2tim_.clone.0.
> Inlining failed: --param max-inline-insns-auto limit reached
> Considering inline candidate hlprog.
> Inlining failed: --param max-inline-insns-auto limit reached
>
> Deciding on functions called once:
>
> Considering gemini_.clone.1 size 11443.
> Called once from hlprog 462 insns.
> Inlined into hlprog which now has 10728 size for a net change of
> -12620 size.
>
> Considering hlprog size 10728.
> Called once from main 7 insns.
> Inline limit reached, not inlined.
>
> Inlined 1 calls, eliminated 1 functions, size 45477 turned to 32857 size.
>
> The mistake made here is that *all* the above functions are "called
> once", but only GEMINI is considered for some reason (probably simply
> because it's the first one ?).
>
> Jan, if you're interested, I can send you the mini-example so that you
> can see for yourself.

Yes, I would be interested. It seems that for some reason the other functions are not considered to be called once, perhaps a visibility issue.
We also should say what limit was reached on inlining hlprog.

Honza

> HLPROG calls GEMINI, which calls SL2TIM, which calls PHCALL, which calls
> PHTASK (all "only-once calls").
Re: loop optimization in gcc
> Hi,
>
> you also might want to take a look at the Graphite project.
> http://gcc.gnu.org/wiki/Graphite where we do loop optimizations and
> automatic parallelization based on the polytope model. If you need any
> help feel free to ask.
>
> Tobias

Hi, This seems to be quite interesting and challenging. Moreover, it is very close to what we are trying to achieve as well (on a small scale, that is). I have started preliminary reading on the polytope model and the working of GRAPHITE. I will ask you if I face any doubts. It would be nice to contribute to this project.

For starters, can you tell me if GRAPHITE also does source-to-source transformations to optimize? Because I had read somewhere else that the polyhedral model used source-to-source transformations.

-- cheers sandy
Re: loop optimization in gcc
On Wed, 2009-10-14 at 20:12 +0530, sandeep soni wrote: > > Hi, > > > > you also might want to take a look at the Graphite project. > > http://gcc.gnu.org/wiki/Graphite where we do loop optimizations and > > automatic parallelization based on the polytop model. If you need any > > help feel free to ask. > > > > Tobias > > > > > > Hi, > > This seems to be quite interesting and challenging.Moreover,it is very > close to what we are trying to achieve as well (on a small scale that > is).I have started preliminary reading on the polytope model and the > working of GRAPHITE. I will ask you if i face any doubts.It would be > nice to contribute to this project. > > > For the starters, can you tell me if GRAPHITE also does source to > source transformations or otherwise to optimize??Coz i had read > somewhere else that the polyhedral model used source to source > transformations. Hi, you are right. There are several polytope frameworks that work on the source code level (LooPo, Cloog/Clan from Cedric Bastoul), however Graphite works on the intermediate level tree-ssa in gcc. Therefore we can not do any source to source transformations. The idea is to not be limited to specific input languages or special formatting of the code, but to be able to use the powerful analysis in the gcc middle end. This allows us to work on any input language and to detect loops that do not even look like a loop in the input program (goto-loops). Using the powerful scalar evolution framework in gcc Graphite also handles loops that do not look like affine linear loops. This is a powerful approach in its earlier stages. Basic loops and simple code transformations already work, but there is still a lot left to be done. Tobi
Re: LTO and the inlining of functions only called once.
On Wed, Oct 14, 2009 at 04:33:35PM +0200, Jan Hubicka wrote: > > Deciding on smaller functions: > > Considering inline candidate phcall_.clone.3. > > Inlining failed: --param max-inline-insns-auto limit reached > Yes, I would be interested. It seems that for osme reason the other > functions are not considered to be called once, perhaps a visibility > issue. We also should say what limit was reached on inlining hlprog. Maybe because of whatever did that cloning? -- Daniel Jacobowitz CodeSourcery
Re: LTO and the inlining of functions only called once.
Jan Hubicka wrote: Yes, I would be interested. It seems that for some reason the other functions are not considered to be called once, perhaps a visibility issue. We also should say what limit was reached on inlining hlprog. Sent off a bzip2'd tar file.
Re: [4.4] Strange performance regression?
Hi Joern and list(s), On Wed, Oct 14, 2009 at 12:05 PM, Joern Rennecke wrote: > He also said that it was a different machine, Core 2 Q6600 vs > some kind of Xeon Core 2 system with a total of eight cores. > As different memory subsystems are likely to affect the code, it > is not an established regression till he can reproduce a performance > drop going from an older to a current compiler on the same or > sufficiently similar machines, under comparable load conditions - > which generally means that the machine must be idle apart from the > benchmark. > I decided to bite the bullet and went back to GCC 4.3.4 on the very same machine where I'm experiencing the issue. With these flags: -O2 -march=core2 -fomit-frame-pointer the performance is the same as on 4.4.1 *with* -funroll-loops (actually, around 5% better, but it is probably not statistically significant). So, with 4.3.4, I get the expected *good* performance. Just to give an order of magnitude, the "good" performance measure is ~5.1-5.2 seconds, while the "bad" performance is ~11-12 seconds for this test. I ran the tests both on an idle setup (no X, just a couple of services in the background) and with a "busy" machine (Firefox, audio playing in the background,...) but I could hardly notice any difference in all cases. I can try to investigate further if anyone is interested. Thanks again to everybody, Francesco PS: hope I'm not infringing the netiquette by cross-posting to two mailing lists.
Re: loop optimization in gcc
On Wed, Oct 14, 2009 at 9:02 PM, Tobias Grosser wrote: > On Wed, 2009-10-14 at 20:12 +0530, sandeep soni wrote: >> > Hi, >> > >> > you also might want to take a look at the Graphite project. >> > http://gcc.gnu.org/wiki/Graphite where we do loop optimizations and >> > automatic parallelization based on the polytop model. If you need any >> > help feel free to ask. >> > >> > Tobias >> > >> > >> >> Hi, >> >> This seems to be quite interesting and challenging.Moreover,it is very >> close to what we are trying to achieve as well (on a small scale that >> is).I have started preliminary reading on the polytope model and the >> working of GRAPHITE. I will ask you if i face any doubts.It would be >> nice to contribute to this project. >> >> >> For the starters, can you tell me if GRAPHITE also does source to >> source transformations or otherwise to optimize??Coz i had read >> somewhere else that the polyhedral model used source to source >> transformations. > > Hi, > > you are right. There are several polytope frameworks that work on the > source code level (LooPo, Cloog/Clan from Cedric Bastoul), however > Graphite works on the intermediate level tree-ssa in gcc. Therefore we > can not do any source to source transformations. > The idea is to not be limited to specific input languages or special > formatting of the code, but to be able to use the powerful analysis in > the gcc middle end. > This allows us to work on any input language and to detect loops that do > not even look like a loop in the input program (goto-loops). Using the > powerful scalar evolution framework in gcc Graphite also handles loops > that do not look like affine linear loops. > This is a powerful approach in its earlier stages. Basic loops and > simple code transformations already work, but there is still a lot left > to be done. > > Tobi > > Hi, Sounds absolutely convincing to me. 
I am very keen to contribute to this in any way possible. I will first try to understand how it works fully. Would you mind me pressing on with some issues in the near future? I am afraid, though, that they might be a bit more theoretical to begin with. -- cheers sandy
Re: Turning off unrolling to certain loops
Ok, I've actually gone a different route. Instead of waiting for the middle end to perform this, I've directly modified the parser stage to unroll the loop directly there. Basically, I take the parsing of the for statement and modify how it adds the various statements. Instead of doing, in c_finish_loop:

if (body) add_stmt (body);
if (clab) add_stmt (build1 (LABEL_EXPR, void_type_node, clab));
if (incr) add_stmt (incr);
...

I tell it to add multiple copies of body and incr, and then at the end add the rest of the loop. I've also added support to prevent further unrolling of these modified loops and will be handling the "no-unroll" pragma. I then let the rest of the optimization passes fuse the incrementations together if possible, etc. The initial results are quite good and seem to work and produce good code. Currently, there are two possibilities: - If the loop is not in the form we want, for example: for (;i wrote:

> Hi,
>
>> such an epilogue is needed when the # of iterations is not known in the
>> compile time; it should be fairly easy to modify the unrolling not to
>> emit it when it is not necessary,
>
> Agreed, that is why I was surprised to see this in my simple example.
> It seems to me that the whole unrolling process has been made to, on
> purpose, have this epilogue in place.
>
> In the case where the unrolling would be perfect (ie. there would be
> no epilogue), the calculation of the max bound of the unrolled version
> is always done to have this epilogue (if you have 4 iterations and ask
> to unroll twice, it will actually change the max bound to 3,
> therefore, having one iteration of the unrolled version and 2
> iterations of the original...). I am currently looking at the code of
> tree_transform_and_unroll_loop to figure out how to change this and
> not have an epilogue in my cases.
>
> Jc
Re: LTO and the inlining of functions only called once.
-Winline doesn't help here. Scanning the assembler output does (obviously!). nm also does. Paolo
Re: LTO and the inlining of functions only called once.
We should also keep in mind that such logs aimed at users should support i18n - unlike the existing dumps for compiler developers, which are quite properly English only, and most calls to internal_error which should only appear if there is a compiler bug and are also only meant to be useful for compiler developers (so represent useless work for translators at present - though it does seem possible some internal_error calls could actually appear with invalid input rather than compiler bugs and so should be normal errors). We should first support i18n of C++ error messages, which is totally broken for languages that have more than one case or more than one article form (e.g. singular and plural of "the"). Paolo
Why does template constructor only work at global scope? Bug?
Can anyone tell me why this template constructor only works at global scope? Is this a g++ bug (I'm using g++ 4.5)?

// g++ -std=c++0x t.cpp
class T {
public:
    template <unsigned N> T(int (&a)[N]) {}
};

T t1 = (int[]) {1, 2, 3}; // OK

int main(int argc, char **argv) {
    T t2 = (int[]) {1, 2, 3}; // error: conversion from 'int [3]' to non-scalar type 'T' requested
}

I realize I could use a std::initializer_list constructor instead, but I'm curious why what I'm trying doesn't work. Thanks, Ben
checking for debug a plugin on a production compiler?
Hello all,

Is the following scenario possible? gcc-4.5 is released at Christmas 2009 [*]. Since it is a production compiler, it has been compiled without any checking (no ENABLE_CHECKING, etc...). A plugin foo.c is compiled as foo.so for that gcc-4.5, but since the plugin is probably buggy, it is compiled with ENABLE_CHECKING. This should probably work if ENABLE_CHECKING does not add extra fields to data structures, but only runs checks without "side-effects" - other than consuming CPU resources (& memory)... In other words, if ENABLE_CHECKING only enables gcc_assert and similar stuff (that is, only adds code, not "data", inside GCC, if you guess what I am thinking of). If this is impossible (for instance because ENABLE_CHECKING modifies the data structures inside GCC), we might consider documenting that. Regards.

Note [*]: I don't pretend to know when 4.5 will be released, and its release date is not the subject of this discussion!

-- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basilestarynkevitchnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
Re: Why does template constructor only work at global scope? Bug?
2009/10/14 Ben Bridgwater: > Can anyone tell me why this template constructor only works at global scope? http://gcc.gnu.org/bugs/ or the gcc-help mailing list. This mailing list is for discussing development of GCC, not help using it. > I realize I could use a std::initializer_list constructor > instead, but I'm curious why what I'm trying doesn't work. (int[]) is an incomplete type. Unless I've missed a change in C++0x which makes it valid, it shouldn't work in either place you use it. If I have missed a change in C++0x, it probably doesn't work because it's not implemented in GCC yet.
Re: Turning off unrolling to certain loops
Hi,

> Ok, I've actually gone a different route. Instead of waiting for the
> middle end to perform this, I've directly modified the parser stage to
> unroll the loop directly there.

I think this is a very bad idea. First of all, getting the information needed to decide at this stage whether unrolling is possible at all is difficult; for instance, what happens for a loop of the form

for (...) { something; label: something else; } ... goto label;

? Are you sure that you handle correctly loops with several exits from the loop body, loops whose control variables may overflow, exception handling, ...? And if so, what is the benefit of having the code to deal with all these complexities twice in the compiler? Furthermore, unrolling the loops this early may increase compile time with little or no gain in code quality.

Zdenek
Re: loop optimization in gcc
On Wed, 2009-10-14 at 23:56 +0530, sandeep soni wrote: > On Wed, Oct 14, 2009 at 9:02 PM, Tobias Grosser > wrote: > > On Wed, 2009-10-14 at 20:12 +0530, sandeep soni wrote: > >> > Hi, > >> > > >> > you also might want to take a look at the Graphite project. > >> > http://gcc.gnu.org/wiki/Graphite where we do loop optimizations and > >> > automatic parallelization based on the polytop model. If you need any > >> > help feel free to ask. > >> > > >> > Tobias > >> > > >> > > >> > >> Hi, > >> > >> This seems to be quite interesting and challenging.Moreover,it is very > >> close to what we are trying to achieve as well (on a small scale that > >> is).I have started preliminary reading on the polytope model and the > >> working of GRAPHITE. I will ask you if i face any doubts.It would be > >> nice to contribute to this project. > >> > >> > >> For the starters, can you tell me if GRAPHITE also does source to > >> source transformations or otherwise to optimize??Coz i had read > >> somewhere else that the polyhedral model used source to source > >> transformations. > > > > Hi, > > > > you are right. There are several polytope frameworks that work on the > > source code level (LooPo, Cloog/Clan from Cedric Bastoul), however > > Graphite works on the intermediate level tree-ssa in gcc. Therefore we > > can not do any source to source transformations. > > The idea is to not be limited to specific input languages or special > > formatting of the code, but to be able to use the powerful analysis in > > the gcc middle end. > > This allows us to work on any input language and to detect loops that do > > not even look like a loop in the input program (goto-loops). Using the > > powerful scalar evolution framework in gcc Graphite also handles loops > > that do not look like affine linear loops. > > This is a powerful approach in its earlier stages. Basic loops and > > simple code transformations already work, but there is still a lot left > > to be done. 
> > > > Tobi > > > > > > Hi, > > Sounds absolutely convincing to me. I am too keen to contribute in > this in any way possible.I will first try to understand how it works > totally .Would you mind me pressing on with some of issues in the near > future? I am afraid though that they might be a bit more theoretical > to begin with. Sure. Just drop me a line
Re: delete dead feature branches?
On Wed, 2009-10-14 at 08:33 +0200, Michael Matz wrote: > So, why not just move them to dead-branches now, and be done with it? OK, your argument has convinced me. :-) Cheers, Ben