which opt. flags go where? - references
Hello,

I'm planning to do some research on the optimization flags available for GCC (currently using 4.1.1). More in particular, we want to see how we can come up with a set of combinations of flags which allow a tradeoff between compilation time, execution time and code size (as with -O1, -O2, -O3, -Os). Of course, we don't want to do an exhaustive search of all possible combinations of flags, because that would be totally infeasible: using the 56 flags enabled in -O3 for GCC 4.1.1 yields ~72*10^15 (= 2^56-1) possible candidates.

It seems there has already been some work done on this subject, or at least that's what richi on #gcc (OFTC) told me. He wasn't able to refer me to work in that area though. I have found some references myself (partially listed below), but I'm hoping people more familiar with the GCC community can help expand this list.

[1] Almagor et al., Finding effective compilation sequences (LCES'04)
[2] Cooper et al., Optimizing for Reduced Code Space using Genetic Algorithms (LCTES'99)
[3] Almagor et al., Compilation Order Matters: Exploring the Structure of the Space of Compilation Sequences Using Randomized Search Algorithms (Tech. Report)
[4] Acovea: Using Natural Selection to Investigate Software Complexities (http://www.coyotegulch.com/products/acovea/)

Some other questions:

* I'm planning to do this work on an x86 platform (i.e. a Pentium 4), but richi told me that's probably not a good idea, because of the low number of registers available on x86. Comments?
* Since we have done quite some analysis on the SPEC2k benchmarks, we'll also be using them for this work. Other suggestions are highly appreciated.
* Since there has been some previous work on this, I wonder why none of it has made it into GCC development. Were the methods proposed infeasible for some reason? What would be needed to make an approach to automatically find suitable flags for -Ox interesting enough to incorporate into GCC? Any references to this previous work?
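The infeasibility claim above can be checked with a quick back-of-the-envelope calculation (my own sketch, not from the thread; the one-evaluation-per-second rate is an optimistic assumption):

```python
# Sketch: why exhaustively searching GCC's -O3 flag subsets is infeasible.

FLAGS_IN_O3 = 56  # distinct on/off flags enabled at -O3 in GCC 4.1.1

# Every non-empty subset of these flags is a candidate configuration.
candidates = 2 ** FLAGS_IN_O3 - 1
print(candidates)  # 72057594037927935, i.e. ~72*10^15

# Even at an (optimistic, assumed) one compile-and-benchmark cycle per
# second, an exhaustive sweep would take billions of years.
seconds_per_year = 60 * 60 * 24 * 365
years = candidates / seconds_per_year
print(f"{years:.1e} years")  # 2.3e+09 years
```

Any practical approach therefore has to sample this space intelligently (genetic algorithms, randomized search, ...) rather than enumerate it.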
greetings, Kenneth Hoste Paris, ELIS, Ghent University (Belgium) -- Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein) Kenneth Hoste ELIS - Ghent University [EMAIL PROTECTED] http://www.elis.ugent.be/~kehoste
Re: which opt. flags go where? - references
Hi,

On 07 Feb 2007, at 15:22, Diego Novillo wrote:

> Kenneth Hoste wrote on 02/07/07 08:56:
>> [1] Almagor et al., Finding effective compilation sequences (LCES'04)
>> [2] Cooper et al., Optimizing for Reduced Code Space using Genetic Algorithms (LCTES'99)
>> [3] Almagor et al., Compilation Order Matters: Exploring the Structure of the Space of Compilation Sequences Using Randomized Search Algorithms (Tech. Report)
>> [4] Acovea: Using Natural Selection to Investigate Software Complexities (http://www.coyotegulch.com/products/acovea/)
>
> You should also contact Ben Elliston (CC'd) and Grigori Fursin (sorry, no email). Ben worked on dynamic reordering of passes; his thesis will have more information about it. Grigori is working on an API for iterative and adaptive optimization, implemented in GCC. He presented at the last HiPEAC 2007 GCC workshop. Their presentation should be available at http://www.hipeac.net/node/746

I actually talked to Grigori about the -Ox flags, I was at the HiPEAC conference too ;-) I didn't include references to his work, because my aim wouldn't be at reordering of passes, but just selecting them. I understand that reordering is of great importance while optimizing, but I think this project is big enough as is.

>> Some other questions:
>> * I'm planning to do this work on an x86 platform (i.e. a Pentium 4), but richi told me that's probably not a good idea, because of the low number of registers available on x86. Comments?
>
> When deriving ideal flag combinations for -Ox, we will probably want common sets for the more popular architectures, so I would definitely include x86.

OK. I think richi's comment on x86 was about the fact that evaluating the technique we are thinking about might produce results which are hard to 'port' to a different architecture. But then again, we won't be stating we have found _the_ best set of flags for a given goal... Thank you for your comment.

>> * Since we have done quite some analysis on the SPEC2k benchmarks, we'll also be using them for this work. Other suggestions are highly appreciated.
>
> We have a collection of tests from several user communities that we use as performance benchmarks (DLV, TRAMP3D, MICO). There should be links to the testers somewhere in http://gcc.gnu.org/

OK, sounds interesting, I'll look into it. In which way are these benchmarks used? Just to test the general performance of GCC? Have they been compared to, say, SPEC CPU, or other 'research/industrial' benchmark suites (such as MiBench, MediaBench, EEMBC, ...)?

>> * Since there has been some previous work on this, I wonder why none of it has made it into GCC development. Were the methods proposed infeasible for some reason? What would be needed to make an approach to automatically find suitable flags for -Ox interesting enough to incorporate into GCC? Any references to this previous work?
>
> It's one of the things I would like to see implemented in GCC in the near future. I've been chatting with Ben and Grigori about their work and it would be a great idea if we could discuss this at the next GCC Summit. I'm hoping someone will propose a BoF about it.

I'm hoping my ideas will lead to significant results, because I think this is an important issue.

greetings,

Kenneth
Re: which opt. flags go where? - references
On 08 Feb 2007, at 10:47, Ronny Peine wrote:

> Hi,
>
> maybe http://docs.lib.purdue.edu/ecetr/123/ would also be interesting for you. There, a quadratic algorithm for finding a nearly optimal set of compiler flags is described. The results are quite promising and I have also tested it on my own benchmarking suite with good results.

Thank you very much. After reading the abstract, I'm highly interested in this work, because they also use GCC and SPEC CPU2000, as I'm planning to do... Which benchmarks did you test on?

greetings,

Kenneth
GCC -On optimization passes: flag and doc issues
Hiya,

I'm doing research on which optimization passes to enable in the various -On flags, and I've stumbled onto some minor bugs and problems with the GCC documentation for the 4.1.2 version:

* When using -falign-loops or -fno-align-loops, the corresponding internal variable 'align-loops' should be set to 0 (= use default setting) or 1 (= no aligning) respectively. When parsing the various flags, a variable 'value' is used to set (value=1) or unset (value=0) the corresponding flag. Unfortunately, these values should be reversed for the -falign-X flags, which isn't done, and thus using -falign-loops causes no aligning, while -fno-align-loops causes aligning with the default setting for the arch you're compiling for. The same problem exists for -falign-jumps, -falign-labels and -falign-functions.

* On x86, -fschedule-insns is disabled, but -fschedule-insns2 (or the corresponding internal flag flag_schedule_insns_after_reload) is still being used... The reason for disabling -fschedule-insns is increased register pressure (and x86 has few registers already). I'm not sure if this is a bug, but why disable -fschedule-insns and leave -fschedule-insns2 active? Because, according to the docs, -fschedule-insns2 = -fschedule-insns + an additional pass. Any comments on this are welcome...

I've filed a bug report for the first issue (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31586), but since this is easy to fix, I'd like to fix it myself. The only thing holding me back is that I'm new at this: I've never contributed to an open source project before, and I'm only vaguely familiar with CVS and the likes. Is there a newbie guide to contributing to GCC? How can I make sure this fix is done for 4.1, 4.2 _and_ 4.3?

Also, while trying to identify the flags enabled in the various -On flags, I've run into incomplete or even wrong documentation regarding the optimization passes. Some examples:

- the -fipa-X flags are not mentioned in the 4.1.2 documentation
- -funit-at-a-time is still being reported as enabled in -O2 and above, while it is already active in -O1 (this seems to be fixed in the 4.3.0 docs though)
- the documentation for -fmove-loop-invariants and -ftree-loop-optimize mentions they are enabled at -O1 when they are not
- -finline-functions is enabled at -Os, but isn't listed as such

I can imagine few people care about the exact flags enabled at -On, but why list wrong information which will only confuse people (as it did me)?

Last but not least, I have some questions:

- Some flags are enabled by other flags (for example, -fschedule-insns enables -fsched-interblock and -fsched-spec according to the docs). I was unable to find where in the source code this is done... And because I want to incorporate as many active -On flags as possible, I'd like to track down for which flags this is really true, and for which the docs are out of date.

- When new optimization passes are completed, how is it decided where they go (-O1, -O2, -O3, -Os, none)? For example, why did -funit-at-a-time switch from -O2 to -O1 a while ago? And, as I noticed from the 4.3.0 docs, why is -fsplit-wide-types enabled at -O1, and not -O2? Is there some testing done, or is it just people saying "I think X belongs there"?

greetings,

Kenneth
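The -falign-* inversion described in the first bug report above can be sketched in miniature (my own simplified Python model for illustration; GCC's actual option handling is C code in opts.c, and the names here are hypothetical):

```python
# Toy model of the reported -falign-* parsing bug and its fix.

def parse_align_flag(arg):
    """Return the internal 'align-loops' value for a -f[no-]align-loops arg.

    Convention described in the bug report:
      align_loops == 0 -> use the architecture's default alignment
      align_loops == 1 -> no extra aligning
    """
    # Generic flag parsing: '-fX' yields value=1, '-fno-X' yields value=0...
    value = 0 if arg.startswith("-fno-") else 1
    # ...but for the -falign-* family the stored value must be inverted:
    # enabling the flag means "use the default" (0), while disabling it
    # means "align to 1", i.e. no aligning. Omitting this inversion is
    # exactly the reported bug.
    return 1 - value

print(parse_align_flag("-falign-loops"))     # 0: arch default alignment
print(parse_align_flag("-fno-align-loops"))  # 1: no aligning
```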
Re: GCC -On optimization passes: flag and doc issues
On 17 Apr 2007, at 18:18, Eric Weddington wrote:

> > From: Kenneth Hoste
> > Sent: Tuesday, April 17, 2007 3:23 AM
> > Subject: GCC -On optimization passes: flag and doc issues
> >
> > - -finline-functions is enabled at -Os, but isn't listed as such
>
> And it seems to have some issues: <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31528>, comments #4 and #6.

This is exactly one of the reasons for our research: providing a useful way to determine which flags should go where... Of course, this still depends on the benchmarks used and the platform chosen, but the way it is done now is just guessing IMHO (using domain knowledge, of course, but still)...

greetings,

K.
Re: Segfault on OpenMP program
On 18 Apr 2007, at 15:38, Paul Brook wrote:

> On Wednesday 18 April 2007 00:19, FX Coudert wrote:
>> Someone reported a bug on a trivial statically-linked Fortran program with OpenMP and an I/O operation. I can reproduce the segfault, which happens at: ... Andrew suggested on bugzilla that this might be a GLIBC issue (I use glibc-2.4 from Red Hat), with "pthreads not linking in correctly". Does this ring a bell? Can someone confirm that it's not a GCC issue (or that it is)? Details can be found in PR 31604.
>
> The answer to any question involving glibc and static linking is generally "don't do that". I've seen pthread related problems with static linking, though that was related to nss dlopening a shared library from a static application.

Is this advice only for parallel programs, or is it a general rule? In my research, I statically link all my benchmarks because I measure stuff about them using instrumentation on a bunch of computers (which might have various OSs, OS versions, libc versions, ...). Because we want to analyze the same dynamically executed code over and over again (no matter on which machine the analysis is done), I use static linking. I've heard warnings about static linking before from other people too: using a libc version which is not intended for the system you are running on might lead to weird behavior.

Concerning this, I have some questions:
- What are the possible issues with static linking? Are they listed above (pthread problems, libc vs kernel conflicts), or are there more?
- How can I easily dynamically link against libc, but statically link everything else? Currently, making everything statically linked is easy: in the various makefiles I set CC="gcc -static". Is there a similar 'trick' to link libc dynamically but everything else statically? (I'm using GCC 4.1.2 if it matters)

greetings,

Kenneth
Re: GCC mini-summit - benchmarks
On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote:

> 11) H.J. Lu discussed SPEC CPU 2006. He reported that a couple of the tests do not run successfully, and it appears to be due to bugs in the tests which cause gcc to compile them in unexpected ways. He has been reporting the problems to the SPEC committee, but is being ignored. He encouraged other people to try it themselves and make their own reports.

I'm not sure what 'tests' means here... Are test cases being extracted from the SPEC CPU2006 sources? Or are you referring to the validity tests of the SPEC framework itself (to check whether the output generated by some binary conforms with their reference output)?

> 12) Jose Dana reported his results comparing different versions of gcc and icc at CERN. They have a lot of performance sensitive C++ code, and have gathered a large number of standalone snippets which they use for performance comparisons. Some of these can directly become bugzilla enhancement requests. In general this could ideally be turned into a free performance testsuite. All the code is under free software licenses.

Is this performance testsuite available somewhere? Sounds interesting to add to my (long) list of benchmark suites.

greetings,

Kenneth
Re: GCC mini-summit - compiling for a particular architecture
On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote:

> 13) Michael Meissner raised the idea of compiling functions differently for different processors, choosing the version based on a runtime decision. This led to some discussion of how this could be done effectively. In particular if there is an architectural difference, such as Altivec, you may get prologue instructions which save and restore registers which are not present on all architectures.

Related to this: have you guys ever considered making the -On flags dependent on the architecture? Right now, the decision about which flags are activated by the -On flags is independent of the architecture, I believe (except for flags which need to be disabled to ensure correct code generation, such as -fschedule-insns for x86). I must say I haven't looked into this in great detail, but at least for the passes controlled by flags on x86, this seems to be the case. I think choosing the flags as a function of the architecture you are compiling for might be highly beneficial (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31528 for example).

greetings,

Kenneth
Re: GCC -On optimization passes: flag and doc issues
On 17 Apr 2007, at 16:27, Ian Lance Taylor wrote:

> Kenneth Hoste <[EMAIL PROTECTED]> writes:
>> * When using -falign-loops or -fno-align-loops, the corresponding internal variable 'align-loops' should be set to 0 (= use default setting) or 1 (= no aligning) respectively. [...] The same problem exists for -falign-jumps, -falign-labels and -falign-functions.
>
> Ouch. Thanks for noticing.

I hope I can fix this myself (should be fairly easy), once I get familiar with submitting patches. A related question: how is it decided which priority a bug gets?

>> Also, while trying to identify the flags enabled in the various -On flags, I've run into incomplete or even wrong documentation regarding the optimization passes. Some examples:
>> - the -fipa-X flags are not mentioned in the 4.1.2 documentation
>
> Looks like a bug.

A documentation bug, you mean?

>> - -funit-at-a-time is still being reported as enabled in -O2 and above, while it is already active in -O1 (this seems to be fixed in the 4.3.0 docs though)
>
> This would be a bug but it looks OK to me in 4.1.

Yes, you're right, sorry. It seems this was fixed in the 4.1.2 release.

>> - the documentation for -fmove-loop-invariants and -ftree-loop-optimize mentions they are enabled at -O1 when they are not
>
> Actually I think they are enabled at -O1. This is done in a slightly tricky way: they default to being on, but they have no effect unless the other loop optimization passes are run. I think it would be appropriate to clean this up to make the code more obvious.

Hmm, okay, thank you. Do you know of any other flags being activated like this in a 'hidden' way?

>> - -finline-functions is enabled at -Os, but isn't listed as such
>
> Yes, this should be documented better. The general -finline-functions is not enabled at -Os. Only inlining of small functions is enabled. This is done by setting parameters.

Yes, I've noticed the setting of the parameters. But this should definitely be reflected in the documentation.

>> - Some flags are enabled by other flags (for example, -fschedule-insns enables -fsched-interblock and -fsched-spec according to the docs). I was unable to find where in the source code this is done... And because I want to incorporate as many active -On flags as possible, I'd like to track down for which flags this is really true, and for which the docs are out of date.
>
> -fsched-interblock and -fsched-spec default to 1 in common.opt. But they have no effect unless you enable a scheduling pass.

Another example of 'hidden' flag activation. Any more?

>> - When new optimization passes are completed, how is it decided where they go (-O1, -O2, -O3, -Os, none)? For example, why did -funit-at-a-time switch from -O2 to -O1 a while ago? And, as I noticed from the 4.3.0 docs, why is -fsplit-wide-types enabled at -O1, and not -O2? Is there some testing done, or is it just people saying "I think X belongs there"?
>
> Where to put optimizations is a trade-off between the amount of time they take and the amount of good they do. The testing which is done is compilation time testing.

Hmm, okay. I still feel this is a bit ad hoc. What about flags that don't get activated in any of the -On flags? Are these not stable enough yet on important architectures, or are there other reasons?

greetings,

K.
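The 'hidden' flag activation discussed above (flags that default to on but are inert until a gating pass runs) can be modeled with a tiny sketch (my own illustration, not GCC source; the gating table is a simplification of what common.opt and the scheduler set-up actually do):

```python
# Toy model of gated flags: a flag only takes effect if its gate is on.

DEFAULTS = {"fsched-interblock": True,   # defaults to on in common.opt
            "fsched-spec": True,         # defaults to on in common.opt
            "fschedule-insns": False}    # the gating scheduling pass

# Flags on the left only have an effect when the flag on the right is on.
GATES = {"fsched-interblock": "fschedule-insns",
         "fsched-spec": "fschedule-insns"}

def effective(flags):
    """Return the set of flags that actually influence compilation."""
    return {f for f, on in flags.items()
            if on and flags.get(GATES.get(f, f), on)}

# With the scheduler off, the two defaulted-on flags do nothing:
print(sorted(effective(DEFAULTS)))  # []

# Enabling -fschedule-insns activates all three:
enabled = dict(DEFAULTS, **{"fschedule-insns": True})
print(sorted(effective(enabled)))
# ['fsched-interblock', 'fsched-spec', 'fschedule-insns']
```

This is why grepping the documentation (or common.opt defaults) alone is not enough to recover which flags are really active at a given -On level.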
Re: GCC mini-summit - compiling for a particular architecture
Citeren "Kaveh R. GHAZI" <[EMAIL PROTECTED]>:

> On Mon, 23 Apr 2007, Mark Mitchell wrote:
>> I'm certainly not trying to suggest that we run SPEC on every architecture, and then make -O2 be the set of optimization options that happens to do best there, however bizarre.
>
> Why not? Is your objection because SPEC doesn't reflect real-world apps or because the option set might be "bizarre"? The bizarreness of the resulting flag set should not be a consideration IMHO. Humans are bad at predicting how complex systems like this behave. The fact that the "best" flags may be non-intuitive is not surprising.
>
> I find the case of picking compiler options for optimizing code very much like choosing which part of your code to hand optimize programmatically. People often guess wrongly where their code spends its time; that's why we have profilers.
>
> So I feel we should choose our per-target flags using a tool like Acovea to find what the "best" options are on a per-target basis. Then we could insert those flags into -O2 or "-Om" in each backend. Whether we use SPEC or the testcases included in Acovea or maybe GCC itself as the app to tune for could be argued. And some care would be necessary to ensure that the resulting flags don't completely hose other large classes of apps. But IMHO once you decide to do per-target flags, something like this seems like the natural conclusion. http://www.coyotegulch.com/products/acovea/

I totally agree with you, Acovea would be a good tool for helping with this. But, in my opinion, it has one big downside: it doesn't allow a tradeoff between, for example, compilation time and execution time. My work tries to tackle this: I'm using an evolutionary approach (like Acovea) which is multi-objective (unlike Acovea), meaning it optimizes a Pareto curve over compilation time, execution time and code size. I'm also trying to speed up things over the way Acovea handles them, which I won't describe any further for now.

I'm currently only using the SPEC CPU2000 benchmarks (and I believe Acovea does too), but this shouldn't be a drawback: the methodology is completely independent of the benchmarks used. I'm also trying to make sure everything is parameterizable (size of population, number of generations, crossover/mutation/migration rates, ...).

Unfortunately, I'm having quite a lot of problems with weird combinations of flags (which trigger bugs, as was mentioned in this thread), which is slowing things down. Instead of trying to fix these bugs, I'm just ignoring combinations of flags which trigger them for now (although I'll try to report each bug I run into in Bugzilla). Hopefully, this work will produce nice results by June, and I'll make sure to report on it on this mailing list once it's done.

greetings,

Kenneth

This message was sent using IMP, the Internet Messaging Program.
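The multi-objective idea mentioned above (a Pareto curve over compilation time, execution time and code size) boils down to keeping only non-dominated candidates. A minimal sketch (my own illustration with made-up measurements, not the actual framework):

```python
# Maintaining a Pareto front over three objectives; lower is better for all.

def dominates(a, b):
    """True if candidate a is at least as good as b in every objective
    and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b)) and
            any(x < y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Keep only candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# Toy measurements for four hypothetical flag combinations:
# (compilation time [s], execution time [s], code size [KB])
measured = [(10, 50, 120), (12, 45, 130), (9, 60, 110), (15, 55, 140)]
print(pareto_front(measured))
# [(10, 50, 120), (12, 45, 130), (9, 60, 110)] -- the last point is
# dominated by (10, 50, 120) and drops out
```

Each surviving point is a defensible -Ox recipe for some tradeoff preference, which is exactly what a single scalar fitness (as in Acovea) cannot express.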
Re: GCC mini-summit - compiling for a particular architecture
Citeren Diego Novillo <[EMAIL PROTECTED]>:

> Mark Mitchell wrote on 04/23/07 13:56:
>> So, I think there's a middle ground between "exactly the same passes on all targets" and "use Acovea for every CPU to pick what -O2 means". Using Acovea to reveal some of the surprising, but beneficial results, seems like a fine idea, though.
>
> I'm hoping to hear something along those lines at the next GCC Summit. I have heard of a bunch of work in academia doing extensive optimization space searches looking for combinations of pass sequencing and repetition to achieve optimal results. My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.

Sorry to be blunt, but that's indeed quite naive :-) Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations. If you also count the passes not controlled by a flag, but decided upon depending on the optimization level, that adds another, virtual flag (i.e. using -O1, -O2, -O3 or -Os as the base setting). A serious set of programs can easily take tens of minutes to evaluate, so I think you can easily see that trying all possible combinations is totally infeasible. The nice thing is you don't have to: you can learn from the combinations you've already evaluated, you can start with only a subset of the programs and add others gradually, ...

My work is actually concentrating on building a framework to do exactly that: give a set of recipes for the -On flags which allow a choice, and which are determined by trading off compilation time, execution time and code size. I won't be at the GCC Summit in Canada (I'm in San Diego then, presenting some other work), but I'll make sure to announce our work when it's finished...

greetings,

Kenneth
RE: GCC mini-summit - compiling for a particular architecture
Citeren Dave Korn <[EMAIL PROTECTED]>:

> On 23 April 2007 19:07, Diego Novillo wrote:
>> [...] My naive idea is for someone to test all these different combinations and give us a set of -Ox recipes that we can use by default in the compiler.
>
> Has any of the Acovea research demonstrated whether there actually is any such thing as a "good default set of flags in all cases"? If the results obtained diverge significantly according to the nature/coding style/architecture/other uncontrolled variable factors of the application, we may be labouring under a false premise wrt. the entire idea, mightn't we?

I don't think that has been shown. Acovea evaluation has only been done on a few architectures, and I don't believe a comparison was made. But you guys are probably the best team to try that out: you have access to a wide range of platforms, and the experience needed to solve problems when they present themselves. Hopefully, my upcoming framework will boost such an effort. Remember: adjusting the way in which GCC handles the -On flags is only needed if the tests suggest it will be useful.

greetings,

Kenneth
Re: GCC mini-summit - compiling for a particular architecture
Citeren Diego Novillo <[EMAIL PROTECTED]>:

> Dave Korn wrote on 04/23/07 14:26:
>> Has any of the Acovea research demonstrated whether there actually is any such thing as a "good default set of flags in all cases"?
>
> Not Acovea itself. The research I'm talking about involves a compiler whose pipeline can be modified and resequenced. It's not just a matter of adding -f/-m flags. The research I've seen does a fairly good job at modelling AI systems that traverse the immense search space looking for different sequences.

Any references?
Re: GCC mini-summit - compiling for a particular architecture
On 23 Apr 2007, at 20:43, Diego Novillo wrote:

> [EMAIL PROTECTED] wrote on 04/23/07 14:37:
>> Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations. If you also count the passes not controlled by a flag, but decided upon depending on the optimization level, that adds another, virtual flag (i.e. using -O1, -O2, -O3 or -Os as the base setting).
>
> No, that's not what I want. I want a static recipe. I do *not* want -Ox to do this search every time.

I'm not saying you need to do this every time you install GCC. I'm saying trying out 2^60 combinations (even offline) is totally infeasible.

> It goes like this: somebody does a study over a set of applications that represent certain usage patterns (say, FP and INT, just to mention the two more common classes of apps). The slow search is done offline, and after a few months we get the results in the form of a table that says, for each class and for each -Ox, what set of passes to execute and in what order they should be executed. Not to say that the current sequencing and repetition are worthless, but I think they could be improved in a quasi-systematic way using this process (which is slow and painful, I know).

Exactly my idea.

>> My work is actually concentrating on building a framework to do exactly that: give a set of recipes for the -On flags which allow a choice, and which are determined by trading off compilation time, execution time and code size.
>
> Right. This is what I want.

>> I won't be at the GCC Summit in Canada (I'm in San Diego then, presenting some other work), but I'll make sure to announce our work when it's finished...
>
> Excellent. Looking forward to those results.

Cool!
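The static recipe table Diego describes above, keyed by application class and -Ox level, would have roughly this shape (my own sketch; the classes, levels and pass names are illustrative stand-ins, not real GCC tuning data):

```python
# Hypothetical shape of an offline-derived pass-recipe table: the slow
# search happens once, offline; -Ox then does a plain static lookup.

RECIPES = {
    ("int", "-O2"): ["ccp", "dce", "pre", "ivopts", "sched2"],
    ("fp",  "-O2"): ["ccp", "pre", "vectorize", "unroll", "sched2"],
    ("int", "-Os"): ["ccp", "dce", "pre"],
}

def passes_for(app_class, level):
    """Static lookup: no search is performed at compile time."""
    return RECIPES[(app_class, level)]

print(passes_for("fp", "-O2"))
# ['ccp', 'pre', 'vectorize', 'unroll', 'sched2']
```

The point of the sketch is the interface, not the contents: compile time stays constant, and only the offline study ever pays the search cost.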
Re: GCC 4.4.x speed regression - help?
On Aug 16, 2009, at 18:02, Martin Guy wrote:

> Yes, GCC is bigger and slower and for several architectures generates bigger, slower code with every release, though saying so won't make you very popular on this list! :) One theory is that there are now so many different optimization passes (and, worse, clever case-specific hacks hidden in the backends) that the interaction between the lot of them is now chaotic. Selecting optimization flags by hand is no longer humanly possible.

That, and the fact that GCC supports so many (quite different) targets while keeping the set of optimization passes roughly equal across these targets, probably also doesn't help... I'm aware there are (good) reasons to try and keep it that way, but it seems the reasons against it are getting stronger. Just my 2 cents.

Kenneth

PS: There's MILEPOST, but there are other projects too. Check out Acovea (http://www.coyotegulch.com/products/acovea) and my very own pet COLE (http://users.elis.ugent.be/~kehoste/, see the publications section).

--
Kenneth Hoste
Paris research group - ELIS - Ghent University, Belgium
email: kenneth.ho...@elis.ugent.be
website: http://www.elis.ugent.be/~kehoste
blog: http://boegel.kejo.be
Re: gcc4.1.2 compilation errors
s of the Galerkin system ***

1
Error: Unclassifiable statement at (1)
In file modules.f90:140

Real*8, Dimension(N11,N11) ::
1
Error: Syntax error in data declaration at (1)
In file modules.f90:141

*POP, Poj1, Poj2, Poj3, Poj4
1
Error: Unclassifiable statement at (1)

Any help will be greatly appreciated!

--
Computer Science is no more about computers than astronomy is about telescopes. (E. W. Dijkstra)

Kenneth Hoste
ELIS - Ghent University
email: [EMAIL PROTECTED]
blog: http://www.elis.ugent.be/~kehoste/blog
website: http://www.elis.ugent.be/~kehoste
Re: Some 4.4 project musings
On 11 Feb 2008, at 14:49, Diego Novillo wrote:

> On Fri, Feb 1, 2008 at 3:55 PM, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
>> 1 - Pass cleanup. There have been rumblings about this, but I haven't seen a lot of actual progress. We currently run hundreds of passes all the time, and we don't really know how effective or necessary many of them are. We also don't know whether the cleanups they run are truly doing much. It would be nice to do some actual analysis and do something with the results.
>
> Yes, this is an area that is in desperate need of TLC. Your plan looks good to me. We need to have a mechanism to determine whether a pass did something; we need to be able to have different pipelines for different architectures. Do you have anything specific in mind? Create a branch? Work directly on mainline? Perhaps the first stage would be to define the logging so we can start doing some analysis.

I believe the work we did for CGO'08 is relevant here. We came up with an automated way to select combinations of passes which perform well as candidate optimization levels. We trade off compilation time vs execution time in the paper, but the framework we came up with is more general than that. We had some nice results, showing that using only 25% of the passes activated at -O3 was enough to come up with candidate optimization levels which significantly outperform -O1/2/3, especially in terms of compilation time. There is an overview of the key passes in the paper, along with a description of how we handled things... If you guys are interested, you can find the paper on my website (http://www.elis.ugent.be/~kehoste), and I'll be happy to provide more info if needed, just shout. Or come and see my talk at CGO in April ;-)

Kenneth
Re: GCC 4.2.0 Status Report (2007-05-11)
On 12 May 2007, at 00:02, Mark Mitchell wrote:

> Every time I think we're almost there with this release, I seem to
> manage to get stuck. :-( However, we're very close: the only PRs that
> I'm waiting for are:

I admit this is not a blocking bug, but it seems fairly (very) easy to solve... I still have to figure out how the patch submission framework works (I've never contributed to an open-source project before), so I didn't get around to solving it myself.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31586

Maybe someone could take care of this, so it gets fixed in GCC 4.2 and later?

K.

--
Kenneth Hoste
ELIS - Ghent University
Re: machine learning for loop unrolling
On 08 Jun 2007, at 16:31, Stefan Ciobaca wrote:

> Hello everyone,
>
> For my bachelor thesis I'm modifying gcc to use machine learning to
> predict the optimal unroll factor for different loops (inspired by this
> paper: http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-938.pdf).

Interesting. I'm using evolutionary algorithms for similar purposes in my current research...

> Of course, not all of these are relevant to gcc. I'm looking at ways to
> compute some of these features, hopefully the most relevant ones. If
> there is already some work done that I could use in order to compute
> some of these features, I'd be glad if you could tell me about it.
> Also, if anyone can think of some useful features, related to the
> target architecture or the loop structure, I'd be glad to hear about
> them.

I'm afraid I can't help here; I'm not familiar at all with GCC's internals.

> Also, I'm interested in some benchmarks. Many of the research papers
> that describe compiler optimizations use the SPEC* benchmarks, but
> these are not free, so I'm looking for alternatives. Currently I'm
> looking into:
> - OpenBench
> - Botan
> - CSiBE
> - Polyhedron
> (thanks to richi of #gcc for the last 3)
> Do you know any other one that would be better?

But I can help here. Polyhedron is Fortran-only, but its benchmarks are well suited for timing experiments (i.e. they run long enough to have reasonable running times, but aren't too long either). CSiBE is more targeted at code size; I believe the runtimes are ridiculously small. I'm not familiar with the other two.

Some other possibilities:

* MiDataSets (also fairly small when run only once, but the suite allows you to adjust the outer loop iteration count to increase runtimes) [http://sourceforge.net/projects/midatasets]
* MediaBench / MediaBench II: multimedia workloads, which typically iterate over frames for example [http://euler.slu.edu/~fritts/mediabench/]
* BioMetricsWorkload [http://www.ideal.ece.ufl.edu/main.php?action=bmw]
* BioPerf: gene sequence analysis, ... [http://www.bioperf.org/]
* some other benchmarks commonly used when testing GCC [http://www.suse.de/~gcctest]

I've been using the above with GCC and most work pretty well (on x86).

> Here is how I'm thinking of conducting the experiment:
> - for each innermost loop:
>   - compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and
>     measure the time the benchmark takes
>   - write down the loop features and the best unroll factor
> - apply some machine learning technique to the above data to determine
>   the correlations between loop features and best unroll factor

Any idea which? There's a huge number of different techniques out there; choosing an appropriate one is critical to success.

> - integrate the result into gcc and measure the benchmarks again

When using machine learning techniques to build some kind of model, a common technique is cross-validation. Say you have 20 benchmarks, no matter which ones. You use the larger part of those (for example 15) to build the model (i.e. determine the correlations between loop features and best unroll factor), and then test its performance on the remaining ones. The important thing is not to use the benchmarks you test with when training the model. That way, you can (hopefully) show that what you've learned should work for other programs too.

> Do you think it is ok to only consider inner-most loops? What about the
> unroll factors? Should I consider bigger unroll factors? Do you think
> the above setup is ok?

I'd say: don't be afraid to try too much. Try insane unroll factors too, just for testing purposes. You don't want to limit yourself to 32x when the optimum could be 64x, for example.

> I welcome any feedback on this.

Who is your advisor on this? Where are you doing your bachelor's thesis?

greetings,

Kenneth

--
Kenneth Hoste
ELIS - Ghent University
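The cross-validation protocol Kenneth describes can be sketched as follows. The feature vectors, unroll factors, and the nearest-neighbour "model" are purely illustrative stand-ins for whatever learning technique and loop features the thesis ends up using:

```python
def nearest_neighbour(train, features):
    """Predict the unroll factor of the training loop whose feature vector
    is closest (squared Euclidean distance) to the query loop's features."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(train, key=lambda item: dist(item[0], features))
    return best[1]

# (feature vector, best measured unroll factor) pairs; here the features
# are a made-up (trip count, body size) pair per loop.
data = [
    ((100, 4), 8),
    ((8, 20), 2),
    ((1000, 2), 16),
    ((16, 12), 4),
]

# Split by benchmark: train on the first three loops, hold out the last.
# The held-out loop must never be used while building the model.
train, test = data[:3], data[3:]
for features, actual in test:
    predicted = nearest_neighbour(train, features)
    print(features, "-> predicted", predicted, "/ actual", actual)
```

In practice one would repeat this over several different train/test splits (k-fold cross-validation) and report the average prediction quality, so the result does not depend on one lucky split.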