which opt. flags go where? - references

2007-02-07 Thread Kenneth Hoste

Hello,

I'm planning to do some research on the optimization flags available  
for GCC (currently, using 4.1.1). More in particular, we want to see  
how we can come up with a set of combinations of flags which allow a  
tradeoff between compilation time, execution time and code size (as  
with -O1, -O2, -O3, -Os). Of course, we don't want to do an  
exhaustive search of all possible combinations of flags, because that  
would be totally infeasible (the 56 flags enabled in -O3 for  
gcc 4.1.1 yield ~72*10^15 (= 2^56-1) possible candidates).
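The size of that search space is easy to sanity-check; a quick Python sketch (assuming the 56 flags are independent on/off switches) reproduces the figure above:

```python
# Number of non-empty subsets of 56 independent on/off flags.
n_flags = 56
candidates = 2 ** n_flags - 1
print(candidates)  # 72057594037927935, i.e. ~7.2 * 10^16

# Even at one compile-and-benchmark evaluation per second, an
# exhaustive sweep would take billions of years.
years = candidates / (60 * 60 * 24 * 365)
print(f"about {years:.2e} years at one evaluation per second")
```

This is why any practical approach has to prune or learn its way through the space instead of enumerating it.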


It seems there has already been some work done on this subject, or  
at least that's what richi on #gcc (OFTC) told me. He wasn't able to  
refer me to work in that area, though. I have found some references  
myself (partially listed below), but I'm hoping people more familiar  
with the GCC community can help expand this list.


[1] Almagor et al., Finding effective compilation sequences (LCTES'04)
[2] Cooper et al., Optimizing for Reduced Code Space using Genetic Algorithms (LCTES'99)
[3] Almagor et al., Compilation Order Matters: Exploring the Structure of the Space of Compilation Sequences Using Randomized Search Algorithms (Tech. Report)
[4] Acovea: Using Natural Selection to Investigate Software Complexities (http://www.coyotegulch.com/products/acovea/)


Some other questions:

* I'm planning to do this work on an x86 platform (i.e. a Pentium 4),  
but richi told me that's probably not a good idea, because of the low  
number of registers available on x86. Comments?


* Since we have done quite a lot of analysis on the SPEC2k benchmarks,  
we'll also be using them for this work. Other suggestions are highly  
appreciated.


* Since there has been some previous work on this, I wonder why none  
of it has made it into GCC development. Were the proposed methods  
infeasible for some reason? What would be needed to make an approach  
that automatically finds suitable flags for -Ox interesting enough to  
incorporate into GCC? Any references to this previous work?


greetings,

Kenneth Hoste
Paris, ELIS, Ghent University (Belgium)

--

Statistics are like a bikini. What they reveal is suggestive, but  
what they conceal is vital (Aaron Levenstein)


Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste




Re: which opt. flags go where? - references

2007-02-07 Thread Kenneth Hoste

Hi,

On 07 Feb 2007, at 15:22, Diego Novillo wrote:


Kenneth Hoste wrote on 02/07/07 08:56:


[1] Almagor et al., Finding effective compilation sequences (LCTES'04)
[2] Cooper et al., Optimizing for Reduced Code Space using Genetic Algorithms (LCTES'99)
[3] Almagor et al., Compilation Order Matters: Exploring the Structure of the Space of Compilation Sequences Using Randomized Search Algorithms (Tech. Report)
[4] Acovea: Using Natural Selection to Investigate Software Complexities (http://www.coyotegulch.com/products/acovea/)
You should also contact Ben Elliston (CC'd) and Grigori Fursin  
(sorry, no email).


Ben worked on dynamic reordering of passes; his thesis will have  
more information about it.


Grigori is working on an API for iterative and adaptive  
optimization, implemented in GCC.  He presented it at the HiPEAC  
2007 GCC workshop.  Their presentation should be available at
http://www.hipeac.net/node/746



I actually talked to Grigori about the -Ox flags; I was at the HiPEAC  
conference too ;-) I didn't include references to his work because  
my aim isn't reordering passes, just selecting them.  
I understand that reordering is of great importance when optimizing,  
but I think this project is big enough as it is.



Some other questions:
* I'm planning to do this work on an x86 platform (i.e.  
Pentium4),  but richi told me that's probably not a good idea,  
because of the low  number of registers available on x86. Comments?
When deriving ideal flag combinations for -Ox, we will probably  
want common sets for the more popular architectures, so I would  
definitely include x86.


OK. I think richi's comment on x86 was that evaluating the  
technique we have in mind might produce results which are hard  
to 'port' to a different architecture. But then again, we won't be  
claiming we have found _the_ best set of flags for a given goal...  
Thank you for your comment.




* Since we have done quite some analysis on the SPEC2k  
benchmarks,  we'll also be using them for this work. Other  
suggestions are highly  appreciated.
We have a collection of tests from several user communities that we  
use as performance benchmarks (DLV, TRAMP3D, MICO).  There should  
be links to the testers somewhere in http://gcc.gnu.org/


OK, sounds interesting, I'll look into it. How are these  
benchmarks used? Just to test the general performance of GCC? Have  
they been compared to, say, SPEC CPU, or other research/industrial  
benchmark suites (such as MiBench, MediaBench, EEMBC, ...)?




* Since there has been some previous work on this, I wonder why  
none  of it has made it into GCC development. Were the methods  
proposed  unfeasible for some reason? What would be needed to make  
an approach  to automatically find suitable flags for -Ox  
interesting enough to  incorporate it into GCC? Any references to  
this previous work?
It's one of the things I would like to see implemented in GCC in  
the near future.  I've been chatting with Ben and Grigori about  
their work and it would be a great idea if we could discuss this at  
the next GCC Summit.  I'm hoping someone will propose a BoF about it.


I'm hoping my ideas will lead to significant results, because I think  
this is an important issue.


greetings,

Kenneth





Re: which opt. flags go where? - references

2007-02-08 Thread Kenneth Hoste


On 08 Feb 2007, at 10:47, Ronny Peine wrote:


Hi,

maybe http://docs.lib.purdue.edu/ecetr/123/ would also be  
interesting for you. There, a quadratic algorithm for finding a  
nearly optimal set of compiler flags is described. The results are  
quite promising, and I have also tested it on my own benchmarking  
suite with good results.



Thank you very much. After reading the abstract, I'm highly  
interested in this work, because they also use GCC and SPEC CPU2000,  
as I'm planning to do...


Which benchmarks did you test on?

greetings,

Kenneth





GCC -On optimization passes: flag and doc issues

2007-04-17 Thread Kenneth Hoste

Hiya,

I'm doing research on which optimization passes to enable in the  
various -On flags, and I've stumbled onto some minor bugs and  
problems with the GCC documentation for the 4.1.2 version:


* When using -falign-loops or -fno-align-loops, the corresponding  
internal variable 'align-loops' should be set to 0 (= use the default  
setting) or 1 (= no aligning), respectively. When parsing the various  
flags, a variable 'value' is used to set (value=1) or unset (value=0)  
the corresponding flag. Unfortunately, these values should be reversed  
for the -falign-X flags, which isn't done; as a result, -falign-loops  
causes no aligning, while -fno-align-loops causes aligning with the  
default setting for the architecture you're compiling for. The same  
problem exists for -falign-jumps, -falign-labels and -falign-functions.
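A small Python model of the inverted-value behaviour described above (illustrative only; the function name and the `1 - value` fix are my sketch, not GCC's actual option-handling code):

```python
def parse_align_flag(value):
    """Model parsing -falign-loops (value=1) vs -fno-align-loops (value=0).

    The internal 'align-loops' variable means: 0 = use the architecture's
    default alignment, 1 = no alignment.
    """
    buggy = value      # buggy: -falign-loops stores 1, meaning "no alignment"
    fixed = 1 - value  # fixed: the value is inverted for the -falign-X flags
    return buggy, fixed

# -falign-loops should select the default alignment (0), not disable it:
print(parse_align_flag(1))  # (1, 0): bug disables aligning, fix uses default
print(parse_align_flag(0))  # (0, 1): bug uses default, fix disables aligning
```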


* On x86, -fschedule-insns is disabled, but -fschedule-insns2 (or the  
corresponding internal flag flag_schedule_insns_after_reload) is  
still being used... The reason for disabling -fschedule-insns is  
increased register pressure (and x86 has few registers already). I'm  
not sure if this is a bug, but why disable -fschedule-insns and leave  
-fschedule-insns2 active? Because, according to the docs,  
-fschedule-insns2 = -fschedule-insns + an additional pass. Any  
comments on this are welcome...


I've filed a bug report for the first issue  
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31586), but since this  
is easy to fix, I'd like to fix it myself. The only thing holding me  
back is that I'm new at this: I've never contributed to an open  
source project before, and I'm only vaguely familiar with CVS and  
the like. Is there a newbie guide to contributing to GCC? How can I  
make sure this fix is done for 4.1, 4.2 _and_ 4.3?


Also, while trying to identify the flags enabled in the various -On  
flags, I've run into incomplete or even wrong documentation regarding  
the optimization passes. Some examples:


- the -fipa-X flags are not mentioned in the 4.1.2 documentation
- -funit-at-a-time is still reported as enabled at -O2 and above,  
while it is already active at -O1 (this seems to be fixed in the  
4.3.0 docs though)
- the documentation for -fmove-loop-invariants and -ftree-loop-optimize  
mentions they are enabled at -O1 when they are not
- -finline-functions is enabled at -Os, but isn't listed as such

I can imagine few people care about the exact flags enabled at -On,  
but why list wrong information which will only confuse people (as it  
did for me)?


Last but not least, I have some questions:

- Some flags are enabled by other flags (for example, -fschedule-insns  
enables -fsched-interblock and -fsched-spec according to the docs). I  
was unable to find where in the source code this is done... And  
because I want to incorporate as many active -On flags as possible,  
I'd like to track down for which flags this is really true, and for  
which the docs are out of date.


- When new optimization passes are completed, how is it decided where  
they go (-O1, -O2, -O3, -Os, none)? For example, why did  
-funit-at-a-time switch from -O2 to -O1 a while ago? And, as I noticed  
from the 4.3.0 docs, why is -fsplit-wide-types enabled at -O1, and  
not -O2? Is there some testing done, or is it just people saying "I  
think X belongs there"?


greetings,

Kenneth



Re: GCC -On optimization passes: flag and doc issues

2007-04-17 Thread Kenneth Hoste


On 17 Apr 2007, at 18:18, Eric Weddington wrote:





-Original Message-
From: Kenneth Hoste [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 17, 2007 3:23 AM
To: gcc@gcc.gnu.org
Subject: GCC -On optimization passes: flag and doc issues



- finline-functions is enabled at -Os, but isn't listed so


And it seems to have some issues:
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31528>
Comments #4 and #6.


This is exactly one of the reasons for our research: providing a  
useful way to determine which flags should go where...
Of course, this still depends on the benchmarks used and the platform  
chosen, but the way it is done now is just guessing IMHO (educated  
guessing, using domain knowledge, but guessing nonetheless)...


greetings,

K.





Re: Segfault on OpenMP program

2007-04-18 Thread Kenneth Hoste


On 18 Apr 2007, at 15:38, Paul Brook wrote:


On Wednesday 18 April 2007 00:19, FX Coudert wrote:

Someone reported a bug on a trivial statically-linked Fortran program
with OpenMP and a I/O operation. I can reproduce the segfault, which
happens at:
...
Andrew suggested on bugzilla that this might be a GLIBC issue (I use
glibc-2.4 from redhat), with "pthreads not linking in correctly".
Does this ring a bell? Can someone confirm that it's not a GCC issue
(or that it is)? Details can be found in PR 31604.


The answer to any question involving glibc and static linking is
generally "don't do that".

I've seen pthread-related problems with static linking, though that  
was related to nss dlopening a shared library from a static  
application.


Is this only advised against for parallel programs, or is it a general rule?
In my research, I statically link all my benchmarks because I  
measure properties of them using instrumentation on a bunch of  
computers (which might have various OSs, OS versions, libc versions,  
...). Because we want to analyze the same dynamically executed code  
over and over again (no matter on which machine the analysis is  
done), I use static linking.


I've heard warnings about static linking before from other people  
too: using a libc version which is not intended for the system you  
are running on might lead to weird behavior.


Concerning this, I have some questions:

- What are the possible issues with static linking? Are they listed  
above (pthread problems, libc vs kernel conflicts), or are there more?


- How can I easily dynamically link against libc, but statically link  
with everything else? Currently, making everything statically linked  
is easy: in the various makefiles I set CC="gcc -static". Is there a  
similar 'trick' to link libc dynamically but everything else  
statically? (I'm using GCC 4.1.2 if it matters)


greetings,

Kenneth





Re: GCC mini-summit - benchmarks

2007-04-20 Thread Kenneth Hoste


On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote:


11) H.J. Lu discussed SPEC CPU 2006.  He reported that a couple of the
tests do not run successfully, and it appears to be due to bugs in
the tests which cause gcc to compile them in unexpected ways.  He
has been reporting the problems to the SPEC committee, but is
being ignored.  He encouraged other people to try it themselves
and make their own reports.



I'm not sure what 'tests' means here... Are test cases being extracted  
from the SPEC CPU2006 sources? Or are you referring to the validity  
tests of the SPEC framework itself (to check whether the output  
generated by some binary conforms to their reference output)?




12) Jose Dana reported his results comparing different versions of gcc
and icc at CERN.  They have a lot of performance sensitive C++
code, and have gathered a large number of standalone snippets
which they use for performance comparisons.  Some of these can
directly become bugzilla enhancement requests.  In general this
could ideally be turned into a free performance testsuite.  All
the code is under free software licenses.


Is this performance testsuite available somewhere? Sounds interesting  
to add to my (long) list of benchmark suites.


greetings,

Kenneth



Re: GCC mini-summit - compiling for a particular architecture

2007-04-20 Thread Kenneth Hoste


On 20 Apr 2007, at 08:30, Ian Lance Taylor wrote:


13) Michael Meissner raised the idea of compiling functions
differently for different processors, choosing the version based
on a runtime decision.  This led to some discussion of how this
could be done effectively.  In particular if there is an
architectural difference, such as Altivec, you may get prologue
instructions which save and restore registers which are not
present on all architectures.



Related to this: have you guys ever considered making the -On  
flags dependent on the architecture?
Right now, I believe the decision about which flags are activated at  
each -On level is independent of the architecture (except for flags  
which need to be disabled to ensure correct code generation, such as  
-fschedule-insns for x86). I must say I haven't looked into this in  
great detail, but at least for the passes controlled by flags on x86,  
this seems to be the case.


I think choosing the flags as a function of the architecture you are  
compiling for might be highly beneficial (see  
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31528 for example).


greetings,

Kenneth



Re: GCC -On optimization passes: flag and doc issues

2007-04-20 Thread Kenneth Hoste


On 17 Apr 2007, at 16:27, Ian Lance Taylor wrote:


Kenneth Hoste <[EMAIL PROTECTED]> writes:


* When using -falign-loops or -fno-align-loops the corresponding
internal variable 'align-loops' should be set to 0 (= use default
setting) or 1 (= no aligning) resp. When parsing the various flags, a
variable 'value' is used to set (value=1) or unset (value=0) the
corresponding flag. Unfortunately, these values should be reversed
for the -falign-X flags, which isn't done, and thus using -falign-
loops causes no aligning, while -fno-align-loops causes aligning with
the default setting for the arch you're compiling for. Same problem
for -falign-jumps, -falign-labels and -falign-functions.


Ouch.  Thanks for noticing.


I hope I can fix this myself (should be fairly easy), once I get  
familiar with submitting patches.


A related question: how is it decided which priority a bug gets?




Also, while trying to identify the flags enabled in the various -On
flags, I've run into incomplete or even wrong documentation regarding
the optimization passes. Some examples:

- the -fipa-X flags are not mentioned in the 4.1.2 documentation


Looks like a bug.


A documentation bug you mean?




- funit-at-a-time is still being reported as enabled in -O2 and
above, while it is already active in -O1 (this seems to be fixed in
the 4.3.0 docs though)


This would be a bug but it looks OK to me in 4.1.


Yes, you're right, sorry. It seems this was fixed in the 4.1.2 release.




- the documentation for fmove-loop-invariants and ftree-loop-optimize
mentions they are enabled at -O1 when they are not


Actually I think they are enabled at -O1.  This is done in a slightly
tricky way: they default to being on, but they have no effect unless
the other loop optimization passes are run.  I think it would be
appropriate to clean this up to make the code more obvious.


Hmm, okay, thank you. Do you know of any other flags being activated  
like this in a 'hidden' way?





- finline-functions is enabled at -Os, but isn't listed so


Yes, this should be documented better.  The general -finline-functions
is not enabled at -Os.  Only inlining of small functions is enabled.
This is done by setting parameters.


Yes, I've noticed the setting of the parameters. But this should  
definitely be reflected in the documentation.





- Some flags are enabled by other flags (for example, fschedule-insns
enables fsched-interblock and fsched-spec according to the docs). I
was unable to find where in the source code this is done... And
because I want to incorporate as much active On-flags as possible,
I'd like to track down for which flags this is really true, and for
which the docs are out-of-date.


-fsched-interblock and -fsched-spec default to 1 in common.opt.  But
they have no effect unless you enable a scheduling pass.


Another example of 'hidden' flag-activation. Any more?




- When new optimization passes are completed, how is decided where
they go to (-O1, -O2, -O3, -Os, none)? For example, why did funit-at-
a-time switch from -O2 to -O1 a while ago? And, as I noticed from the
4.3.0 docs, why is fsplit-wide-types enabled at -O1, and not -O2? Is
there some testing done, or is it just people saying "I think X
belongs there"?


Where to put optimizations is a trade-off between the amount of time
they take and the amount of good they do.  The testing which is done
is compilation time testing.


Hmm, okay. I still feel this is a bit ad hoc. What about flags that  
don't get activated at any of the -On levels? Are these not stable  
enough yet on important architectures, or are there other reasons?


greetings,

K.



Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kenneth . Hoste

Citeren "Kaveh R. GHAZI" <[EMAIL PROTECTED]>:


On Mon, 23 Apr 2007, Mark Mitchell wrote:


I'm certainly not trying to suggest that we run SPEC on every
architecture, and then make -O2 be the set of optimization options that
happens to do best there, however bizarre.


Why not?  Is your objection because SPEC doesn't reflect real-world apps
or because the option set might be "bizarre"?

The bizarreness of the resulting flag set should not be a consideration
IMHO.  Humans are bad at predicting how complex systems like this behave.
The fact that the "best" flags may be non-intuitive is not surprising. I
find the case of picking compiler options for optimizing code very much
like choosing which part of your code to hand optimize programmatically.
People often guess wrongly where their code spends its time, that's why we
have profilers.

So I feel we should choose our per-target flags using a tool like Acovea
to find what the "best" options are on a per-target basis.  Then we could
insert those flags into -O2 or "-Om" in each backend.  Whether we use SPEC
or the testcases included in Acovea or maybe GCC itself as the app to tune
for could be argued.  And some care would be necessary to ensure that the
resulting flags don't completely hose other large classes of apps.  But
IMHO once you decide to do per-target flags, something like this seems
like the natural conclusion.

http://www.coyotegulch.com/products/acovea/



I totally agree with you, Acovea would be a good tool for helping with this.
But, in my opinion, it has one big downside: it doesn't allow a  
trade-off between, for example, compilation time and execution time.


My work tries to tackle this: I'm using an evolutionary approach (like  
Acovea) which is multi-objective (unlike Acovea), meaning it optimizes  
a Pareto curve over compilation time, execution time and code size.  
I'm also trying to speed things up compared to the way Acovea handles  
them, which I won't describe any further for now.
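Such a multi-objective search keeps only the non-dominated flag settings. A minimal sketch of that Pareto filter in Python (the numbers are made up for illustration; this is not the actual framework):

```python
def dominates(a, b):
    """a dominates b if a is no worse on every objective and strictly
    better on at least one (all objectives are minimized here:
    compilation time, execution time, code size)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the settings not dominated by any other setting."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Each tuple: (compilation time, execution time, code size) measured
# for one flag combination; hypothetical values.
measured = [(10, 5.0, 120), (12, 4.0, 130), (9, 6.0, 110), (15, 4.5, 140)]
print(pareto_front(measured))
# [(10, 5.0, 120), (12, 4.0, 130), (9, 6.0, 110)]
# (15, 4.5, 140) drops out: (12, 4.0, 130) beats it on every objective.
```

The evolutionary search then breeds new flag combinations from the current front instead of enumerating all possibilities.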


I'm currently only using the SPEC CPU2000 benchmarks (and I believe  
Acovea does too), but this shouldn't be a drawback: the methodology is  
completely independent of the benchmarks used. I'm also trying to make  
sure everything is parameterizable (size of population, number of  
generations, crossover/mutation/migration rates, ...).


Unfortunately, I'm having quite a lot of problems with weird  
combinations of flags (which trigger bugs, as was mentioned in this  
thread), which is slowing things down. Instead of trying to fix these  
bugs, I'm just ignoring the combinations of flags which trigger them  
for now (although I'll try to report each bug I run into in Bugzilla).


Hopefully, this work will produce nice results by June, and I'll make  
sure to report on it on this mailing list once it's done.


greetings,

Kenneth






This message was sent using IMP, the Internet Messaging Program.


Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kenneth . Hoste

Citeren Diego Novillo <[EMAIL PROTECTED]>:


Mark Mitchell wrote on 04/23/07 13:56:


So, I think there's a middle ground between "exactly the same passes on
all targets" and "use Acovea for every CPU to pick what -O2 means".
Using Acovea to reveal some of the suprising, but beneficial results,
seems like a fine idea, though.


I'm hoping to hear something along those lines at the next GCC Summit. I
have heard of a bunch of work in academia doing extensive optimization
space searches looking for combinations of pass sequencing and
repetition to achieve optimal results.

My naive idea is for someone to test all these different combinations
and give us a set of -Ox recipes that we can use by default in the compiler.




Sorry to be blunt, but that's indeed quite naive :-)

Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations.
If you also count the passes not controlled by a flag, but decided upon  
depending on the optimization level, that adds another, virtual flag  
(i.e. using -O1, -O2, -O3 or -Os as the base setting).


Evaluating a serious set of programs can easily take tens of minutes,  
so I think you can easily see that trying all possible combinations is  
totally infeasible. The nice thing is you don't have to: you can learn  
from the combinations you've already evaluated, you can start with  
only a subset of programs and add others gradually, ...
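That "learn from what you've evaluated" idea is the heart of Acovea-style genetic search. A toy sketch of it (tiny flag set, and a stand-in fitness function where a real framework would compile and time benchmarks):

```python
import random

FLAGS = ["-ftree-vectorize", "-funroll-loops", "-fgcse", "-fschedule-insns2"]

def fitness(genome):
    # Stand-in: a real framework would compile and run a benchmark here.
    # This toy function simply rewards enabling the first two flags.
    return genome[0] + genome[1] - 0.1 * sum(genome)

def evolve(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Each genome is a bit vector: flag i is enabled iff genome[i] == 1.
    pop = [[rng.randint(0, 1) for _ in FLAGS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(FLAGS))
            child = a[:cut] + b[cut:]             # one-point crossover
            child = [g ^ (rng.random() < mutation_rate) for g in child]
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return [flag for flag, on in zip(FLAGS, best) if on]

print(evolve())  # the flag subset favoured by the toy fitness function
```

Only a tiny fraction of the 2^N space is ever evaluated, yet each generation is biased toward the combinations that did well so far.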


My work is actually concentrating on building a framework to do  
exactly that: give a set of recipes for -On flags which allow a  
choice, and which are determined by trading off compilation time,  
execution time and code size.


I won't be at the GCC summit in Canada (I'll be in San Diego then,  
presenting some other work), but I'll make sure to announce our work  
when it's finished...


greetings,

Kenneth





RE: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kenneth . Hoste

Citeren Dave Korn <[EMAIL PROTECTED]>:


On 23 April 2007 19:07, Diego Novillo wrote:


Mark Mitchell wrote on 04/23/07 13:56:


So, I think there's a middle ground between "exactly the same passes on
all targets" and "use Acovea for every CPU to pick what -O2 means".
Using Acovea to reveal some of the suprising, but beneficial results,
seems like a fine idea, though.


I'm hoping to hear something along those lines at the next GCC Summit. I
have heard of a bunch of work in academia doing extensive optimization
space searches looking for combinations of pass sequencing and
repetition to achieve optimal results.

My naive idea is for someone to test all these different combinations
and give us a set of -Ox recipes that we can use by default in the compiler.



  Has any of the Acovea research demonstrated whether there actually is any
such thing as a "good default set of flags in all cases"?  If the results
obtained diverge significantly according to the nature/coding
style/architecture/other uncontrolled variable factors of the application, we
may be labouring under a false premise wrt. the entire idea, mightn't we?


I don't think that has been shown. Acovea evaluation has only been  
done on a few architectures, and I don't believe a comparison was made.


But you guys are probably the best team to try that out: you have  
access to a wide range of platforms, and the experience needed to  
solve problems when they present themselves. Hopefully, my upcoming  
framework will boost such an effort. Remember: adjusting the way in  
which GCC handles the -On flags is only needed if the tests suggest it  
will be useful.


greetings,

Kenneth






Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kenneth . Hoste

Citeren Diego Novillo <[EMAIL PROTECTED]>:


Dave Korn wrote on 04/23/07 14:26:


  Has any of the Acovea research demonstrated whether there actually is any
such thing as a "good default set of flags in all cases"?  If the results


Not Acovea itself.  The research I'm talking about involves a compiler
whose pipeline can be modified and resequenced.  It's not just a matter
of adding -f/-m flags.  The research I've seen does a fairly good job at
modelling AI systems that traverse the immense search space looking for
different sequences.



Any references?







Re: GCC mini-summit - compiling for a particular architecture

2007-04-23 Thread Kenneth . Hoste

On 23 Apr 2007, at 20:43, Diego Novillo wrote:

[EMAIL PROTECTED] wrote on 04/23/07 14:37:

Currently, the -On flags set/unset 60 flags, which yields 2^60 combinations.
If you also count the passes not controlled by a flag, but decided upon
depending on the optimization level, that adds another, virtual flag
(i.e. using -O1, -O2, -O3 or -Os as the base setting).

No, that's not what I want.  I want a static recipe.  I do *not* want
-Ox to do this search every time.

I'm not saying you need to do this every time you install GCC. I'm  
saying trying out 2^60 combinations (even offline) is totally  
infeasible.



It goes like this: Somebody does a study over a set of applications that
represent certain usage patterns (say, FP and INT just to mention the
two more common classes of apps).  The slow search is done offline and
after a few months, we get the results in the form of a table that says
for each class and for each -Ox what set of passes to execute and in
what order they should be executed.

Not to say that the current sequencing and repetition are worthless, but
 I think they could be improved in a quasi systematic way using this
process (which is slow and painful, I know).

Exactly my idea.



My work is actually concentrating on building a framework to do
exactly that: give a set of recipes for -On flags which allow a
choice, and which are determined by trading off compilation time,
execution time and code size.

Right.  This is what I want.

I won't be at the GCC summit in Canada (I'm in San Diego then
presenting some other work), but I'll make sure to announce our work
when it's finished...

Excellent.  Looking forward to those results.

Cool!





Re: GCC 4.4.x speed regression - help?

2009-08-17 Thread Kenneth Hoste


On Aug 16, 2009, at 18:02 , Martin Guy wrote:


Yes, GCC is bigger and slower and for several architectures generates
bigger, slower code with every release, though saying so won't make
you very popular on this list! :)

One theory is that there are now so many different optimization passes
(and, worse, clever case-specific hacks hidden in the backends) that
the interaction between the lot of them is now chaotic. Selecting
optimization flags by hand is no longer humanly possible.


That, and the fact that GCC supports so many (quite different) targets
while keeping the set of optimization passes roughly equal across
these targets, probably doesn't help either...

I'm aware there are (good) reasons to try to keep it that way, but it
seems the reasons against it are getting stronger.

Just my 2 cents.

Kenneth

PS: There's MILEPOST, but there are other projects too. Check out
Acovea (http://www.coyotegulch.com/products/acovea) and my very own
pet project COLE (http://users.elis.ugent.be/~kehoste/, see the
publications section).


--

Kenneth Hoste
Paris research group - ELIS - Ghent University, Belgium
email: kenneth.ho...@elis.ugent.be
website: http://www.elis.ugent.be/~kehoste
blog: http://boegel.kejo.be



Re: gcc4.1.2 compilation errors

2008-11-30 Thread Kenneth Hoste
s of the Galerkin system  
***

1
Error: Unclassifiable statement at (1)
In file modules.f90:140

   Real*8,  Dimension(N11,N11) ::
1
Error: Syntax error in data declaration at (1)
In file modules.f90:141

  *POP, Poj1, Poj2, Poj3, Poj4
 1
Error: Unclassifiable statement at (1)

Any help will be greatly appreciated!




--

Computer Science is no more about computers than astronomy is about  
telescopes. (E. W. Dijkstra)


Kenneth Hoste
ELIS - Ghent University
email: [EMAIL PROTECTED]
blog: http://www.elis.ugent.be/~kehoste/blog
website: http://www.elis.ugent.be/~kehoste



Re: Some 4.4 project musings

2008-02-11 Thread Kenneth Hoste


On 11 Feb 2008, at 14:49, Diego Novillo wrote:

On Fri, Feb 1, 2008 at 3:55 PM, Andrew MacLeod <[EMAIL PROTECTED]> wrote:


1 - Pass cleanup.  There have been rumblings about this, but I haven't
seen a lot of actual progress. We currently run 100's of passes all the
time, and we don't really know how effective or necessary many of them
are. We also don't know whether the cleanups they run are truly doing
much.  It would be nice to do some actual analysis and do something with
the results.


Yes, this is an area that is in desperate need of TLC.  Your plan
looks good to me.  We need a mechanism to determine whether a pass
did something, and we need to be able to have different pipelines for
different architectures.

Do you have anything specific in mind?  Create a branch?  Work
directly on mainline?

Perhaps the first stage would be to define the logging so we can start
doing some analysis.


I believe the work we did for CGO'08 is relevant here. We came up with
an automated way to select combinations of passes which perform well
as candidate optimization levels. We trade off compilation time vs
execution time in the paper, but the framework we came up with is more
general than that.
We had some nice results, showing that only 25% of the passes
activated at -O3 was enough to come up with candidate optimization
levels which significantly outperform -O1/-O2/-O3, especially in terms
of compilation time. There is an overview of the key passes in the
paper, along with a description of how we handled things...


If you guys are interested, you can find the paper on my website
(http://www.elis.ugent.be/~kehoste), and I'll be happy to provide more
info if needed, just shout. Or come and see my talk at CGO in April ;-)


Kenneth

--

Computer Science is no more about computers than astronomy is about  
telescopes. (E. W. Dijkstra)


Kenneth Hoste
ELIS - Ghent University
email: [EMAIL PROTECTED]
blog: http://www.elis.ugent.be/~kehoste/blog
website: http://www.elis.ugent.be/~kehoste


Re: GCC 4.2.0 Status Report (2007-05-11)

2007-05-12 Thread Kenneth Hoste


On 12 May 2007, at 00:02, Mark Mitchell wrote:


Every time I think we're almost there with this release, I seem to
manage to get stuck. :-(  However, we're very close: the only PRs that
I'm waiting for are:



I admit this is not a blocking bug, but it seems fairly (very) easy
to solve... I still have to figure out how the patch submission
framework works (I've never contributed to an open-source project
before), so I didn't get around to solving it myself.


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31586

Maybe someone could pick this up, so it gets fixed in GCC 4.2 and
other branches?

K.

--

Computer Science is no more about computers than astronomy is about  
telescopes. (E. W. Dijkstra)


Kenneth Hoste
ELIS - Ghent University
email: [EMAIL PROTECTED]
blog: http://www.elis.ugent.be/~kehoste/blog
website: http://www.elis.ugent.be/~kehoste




Re: machine learning for loop unrolling

2007-06-08 Thread Kenneth Hoste


On 08 Jun 2007, at 16:31, Stefan Ciobaca wrote:


Hello everyone,

For my bachelor thesis I'm modifying gcc to use machine learning to
predict the optimal unroll factor for different loops (inspired by
this paper: http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-938.pdf).


Interesting. I'm using evolutionary algorithms for similar purposes  
in my current research...





Of course, not all of these are relevant to gcc. I'm looking at ways
to compute some of these features, hopefully the most relevant ones.
If there is already some work done that I could use in order to
compute some of these features, I'd be glad if you could tell me about
it. Also, if anyone can think of some useful features, related to the
target architecture or the loops structure, I'd be glad to hear about
them.


I'm afraid I can't help here; I'm not familiar at all with GCC's
internals.




Also, I'm interested in some benchmarks. Many of the research papers
that describe compiler optimizations use the SPEC* benchmarks, but
these are not free, so I'm looking for alternatives. Currently I'm
looking into:

- OpenBench
- Botan
- CSiBE
- Polyhedron

(thanks to richi of #gcc for the last 3)

Do you know any other one that would be better?


But I can help here. Polyhedron is Fortran-only, but its benchmarks are
well suited for timing experiments (i.e. they run long enough to have
reasonable running times, but aren't too long either).
CSiBE is more targeted at code size; I believe the runtimes are
ridiculously small. I'm not familiar with the other two.

Some other possibilities:

* MiDataSets (also fairly small when run only once, but the suite
allows you to adjust the outer loop iteration count to increase
runtimes) [http://sourceforge.net/projects/midatasets]
* MediaBench / MediaBench II: multimedia workloads, which typically
iterate over frames for example [http://euler.slu.edu/~fritts/mediabench/]
* BioMetricsWorkload [http://www.ideal.ece.ufl.edu/main.php?action=bmw]
* BioPerf: gene sequence analysis, ... [http://www.bioperf.org/]
* some other benchmarks commonly used when testing GCC
[http://www.suse.de/~gcctest]


I've been using the above with GCC and most work pretty well (on x86).



Here is how I'm thinking of conducting the experiment:

- for each innermost loop:
  - compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and
measure the time the benchmark takes
  - write down the loop features and the best unroll factor
- apply some machine learning technique to the above data to determine
the correlations between loop features and best unroll factor


Any idea which? There's a huge number of different techniques out
there; choosing an appropriate one is critical to success.



- integrate the result into gcc and measure the benchmarks again


When using machine learning techniques to build some kind of model, a
common technique is cross-validation.
Say you have 20 benchmarks, no matter which ones. You use the larger
part of those (for example 15) to build the model (i.e. determine the
correlations between loop features and best unroll factor), and then
test the performance of that model on the remaining ones. The important
thing is not to use the benchmarks you test with when training with the
machine learning technique. That way, you can (hopefully) show that
what you've learned should work for other programs too.
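In code, that split might look like this. Everything below (benchmark names, loop features, "best" factors, and the deliberately dumb majority-vote model) is made-up illustration; only the train/test discipline is the point:

```python
import random
from collections import Counter

random.seed(42)

# Made-up benchmarks: (name, loop features, "best" unroll factor as it
# would be found by exhaustive measurement).
benchmarks = [(f"bench{i:02d}",
               {"trip_count": random.randint(4, 512),
                "body_insns": random.randint(2, 40)},
               random.choice([1, 2, 4, 8, 16, 32]))
              for i in range(20)]

random.shuffle(benchmarks)
train, test = benchmarks[:15], benchmarks[15:]  # learn on 15, validate on 5

# Deliberately trivial "model": predict the factor most common in the
# training set. A real setup would fit the loop features instead.
majority = Counter(best for _, _, best in train).most_common(1)[0][0]

# Score only on the held-out benchmarks the model never saw.
hits = sum(1 for _, _, best in test if best == majority)
print(f"predicted {majority}x everywhere: {hits}/{len(test)} correct held-out")
```

Swapping the majority vote for a real learner changes nothing about the split: the held-out score is what tells you whether the learned correlations generalize.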




Do you think it is ok to only consider inner-most loops? What about
the unroll factors? Should I consider bigger unroll factors? Do you
think the above setup is ok?


I'd say: don't be afraid to try too much. Try insane unroll factors  
too, just for testing purposes. You don't want to limit yourself to  
32x when the optimal could be 64x for example.




I welcome any feedback on this.


Who is your advisor on this? Where are you doing your bachelor's thesis?

greetings,

Kenneth

--

Computer Science is no more about computers than astronomy is about  
telescopes. (E. W. Dijkstra)


Kenneth Hoste
ELIS - Ghent University
email: [EMAIL PROTECTED]
blog: http://www.elis.ugent.be/~kehoste/blog
website: http://www.elis.ugent.be/~kehoste