Re: compiling very large functions.
> I think that it is time that we in the GCC community took some time to
> address the problem of compiling very large functions in a somewhat
> systematic manner.
>
> GCC has two competing interests here: it needs to be able to provide
> state of the art optimization for modest sized functions and it needs
> to be able to properly process very large machine generated functions
> using reasonable resources.
>
> I believe that the default behavior for the compiler should be that
> certain non essential passes be skipped if a very large function is
> encountered.
>
> There are two problems here:
>
> 1) defining the set of optimizations that need to be skipped.
> 2) defining the set of functions that trigger the special processing.
>
> For (1) I would propose that three measures be made of each function.
> These measures should be made before inlining occurs. The three
> measures are the number of variables, the number of statements, and
> the number of basic blocks.
> Many of the gcc passes are non linear in one or more of these measures
> and these passes should be skipped if one or more of these measures
> exceeds some threshold.

If profiling data is available at the point of decision, we may also want
to take into account how hot the function is (i.e., have the threshold
depend on the size of the function versus its expected impact on
performance based on profiling).

dorit

> For (2) I would propose that we add three new fields to the compilation
> manager. These fields would be null or zero if the optimization is
> either essential or is only linear in the measure. Otherwise, some
> indication of either a threshold or the exponent of the growth is used
> as the field.
>
> The compilation manager could then look at the options, in particular
> the -O level and perhaps some new options to indicate that this is a
> small machine or, in the other extreme, "optimize all functions come
> hell or high water!!", and skip those passes which would cause
> performance problems.
>
> I do not claim to understand how sensitive every pass is to these
> measures. However, I could possibly make a good first cut on the rtl
> passes.
>
> However, I think that before anyone starts hacking anything, we should
> come to a consensus on an overall strategy and implement something
> consistent for the entire compiler rather than attack some particular
> pass in a manner that only gets us past the next PR.
>
> Volunteer(s) to implement the compilation manager part of this would
> also be appreciated.
>
> Kenny
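To make the idea concrete, here is a minimal sketch of the kind of gate
the compilation manager could apply; all type names, fields, and
thresholds below are hypothetical, not existing GCC code:

/* Per-function measures, taken before inlining.  */
struct function_measures
{
  int n_vars;    /* number of variables */
  int n_stmts;   /* number of statements */
  int n_blocks;  /* number of basic blocks */
};

/* Per-pass limits; zero means "essential or linear in this measure,
   never skip because of it".  */
struct pass_limits
{
  int max_vars;
  int max_stmts;
  int max_blocks;
};

/* The compilation manager would skip a pass when any non-zero limit
   is exceeded (possibly after scaling the limits by -O level).  */
static int
skip_pass_p (const struct function_measures *m,
             const struct pass_limits *l)
{
  return (l->max_vars   && m->n_vars   > l->max_vars)
      || (l->max_stmts  && m->n_stmts  > l->max_stmts)
      || (l->max_blocks && m->n_blocks > l->max_blocks);
}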
Re: compiling very large functions.
While I agree with you, I think that there are so many things we are
already trying to address that this one can wait. I think we've been
doing a very good job on large functions too, and I believe that authors
of very large functions are just getting not only what they deserve, but
actually what they expect: large compile times (superlinear).

Not to mention that these huge functions are usually central to the
program. If GCC decided that it is not worth optimizing the
machine-generated bytecode interpreter of GNU Smalltalk, for example, I
might as well rewrite it in assembly (or as a JIT compiler). Same for
interpret.cc in libjava, though it is a tad smaller than GNU Smalltalk's
interpreter.

Unlike the authors of other VMs, I have no problem writing code so that
the *latest* version of GCC will do its best, instead of complaining
that GCC compiles my code worse on every release. So, I am OK with GCC
doing stupid things because of bugs that I/we can fix, but not with GCC
just giving up optimization on code that has always been compiled
perfectly (in one or two minutes for about 30,000 lines of
machine-generated code, despite being chock-full of computed gotos),
that *can* be optimized very well, and that is central to the
performance of a program.

Paolo
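A toy sketch of the shape of such an interpreter may help readers who
have not seen one: the whole dispatch loop is a single function, and the
computed gotos (&&label, a GCC extension) make nearly every block a
potential jump target. The opcodes below are invented for illustration;
this is not GNU Smalltalk's generated code.

#include <stdio.h>

static int
interpret (const unsigned char *pc)
{
  /* One label per opcode; real interpreters have hundreds, which is
     why the resulting function and its CFG are so large.  */
  static void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
  int stack[64], *sp = stack;

#define NEXT goto *dispatch[*pc++]
  NEXT;

op_push: *sp++ = *pc++; NEXT;          /* push immediate operand */
op_add:  sp--; sp[-1] += sp[0]; NEXT;  /* pop two, push sum */
op_halt: return sp[-1];
#undef NEXT
}

int main (void)
{
  const unsigned char code[] = { 0, 2, 0, 3, 1, 2 }; /* push 2, push 3, add, halt */
  printf ("%d\n", interpret (code));   /* prints 5 */
  return 0;
}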
Induction variable optimization
Respected Sir,

I am an M.E. Computer Science student doing a project on induction
variable optimization. I am therefore reading the file
tree-ssa-loop-ivopts.c of gcc-4.0.2 to learn what has been implemented
there.

Is there any other way to find out what has been implemented in
gcc-4.0.2? Can I know the algorithm? Please help me to figure it out.

Thanking you.

Rama Bharti Varshney
id: 2005h103421
Bits-Pilani (Rajasthan)
India
Re: compiling very large functions.
Paolo Bonzini wrote:
>> While I agree with you, I think that there are so many things we are
>> already trying to address that this one can wait. I think we've been
>> doing a very good job on large functions too, and I believe that
>> authors of very large functions are just getting not only what they
>> deserve, but actually what they expect: large compile times
>> (superlinear).
>
> Not to mention that these huge functions are usually central to the
> program. If GCC decided that it is not worth optimizing the
> machine-generated bytecode interpreter of GNU Smalltalk, for example,
> I might as well rewrite it in assembly (or as a JIT compiler). Same
> for interpret.cc in libjava, though it is a tad smaller than GNU
> Smalltalk's interpreter.
>
> Unlike the authors of other VMs, I have no problem writing code so
> that the *latest* version of GCC will do its best, instead of
> complaining that GCC compiles my code worse on every release. So, I
> am OK with GCC doing stupid things because of bugs that I/we can fix,
> but not with GCC just giving up optimization on code that has always
> been compiled perfectly (in one or two minutes for about 30,000 lines
> of machine-generated code, despite being chock-full of computed
> gotos), that *can* be optimized very well, and that is central to the
> performance of a program.
>
> Paolo

I actually think that your Smalltalk example is the exception and not
the rule. I would guess that the vast majority of very large functions
are machine generated simulations where the optimizer most likely
provides little benefit.

In the case of dataflow, reaching defs is much more expensive than
simple liveness, not because there are more bit vectors (there are
exactly the same number) but because there are an order of magnitude
more bits in those bit vectors than in live variables, and it just takes
more time and space to move them around.

The thing is that even as memories get larger, something has to give.
There are and will always be programs that are too large for the most
aggressive techniques, and my proposal is simply a way to gracefully
shed the most expensive techniques as the programs get very large.

The alternative is to just shelve these bugs and tell the submitter not
to use optimization on them. I do not claim to know what the right
approach is.

kenny
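To put rough numbers on the liveness versus reaching-defs comparison:
liveness needs one bit per variable per vector, while reaching defs
needs one bit per definition site, and big machine-generated functions
have far more definitions than variables. The figures below are invented
purely for illustration:

#include <stdio.h>

int main (void)
{
  /* Hypothetical function: 10000 blocks, 5000 variables, 80000
     definitions, two vectors (in/out) per block.  */
  long blocks = 10000, vars = 5000, defs = 80000;
  long live_bytes  = blocks * 2 * (vars / 8);
  long reach_bytes = blocks * 2 * (defs / 8);
  printf ("liveness: %ld MB, reaching defs: %ld MB\n",
          live_bytes >> 20, reach_bytes >> 20);  /* ~11 MB vs ~190 MB */
  return 0;
}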
Re: Induction variable optimization
[EMAIL PROTECTED] wrote on 06/11/2006 03:34:27:
> Respected Sir
>
> I am an M.E. Computer Science student doing a project on induction
> variable optimization.
>
> Therefore I am reading the file tree-ssa-loop-ivopts.c of gcc-4.0.2 to
> learn what has been implemented there.
>
> Is there any other way to find out what has been implemented in
> gcc-4.0.2? Can I know the algorithm? Please help me to figure it out.
>
> Thanking you.
>
> Rama Bharti Varshney
> id: 2005h103421
> Bits-Pilani (Rajasthan)
> India

Maybe the paper "Induction variable analysis with delayed abstractions"
by Sebastian Pop, Georges-André Silber and Albert Cohen can help.
Re: compiling very large functions.
Kenneth Zadeck writes:
> Paolo Bonzini wrote:
> >> While I agree with you, I think that there are so many things we
> >> are already trying to address that this one can wait. I think
> >> we've been doing a very good job on large functions too, and I
> >> believe that authors of very large functions are just getting
> >> not only what they deserve, but actually what they expect: large
> >> compile times (superlinear).
> >
> > Not to mention that these huge functions are usually central to
> > the program. If GCC decided that it is not worth optimizing the
> > machine-generated bytecode interpreter of GNU Smalltalk, for
> > example, I might as well rewrite it in assembly (or as a JIT
> > compiler). Same for interpret.cc in libjava, though it is a tad
> > smaller than GNU Smalltalk's interpreter.
> >
> > Unlike the authors of other VMs, I have no problem writing code
> > so that the *latest* version of GCC will do its best, instead of
> > complaining that GCC compiles my code worse on every release.
> > So, I am OK with GCC doing stupid things because of bugs that
> > I/we can fix, but not with GCC just giving up optimization on
> > code that has always been compiled perfectly (in one or two
> > minutes for about 30,000 lines of machine-generated code, despite
> > being chock-full of computed gotos), that *can* be optimized very
> > well, and that is central to the performance of a program.
>
> I actually think that your Smalltalk example is the exception and
> not the rule.

Mmm, probably. But these interpreters, while uncommon, are important in
the sense that they are executed a great deal. Paolo is surely right: we
don't want to cease to optimize those important programs on which we've
done a good job in the past.

> The thing is that even as memories get larger, something has to
> give. There are and will always be programs that are too large for
> the most aggressive techniques and my proposal is simply a way to
> gracefully shed the most expensive techniques as the programs get
> very large.

Indeed, yes.

> The alternative is to just shelve these bugs and tell the
> submitter not to use optimization on them.

It's hard to argue with your reasoning.

Andrew.
Re: compiling very large functions.
Paolo Bonzini <[EMAIL PROTECTED]> writes:

| Kenneth Zadeck wrote:
| > I think that it is time that we in the GCC community took some time
| > to address the problem of compiling very large functions in a
| > somewhat systematic manner.
|
| While I agree with you, I think that there are so many things we are
| already trying to address that this one can wait. I think we've been
| doing a very good job on large functions too, and I believe that
| authors of very large functions are just getting not only what they
| deserve, but actually what they expect: large compile times
| (superlinear).

You mean GCC developers? :-)

-- Gaby
Re: compiling very large functions.
Paolo Bonzini <[EMAIL PROTECTED]> writes:

[...]

| Unlike the authors of other VMs, I have no problem writing code so
| that the *latest* version of GCC will do its best, instead of
| complaining that GCC compiles my code worse on every release. So, I

You gave some examples, but those are not the only ones that come with
GCC. There are more examples (albeit less impressive) in GCC's own
source code, and their logic is such that I don't expect people to
fiddle with them every release.

-- Gaby
Re: 16 byte alignment hint for sse vectorization
Unfortunately there's no way to specify the alignment attribute of
pointers in GCC - the syntax was allowed in the past but not really
supported correctly, and then entirely disallowed (by this patch
http://gcc.gnu.org/ml/gcc-patches/2005-04/msg02284.html). This issue was
discussed in detail in these threads:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20794
http://gcc.gnu.org/ml/gcc/2005-03/msg00483.html
(and recently came up again also in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827#c56).

The problem is that "We don't yet implement either attributes on array
parameters applying to the array not the pointer, or attributes inside
the [] of the array parameter applying to the pointer. (This is
documented in "Attribute Syntax".)" (from the above thread).

dorit

> Hello,
>
> I have been playing with gcc's new (to me) auto vectorization
> optimizations. I have a particular loop for which I have made external
> provisions to ensure that the data is 16-byte aligned. I have tried
> everything I can think of to give gcc the hint that it is operating on
> aligned data, but still the vectorizer warns that it is operating on
> unaligned data and generates the less efficient MOVLPS/MOVUPS instead
> of MOVAPS.
>
> The code is like this:
>
> #define SSE __attribute__((aligned (16)))
>
> typedef float matrix_t[100][1024];
>
> matrix_t aa SSE, bb SSE, cc SSE;
>
> void calc(float *a, float *b, float *c) {
>   int i, n = 1024;
>   for (i=0; i<n; i++) {
>     a[i] = b[i] / (b[i] + c[i]);
>   }
> }
>
> int main(void) {
>   int i, n = 100;
>   for (i=0; i<n; i++) {
>     calc(aa[i], bb[i], cc[i]);
>   }
> }
>
> gcc rejects it if I specify alignment attributes on the formal
> parameters (obviously it was dumb to even try that), and there does not
> seem to be a way to get the alignment hint to apply to the object
> referenced by the pointer instead of the pointer itself.
>
> In my application it is important that the function doing the
> computations remains abstracted away from the data definitions, as
> there is over 1G of data dynamically arranged and the actual alignment
> provisions are made with mmap().
>
> Does anyone have a suggestion?
>
> Regards,
> Michael James
Re: compiling very large functions.
Kenneth Zadeck wrote:
> I actually think that your Smalltalk example is the exception and not
> the rule. I would guess that the vast majority of very large functions
> are machine generated simulations where the optimizer most likely
> provides little benefit.

Wouldn't it be best to base this discussion on data rather than guesses
as to what "most likely" happens? It seems to me that the first step
here is taking some of these large functions and compiling them with
various optimizations enabled or disabled.
Re: compiling very large functions.
On Nov 5, 2006, at 08:46, Kenneth Zadeck wrote:
> The thing is that even as memories get larger, something has to give.
> There are and will always be programs that are too large for the most
> aggressive techniques and my proposal is simply a way to gracefully
> shed the most expensive techniques as the programs get very large.
>
> The alternative is to just shelve these bugs and tell the submitter
> not to use optimization on them. I do not claim to know what the right
> approach is.

For compiling very large programs, lower optimization levels (-O1, -Os)
already tend to be competitive with -O2. More often than not, -O3 does
not improve results, or even results in slower code. Performance becomes
dominated by how much code can fit in L2, TLB or RAM.

Ideally, we would always compile code that is executed infrequently
using -Os to minimize memory footprint, and always compile code in loops
with many iterations using high optimization levels.

Kenneth's proposal to trigger different optimization strategies in each
function based on certain statistics seems an excellent step toward more
balanced compilations. This not only helps reduce compile time (mostly
through reduced memory usage), but may also improve the generated code.

For the rare program with huge performance-critical functions, we can
either add user-adjustable parameters, new optimization levels, or use
profile feedback to find out about such programs. For most programs
though, more balanced optimization allows the compiler to aggressively
optimize code that matters, and to minimize code size and compile time
for large swaths of code that are uninteresting.

-Geert
Re: compiling very large functions.
On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
> Richard Guenther wrote:
> > On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
> >> Richard Guenther wrote:
> >> > On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
> >> >> I think that it is time that we in the GCC community took some
> >> >> time to address the problem of compiling very large functions in
> >> >> a somewhat systematic manner.
> >> >>
> >> >> GCC has two competing interests here: it needs to be able to
> >> >> provide state of the art optimization for modest sized functions
> >> >> and it needs to be able to properly process very large machine
> >> >> generated functions using reasonable resources.
> >> >>
> >> >> I believe that the default behavior for the compiler should be
> >> >> that certain non essential passes be skipped if a very large
> >> >> function is encountered.
> >> >>
> >> >> There are two problems here:
> >> >>
> >> >> 1) defining the set of optimizations that need to be skipped.
> >> >> 2) defining the set of functions that trigger the special
> >> >> processing.
> >> >>
> >> >> For (1) I would propose that three measures be made of each
> >> >> function. These measures should be made before inlining occurs.
> >> >> The three measures are the number of variables, the number of
> >> >> statements, and the number of basic blocks.
> >> >
> >> > Why before inlining? These three numbers can change quite
> >> > significantly as a function passes through the pass pipeline. So
> >> > we should try to keep them up-to-date to have an accurate
> >> > measurement.
> >> >
> >> I am flexible here. We may want inlining to be able to update the
> >> numbers. However, I think that we should drive the inlining
> >> aggression based on these numbers.
> >
> > Well, for example jump threading and tail duplication can cause
> > these numbers to significantly change. Also CFG instrumentation and
> > PRE can increase the BB count. So we need to deal with cases where
> > an optimization produces an overly large number of basic blocks or
> > instructions (like by throttling those passes properly).
> >
> I lean toward leaving the numbers static even if they do increase as
> time goes by. Otherwise you get two effects: the first optimizations
> get to be run more, and you get weird non-linear step functions where
> small changes in some upstream function affect the downstream ones.

Ok, I guess we can easily flag each function as having
- many BBs
- big BBs
- complex CFG (many edges)
and set these flags at CFG construction time during the lowering phase
(which is after the early inlining pass, I believe).

The number of basic blocks is kept up-to-date during optimization; the
other numbers would need to be re-generated if we want to keep them
up-to-date. But with just using three (or even only one?) flags, we can
easily fit this information in struct function.

I also like the idea of a "hot" function flag to be able to dynamically
switch between optimize_size and !optimize_size in the tree optimizers.
Profile-based inlining already tries to follow that route.

Richard.
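As a sketch of how the three flags above might be represented and set,
consider the following; the names and thresholds are hypothetical, not
actual struct function contents:

/* Hypothetical size-class flags, set once at CFG construction.  */
#define FUNC_MANY_BBS     (1 << 0)  /* many basic blocks */
#define FUNC_BIG_BBS      (1 << 1)  /* very large basic blocks */
#define FUNC_COMPLEX_CFG  (1 << 2)  /* many edges per block */

struct size_class
{
  unsigned int flags;
};

/* Set during lowering, after early inlining; thresholds illustrative. */
static void
classify_function (struct size_class *sc,
                   int n_blocks, int max_bb_insns, int n_edges)
{
  sc->flags = 0;
  if (n_blocks > 5000)
    sc->flags |= FUNC_MANY_BBS;
  if (max_bb_insns > 10000)
    sc->flags |= FUNC_BIG_BBS;
  if (n_blocks > 0 && n_edges / n_blocks > 20)
    sc->flags |= FUNC_COMPLEX_CFG;
}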
Re: compiling very large functions.
On 11/5/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> > I lean toward leaving the numbers static even if they do increase as
> > time goes by. Otherwise you get two effects: the first optimizations
> > get to be run more, and you get weird non-linear step functions
> > where small changes in some upstream function affect the downstream
> > ones.
>
> Ok, I guess we can easily flag each function as having
> - many BBs
> - big BBs
> - complex CFG (many edges)
> and set these flags at CFG construction time during the lowering phase
> (which is after the early inlining pass I believe).

IMHO any CFG-based criteria should be using dynamic numbers, simply
because they are available at all times. Large BBs is a more interesting
one, because in general they don't get smaller during optimizations.

What Kenny suggests here is not new, BTW. I know that gcse already
disables itself on very large functions (see
gcse.c:is_too_expensive()), and probably some other passes do this as
well. A grep for OPT_Wdisabled_optimization *should* show all the places
where we throttle or disable passes, but it appears that warnings have
not been added consistently when someone throttled a pass.

AFAIK not one of the tree optimizers disables itself, but perhaps we
should. The obvious candidates would be the ones that require
recomputation of alias analysis, and the ones that don't update SSA info
on the fly (i.e. require update_ssa, which is a horrible compile-time
hog).

Gr.
Steven
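For reference, the shape of the gcse.c check mentioned above is roughly
the following; this is a from-memory paraphrase with illustrative
thresholds, written standalone here so it compiles outside the GCC tree
(the real code uses GCC's warning machinery and params):

#include <stdio.h>

/* Shaped after gcse.c's is_too_expensive() of that era; thresholds
   are illustrative, not the exact values in any release.  */
static int
is_too_expensive (int n_basic_blocks, int n_edges, const char *pass)
{
  /* Global optimizations on flow graphs with many edges per basic
     block are extremely time consuming.  */
  if (n_basic_blocks > 500 && n_edges / n_basic_blocks >= 20)
    {
      fprintf (stderr,
               "warning: %s disabled: %d basic blocks and "
               "%d edges/basic block\n",
               pass, n_basic_blocks, n_edges / n_basic_blocks);
      return 1;
    }
  return 0;
}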
Re: compiling very large functions.
> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> Richard Guenther wrote:
>>> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>>>> Richard Guenther wrote:
>>>>> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>>>>>> I think that it is time that we in the GCC community took some
>>>>>> time to address the problem of compiling very large functions in
>>>>>> a somewhat systematic manner.
>>>>>>
>>>>>> GCC has two competing interests here: it needs to be able to
>>>>>> provide state of the art optimization for modest sized functions
>>>>>> and it needs to be able to properly process very large machine
>>>>>> generated functions using reasonable resources.
>>>>>>
>>>>>> I believe that the default behavior for the compiler should be
>>>>>> that certain non essential passes be skipped if a very large
>>>>>> function is encountered.
>>>>>>
>>>>>> There are two problems here:
>>>>>>
>>>>>> 1) defining the set of optimizations that need to be skipped.
>>>>>> 2) defining the set of functions that trigger the special
>>>>>> processing.
>>>>>>
>>>>>> For (1) I would propose that three measures be made of each
>>>>>> function. These measures should be made before inlining occurs.
>>>>>> The three measures are the number of variables, the number of
>>>>>> statements, and the number of basic blocks.
>>>>>
>>>>> Why before inlining? These three numbers can change quite
>>>>> significantly as a function passes through the pass pipeline. So
>>>>> we should try to keep them up-to-date to have an accurate
>>>>> measurement.
>>>>>
>>>> I am flexible here. We may want inlining to be able to update the
>>>> numbers. However, I think that we should drive the inlining
>>>> aggression based on these numbers.
>>>
>>> Well, for example jump threading and tail duplication can cause
>>> these numbers to significantly change. Also CFG instrumentation and
>>> PRE can increase the BB count. So we need to deal with cases where
>>> an optimization produces an overly large number of basic blocks or
>>> instructions (like by throttling those passes properly).
>>>
>> I lean toward leaving the numbers static even if they do increase as
>> time goes by. Otherwise you get two effects: the first optimizations
>> get to be run more, and you get weird non-linear step functions where
>> small changes in some upstream function affect the downstream ones.
>
> Ok, I guess we can easily flag each function as having
> - many BBs
> - big BBs
> - complex CFG (many edges)
> and set these flags at CFG construction time during the lowering phase
> (which is after the early inlining pass I believe).
>
> The number of basic blocks is kept up-to-date during optimization, the
> other numbers would need to be re-generated if we want to keep them
> up-to-date.

We definitely need some heuristics to throttle down expensive
optimizations for very huge functions. I am not sure I like the static
flag rather than a per-case approach. The idea of big versus small is
very different for individual optimizations, and the properties change
significantly (i.e. expansion of a min/max function will turn a big BB
into a big CFG), so it will be a lot of fun to set up proper thresholds
here.

I also believe that we need to take care not to have O(n^k) where k is
significantly higher than necessary; most of our current scalability
issues fall into this category, and we have large complexity just out
of little laziness. We are implementing many fancier algorithms in order
to be scalable, so we should try not to undermine them with stupid
mistakes.

To take the specific example (that triggered this discussion) of the
extreme testcase triggering on the new fwprop pass, I think it is quite
different. Analyzing the testcase, there were a number of problems that
were trivial to solve and are fixed in mainline now. The remaining
problems are usually easy too - quadratic removal from singly linked
lists in out-of-SSA and the scheduler. I think those ought to be cured,
since they don't hurt only in such monstrosities.

On the other hand, GVN-PRE and df.c do have algorithmic problems that we
might or might not want to solve. I care less whether we add code to
disable the optimization or come up with a better algorithm, and I am
very happy Daniel considers the second alternative for PRE. The df.c
issue can probably also be solved by either using the FUD graph built by
our SSA algorithm over RTL, or by turning the FUD graph into DU/UD
chains, avoiding the need for the bitmaps. This might be an interesting
task if such big functions turn out to be a common bottleneck.

> But with just using three (or even only one?) flags, we can easily fit
> this information in struct function.
>
> I also like the idea of a "hot" function flag to be able to
> dynamically switch between optimize_size and !optimize_size in the
> tree optimizers. Profile-based inlining already tries to follow that
> route.

See cfun->function_frequency.
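One of the remaining problems mentioned above is quadratic removal from
singly linked lists. For readers wondering why that is quadratic:
unlinking an element requires finding its predecessor by scanning from
the head, so unlinking all n elements one at a time costs O(n^2); a
doubly linked list (or unlinking while already walking the list) makes
each removal O(1). A toy illustration, unrelated to the actual
out-of-SSA code:

struct node
{
  int val;
  struct node *next;
};

/* Unlink VICTIM from the list at *HEAD.  O(n) per call because the
   predecessor must be found by scanning; one such call per element is
   the O(n^2) pattern described above.  VICTIM must be in the list.  */
static void
remove_node (struct node **head, struct node *victim)
{
  struct node **link = head;
  while (*link != victim)
    link = &(*link)->next;
  *link = victim->next;
}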
Re: compiling very large functions.
> AFAIK not one of the tree optimizers disables itself, but perhaps we
> should. The obvious candidates would be the ones that require
> recomputation of alias analysis, and the ones that don't update SSA
> info on the fly (i.e. require update_ssa, which is a horrible compile
> time hog).

Tree alias analysis can partially disable itself though:

  /* If the program has too many call-clobbered variables and/or
     function calls, create .GLOBAL_VAR and use it to model
     call-clobbering semantics at call sites.  This reduces the number
     of virtual operands considerably, improving compile times at the
     expense of lost aliasing precision.  */
  maybe_create_global_var (ai);

We have found this to be quite helpful on gigantic elaboration
procedures generated for Ada packages instantiating gazillions of
generics. We have actually lowered the threshold locally.

-- Eric Botcazou
Re: compiling very large functions.
Eric Botcazou wrote:
>> AFAIK not one of the tree optimizers disables itself, but perhaps we
>> should. The obvious candidates would be the ones that require
>> recomputation of alias analysis, and the ones that don't update SSA
>> info on the fly (i.e. require update_ssa, which is a horrible compile
>> time hog).
>
> Tree alias analysis can partially disable itself though:
>
>   /* If the program has too many call-clobbered variables and/or
>      function calls, create .GLOBAL_VAR and use it to model
>      call-clobbering semantics at call sites.  This reduces the number
>      of virtual operands considerably, improving compile times at the
>      expense of lost aliasing precision.  */
>   maybe_create_global_var (ai);
>
> We have found this to be quite helpful on gigantic elaboration
> procedures generated for Ada packages instantiating gazillions of
> generics. We have actually lowered the threshold locally.

I would like to point out that the central point of my proposal was to
have the compilation manager be the process that manages whether an
optimization is skipped, rather than having each pass make a decision on
its own. If we have a central mechanism, then it is relatively easy to
find some sweet spots. If every pass rolls its own, it is more difficult
to balance.

Kenny
Re: compiling very large functions.
Paolo Bonzini wrote:
> Kenneth Zadeck wrote:
>> I think that it is time that we in the GCC community took some time
>> to address the problem of compiling very large functions in a
>> somewhat systematic manner.
>
> While I agree with you, I think that there are so many things we are
> already trying to address, that this one can wait.

It certainly can, but I see no reason why it should. This is a class of
issues that users run into, and if someone is motivated to work on this
class, then that's great!

I like Kenny's idea of having a uniform set of metrics for size (e.g.,
number of basic blocks, number of variables, etc.) and a limited set of
gating functions, because it will allow us to explain what's going on to
users, and allow users to tune it. For example, if the metric for
disabling a pass (by default) is "# basic blocks > 10", then we can have
a -foptimize-bigger=2 switch to change that to "20". If the gating
condition were instead some arbitrary computation, it would be harder to
implement, and harder to explain.

Certainly, setting the default thresholds reasonably will be
non-trivial. If we can agree on the basic mechanism, though, we could
add thresholding on a pass-by-pass basis.

-- Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
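In code, the tuning model described above is just a per-pass base
threshold scaled by a user-visible multiplier; a minimal sketch (the
flag is the example from the message, the function name is
hypothetical):

/* With -foptimize-bigger=N, a pass whose default cutoff is
   BASE_THRESHOLD basic blocks keeps running up to N times that.  */
static int
pass_enabled_p (int n_basic_blocks, int base_threshold,
                int optimize_bigger)
{
  return n_basic_blocks <= base_threshold * optimize_bigger;
}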
Re: compiling very large functions.
On 11/5/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:
> > AFAIK not one of the tree optimizers disables itself, but perhaps we
> > should. The obvious candidates would be the ones that require
> > recomputation of alias analysis, and the ones that don't update SSA
> > info on the fly (i.e. require update_ssa, which is a horrible
> > compile time hog).
>
> Tree alias analysis can partially disable itself though:
>
>   /* If the program has too many call-clobbered variables and/or
>      function calls, create .GLOBAL_VAR and use it to model
>      call-clobbering semantics at call sites.  This reduces the number
>      of virtual operands considerably, improving compile times at the
>      expense of lost aliasing precision.  */
>   maybe_create_global_var (ai);
>
> We have found this to be quite helpful on gigantic elaboration
> procedures generated for Ada packages instantiating gazillions of
> generics. We have actually lowered the threshold locally.

Heh, I believe you! :-) IMHO we should add an OPT_Wdisabled_optimization
warning there, though.

Gr.
Steven
Re: compiling very large functions.
On 11/5/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
> I would like to point out that the central point of my proposal was to
> have the compilation manager be the process that manages whether an
> optimization is skipped, rather than having each pass make a decision
> on its own. If we have a central mechanism, then it is relatively easy
> to find some sweet spots. If every pass rolls its own, it is more
> difficult to balance.

Hmm, I don't understand this. Why is it harder to find a sweet spot if
every pass decides for itself whether to run or not? I would think that
this decision should be made by each pass individually, because the pass
manager is one abstraction level higher, where it shouldn't have to know
the behavior of each pass.

Gr.
Steven
Re: 16 byte alignment hint for sse vectorization
On Nov 4, 2006, at 11:00 AM, Michael James wrote:
> Does anyone have a suggestion?

#define SSE __attribute__((aligned (16)))

typedef float matrix_sub_t[1024] SSE;
typedef matrix_sub_t matrix_t[100];

matrix_t a, b, c;

void calc(matrix_sub_t * restrict ap,
          matrix_sub_t * restrict bp,
          matrix_sub_t * restrict cp) {
  int i, n = 1024;
  for (i=0; i<n; i++)
    (*ap)[i] = (*bp)[i] / ((*bp)[i] + (*cp)[i]);
}

would seem to be one way to do this that should work. Doesn't seem to
vectorize for me, though; if I read that right, the alignment issue
isn't the problem but rather the data dependency analysis:

t.c:13: note: === vect_analyze_dependences ===
t.c:13: note: not vectorized: can't determine dependence between
(*bp_12)[i_4] and (*ap_16)[i_4]
t.c:13: note: bad data dependence.
Re: compiling very large functions.
Steven Bosscher wrote:
> On 11/5/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> I would like to point out that the central point of my proposal was
>> to have the compilation manager be the process that manages whether
>> an optimization is skipped, rather than having each pass make a
>> decision on its own. If we have a central mechanism, then it is
>> relatively easy to find some sweet spots. If every pass rolls its
>> own, it is more difficult to balance.
>
> Hmm, I don't understand this. Why is it harder to find a sweet spot
> if every pass decides for itself whether to run or not? I would think
> that this decision should be made by each pass individually, because
> the pass manager is one abstraction level higher, where it shouldn't
> have to know the behavior of each pass.

If every expensive pass decides it is too expensive to run, then none of
them will run, when what you might have wanted is for some but not all
of them to run. Being at a higher level means you can query a pass for a
cost estimate given a certain set of arguments, and then decide whether
or not to disable the pass based on what you know so far about the
entire compilation chain.

Also, passes will often interact with each other, i.e. they are not
orthogonal, and a good way to manage the effects is at a higher level,
as opposed to having individual passes communicating with each other.

Chris
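One possible shape for the query interface described above, where a pass
exports a cost estimate that the manager weighs against a per-function
budget so that *some* expensive passes can still run; every name here is
hypothetical:

struct pass_info
{
  const char *name;
  /* Estimated cost, in abstract units, for a function with the given
     measures.  */
  long (*estimate_cost) (int n_stmts, int n_blocks);
};

/* Hypothetical estimate for a roughly quadratic pass.  */
static long
quadratic_pass_cost (int n_stmts, int n_blocks)
{
  (void) n_blocks;
  return (long) n_stmts * n_stmts;
}

/* The manager skips passes whose estimate exceeds what is left of the
   budget, rather than each pass independently bowing out.  */
static int
should_run_p (const struct pass_info *p, long *budget_left,
              int n_stmts, int n_blocks)
{
  long cost = p->estimate_cost (n_stmts, n_blocks);
  if (cost > *budget_left)
    return 0;
  *budget_left -= cost;
  return 1;
}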
Re: Induction variable optimization
Hello,

> I am an M.E. Computer Science student doing a project on induction
> variable optimization.
>
> Therefore I am reading the file tree-ssa-loop-ivopts.c of gcc-4.0.2 to
> learn what has been implemented there.
>
> Is there any other way to find out what has been implemented in
> gcc-4.0.2? Can I know the algorithm? Please help me to figure it out.

The overview of the algorithm is in the comment at the start of
tree-ssa-loop-ivopts.c.

Zdenek
Re: compiling very large functions.
On 11/5/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:
> > AFAIK not one of the tree optimizers disables itself, but perhaps we
> > should. The obvious candidates would be the ones that require
> > recomputation of alias analysis, and the ones that don't update SSA
> > info on the fly (i.e. require update_ssa, which is a horrible
> > compile time hog).
>
> Tree alias analysis can partially disable itself though:

No, it can't. Tree alias *representation* can :) It is also not really
partial disabling; it's really full disabling in 99% of cases.

>   /* If the program has too many call-clobbered variables and/or
>      function calls, create .GLOBAL_VAR and use it to model
>      call-clobbering semantics at call sites.  This reduces the number
>      of virtual operands considerably, improving compile times at the
>      expense of lost aliasing precision.  */
>   maybe_create_global_var (ai);
>
> We have found this to be quite helpful on gigantic elaboration
> procedures generated for Ada packages instantiating gazillions of
> generics. We have actually lowered the threshold locally.

As I alluded to above, this is disabling the representation of accurate
call clobbering. It still performs the same analysis; it's just not
representing the results the same way. (This is also another one of
those things that makes other places pretty hairy as a result.)

This is, in fact, a side effect of the fact that we currently try to
represent aliasing information in terms of "variables things access"
instead of just saying "we know these things can touch the same part of
the heap". I.e. we should be making memory equivalence classes and using
those as the symbols, instead of variables. Otherwise, we end up saying
"these things can touch the same 30 variables" by listing all 30
variables in vops, instead of just creating a single symbol that
represents 30 variables and using that. (Finding a good set of
equivalence classes to use is, of course, a Hard Problem(TM), which is
why we started by doing it this way.)
Re: compiling very large functions.
> > Tree alias analysis can partially disable itself though:
>
> No, it can't. Tree alias representation can :)

I presume you're thinking of the pass that performs the analysis, while
I was more thinking of the global machinery; my understanding is that
the machinery will not be able to disambiguate memory accesses it would
have been able to, if the limit were not reached.

> It is also not really partial disabling; it's really full disabling in
> 99% of cases.

Partially, because it only affects call-clobbered variables IIUC.

-- Eric Botcazou
Re: 16 byte alignment hint for sse vectorization
Hello Dorit,

Thank you for the list of references. What I gathered from reading them
is that alignment attributes applied to the base element of an array
cause problems for other legitimate uses of this attribute. It is
basically stupid to specify that the base type of an array be aligned,
because this implies that the array elements are scattered in memory.
The only examples I can think of where you might want to do that are
multidimensional arrays where you want the start of each subarray to be
aligned. That is better solved just by changing the length of the inner
array to one that yields the desired alignment.

However, specifying pointer alignment (whether this hint is passed
through the type system or some other way) is an important piece of
information that a programmer may wish to communicate to the optimizer.
The obvious semantics for the hint are that the log2(alignment) least
significant bits of the pointer value are clear. The size of the pointer
base type would be unaffected, so that pointer math can still yield
unaligned accesses. I hope I have not oversimplified the issue at hand
here.

From the point of view of someone passively benefiting from the
optimizer, this issue does not matter so much. However, someone who is
performance tuning his code may care very much about getting this
information through to the vectorizer, and without some provision for a
hint he is stuck either (1) without the optimization or (2) rewriting
the code so that all aligned arrays are passed as pointer-to-struct,
with the array element in the struct specified with an alignment
attribute (see the sketch after the quoted message below). I have not
tested method 2; it seems like a transformation which may work despite
being unaesthetic.

Regards,
Michael James

On 11/5/06, Dorit Nuzman <[EMAIL PROTECTED]> wrote:
> Unfortunately there's no way to specify the alignment attribute of
> pointers in GCC - the syntax was allowed in the past but not really
> supported correctly, and then entirely disallowed (by this patch
> http://gcc.gnu.org/ml/gcc-patches/2005-04/msg02284.html). This issue
> was discussed in detail in these threads:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20794
> http://gcc.gnu.org/ml/gcc/2005-03/msg00483.html
> (and recently came up again also in
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827#c56).
>
> The problem is that "We don't yet implement either attributes on array
> parameters applying to the array not the pointer, or attributes inside
> the [] of the array parameter applying to the pointer. (This is
> documented in "Attribute Syntax".)" (from the above thread).
>
> dorit
>
> > Hello,
> >
> > I have been playing with gcc's new (to me) auto vectorization
> > optimizations. I have a particular loop for which I have made
> > external provisions to ensure that the data is 16-byte aligned. I
> > have tried everything I can think of to give gcc the hint that it is
> > operating on aligned data, but still the vectorizer warns that it is
> > operating on unaligned data and generates the less efficient
> > MOVLPS/MOVUPS instead of MOVAPS.
> >
> > The code is like this:
> >
> > #define SSE __attribute__((aligned (16)))
> >
> > typedef float matrix_t[100][1024];
> >
> > matrix_t aa SSE, bb SSE, cc SSE;
> >
> > void calc(float *a, float *b, float *c) {
> >   int i, n = 1024;
> >   for (i=0; i<n; i++) {
> >     a[i] = b[i] / (b[i] + c[i]);
> >   }
> > }
> >
> > int main(void) {
> >   int i, n = 100;
> >   for (i=0; i<n; i++) {
> >     calc(aa[i], bb[i], cc[i]);
> >   }
> > }
> >
> > gcc rejects it if I specify alignment attributes on the formal
> > parameters (obviously it was dumb to even try that), and there does
> > not seem to be a way to get the alignment hint to apply to the
> > object referenced by the pointer instead of the pointer itself.
> >
> > In my application it is important that the function doing the
> > computations remains abstracted away from the data definitions, as
> > there is over 1G of data dynamically arranged and the actual
> > alignment provisions are made with mmap().
> >
> > Does anyone have a suggestion?
> >
> > Regards,
> > Michael James
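For the record, method (2) might look like the following; this is an
untested sketch of the workaround described above (type and function
names invented here), and whether a given GCC release actually
propagates the alignment to the vectorizer would need to be verified
against that release:

/* Wrap the row in a struct so the alignment attribute attaches to the
   pointed-to type rather than to the pointer itself.  */
typedef struct { float v[1024]; } __attribute__((aligned (16))) row_t;

void calc (row_t *a, const row_t *b, const row_t *c)
{
  int i;
  for (i = 0; i < 1024; i++)
    a->v[i] = b->v[i] / (b->v[i] + c->v[i]);
}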
Re: compiling very large functions.
On 11/5/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:
> > > Tree alias analysis can partially disable itself though:
> >
> > No, it can't. Tree alias representation can :)
>
> I presume you're thinking of the pass that performs the analysis,
> while I was more thinking of the global machinery; my understanding is
> that the machinery will not be able to disambiguate memory accesses it
> would have been able to, if the limit were not reached.

Depends on what you mean by "the machinery". There is no standard API
for using the results of the analysis without using the representation
we provide, but some passes do it anyway through hackery, so yes and no
:)

> > It is also not really partial disabling; it's really full disabling
> > in 99% of cases.
>
> Partially, because it only affects call-clobbered variables IIUC.

It affects all variables that escape or are non-local, which is roughly
all variables that have interesting aliasing properties (i.e. those that
cannot be ascertained trivially).

Anyway, I would certainly not hold up what we do in alias representation
as a good example of proper throttling in the case of large functions.
differences between dg-do compile and dg-do assemble
Dear all,

Although I understand the difference between dg-do compile and dg-do
assemble, I have noticed that there are many testcases that use either
dg-do compile or dg-do assemble and do nothing with the output. Thus, I
would like to know:

Which is faster, {dg-do compile} or {dg-do assemble}?

Is it appropriate to always use the faster one if the testcase just
checks for the presence/absence of warnings and errors?

Cheers,
Manuel.
Re: differences between dg-do compile and dg-do assemble
> Although I understand the difference between dg-do compile and dg-do
> assemble, I have noticed that there are many testcases that use either
> dg-do compile or dg-do assemble and do nothing with the output. Thus,
> I would like to know:
>
> Which is faster, {dg-do compile} or {dg-do assemble}?

It's notionally faster to use `compile' (which will produce a .s file,
rather than assembling to an object file). However, I benchmarked a
change to the test harness last year that skipped assembling test cases
when it was not necessary, and found that it made such a tiny
improvement that it wasn't worth worrying about.

> Is it appropriate to always use the faster one if the testcase just
> checks for the presence/absence of warnings and errors?

The other side of the argument is that by using dg-do assemble, you'll
potentially find a class of compiler bugs that would go unnoticed if you
only compiled.

Cheers,
Ben
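To illustrate the difference, minimal skeletons of the two kinds of test
(each would be its own source file; these are illustrative, not taken
from the testsuite):

/* { dg-do compile } */
/* Stops after generating the .s file; sufficient when only checking
   diagnostics.  */
int f (int x) { return x; }  /* { dg-bogus "uninitialized" } */

/* { dg-do assemble } */
/* Also runs the assembler on the compiler's output, so invalid
   assembly (e.g. from a bad insn template) is caught as well.  */
int g (int x) { return x + 1; }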
Re: compiling very large functions
The gcc developers have been very cooperative over the years in working
to solve problems that I've had in compiling large machine-generated
programs. For example, when gcse was disabled for large flow graphs in
2000, the warn_disabled_optimization flag was added at my suggestion.

As Steven Bosscher hinted at, however, it appears that the disabled
optimization warning has not been used at all beyond its introduction:

% grep -R disabled_optimization *
gcc/.svn/text-base/ChangeLog-2000.svn-base: * toplev.c (warn_disabled_optimization): Declare new warning flag.
gcc/.svn/text-base/ChangeLog-2000.svn-base: * flags.h (warn_disabled_optimization): Add it here.
gcc/.svn/text-base/common.opt.svn-base:Common Var(warn_disabled_optimization)
gcc/.svn/text-base/gcse.c.svn-base: warning (OPT_Wdisabled_optimization,
gcc/.svn/text-base/gcse.c.svn-base: warning (OPT_Wdisabled_optimization,
gcc/ChangeLog-2000: * toplev.c (warn_disabled_optimization): Declare new warning flag.
gcc/ChangeLog-2000: * flags.h (warn_disabled_optimization): Add it here.
gcc/common.opt:Common Var(warn_disabled_optimization)
gcc/gcse.c: warning (OPT_Wdisabled_optimization,
gcc/gcse.c: warning (OPT_Wdisabled_optimization,

A grep for 'PARAM_VALUE.*MAX' may give a more accurate idea of where
optimization passes have been throttled for various reasons.

In reporting runtime problems, my experience has been that if there is
one specific pass whose runtime overwhelms the runtime of all the other
passes, then people respond quickly to deal with it. I've tended to use
a relatively low level of optimization in my programs (-O1
-fschedule-insns2 -fno-math-errno -fno-trapping-math -fwrapv
-fno-strict-aliasing), and even as new tree-ssa passes were added to
-O1, the runtime and space requirements were kept reasonable (perhaps
after a bit of work).

Now I'm finding that more aggressive optimizations can significantly
speed up these codes; I'm also finding that these newly-attempted
optimization passes take up so much memory (especially on darwin G5 for
some reason) that they can't be applied except on very large machines.
(One of my codes took only six minutes, but over 10.1 GB of memory, to
compile; smaller examples are given in PR 29374.) Perhaps these passes
intrinsically require large amounts of memory, or perhaps the algorithms
and data structures used have not yet been examined critically for very
large programs. When I find time I plan to discover which specific
optimizations require large memory; perhaps with this data the passes
involved can be studied more closely to see whether the memory
requirements can be cut.

With that background, I'd like to request that optimization passes not
be throttled back by default, especially in stage 1 of a release cycle.
I fear that this would unduly hide problems that might be solved with a
reasonable amount of effort. It's not clear to me how many new
optimization passes and data structures have been stress-tested on very
large programs; we may find that most problems that appear now can be
fixed if they are examined in isolation with the right input data.

Brad
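For anyone experimenting along these lines, both the warning and the
limits are user-visible; for example (the parameter value here is
arbitrary, and max-gcse-memory is one of the --param knobs of that era):

gcc -O2 -Wdisabled-optimization --param max-gcse-memory=134217728 big.c

-Wdisabled-optimization reports when a pass bows out, and --param lets
you raise the limit that caused it to do so.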
Where is the splitting of MIPS %hi and %lo relocations handled?
I am going to try to fix:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29721

which is a problem where a %lo relocation gets separated from its
corresponding %hi. What is the mechanism that tries to prevent this from
happening, and where is it implemented?

Thanks,
David Daney
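For context, the pair in question looks like this in generated MIPS
assembly (schematically; the register choice is arbitrary):

lui   $2, %hi(symbol)       # upper 16 bits, adjusted for the carry
                            # out of the low half
lw    $3, %lo(symbol)($2)   # lower 16 bits folded into the load

Because the %hi value depends on a carry from the %lo addend, the
assembler and linker must match each R_MIPS_LO16 relocation with a
corresponding R_MIPS_HI16; if the two halves get separated in the wrong
way, that matching can fail, which appears to be the symptom in the PR
above.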
Abt long long support
Hello all,

Looking at the .md file of a back end, is there a way to know whether a
target supports long long? Should I look for patterns with machine mode
DI, or is there some other way?

Thanks in advance for the help.

Regards,
Shafi