Re: compiling very large functions.
> I think that it is time that we in the GCC community took some time to
> address the problem of compiling very large functions in a somewhat
> systematic manner.
>
> GCC has two competing interests here: it needs to be able to provide
> state of the art optimization for modest sized functions and it needs
> to be able to properly process very large machine generated functions
> using reasonable resources.
>
> I believe that the default behavior for the compiler should be that
> certain non essential passes be skipped if a very large function is
> encountered.
>
> There are two problems here:
>
> 1) defining the set of optimizations that need to be skipped.
> 2) defining the set of functions that trigger the special processing.
>
> For (1) I would propose that three measures be made of each function.
> These measures should be made before inlining occurs. The three
> measures are the number of variables, the number of statements, and
> the number of basic blocks.
> Many of the gcc passes are non linear in one or more of these measures
> and these passes should be skipped if one or more of these measures
> exceeds some threshold.

If profiling data is available at the point of decision, we may also want
to take into account how hot the function is (i.e., have the threshold
depend on the size of the function versus its expected impact on
performance based on profiling).

dorit

> For (2) I would propose that we add three new fields to the compilation
> manager. These fields would be null or zero if the optimization is
> either essential or is only linear in the measure. Otherwise, some
> indication of either a threshold or the exponent of the growth is used
> as the field.
>
> The compilation manager could then look at the options, in particular
> the -O level and perhaps some new options to indicate that this is a
> small machine or, in the other extreme, "optimize all functions come
> hell or high water!!", and skip those passes which would cause
> performance problems.
>
> I do not claim to understand how sensitive every pass is to these
> measures. However, I could possibly make a good first cut on the rtl
> passes.
>
> However, I think that before anyone starts hacking anything, we should
> come to a consensus on an overall strategy and implement something
> consistent for the entire compiler rather than attack some particular
> pass in a manner that only gets us past the next PR.
>
> Volunteer(s) to implement the compilation manager part of this would
> also be appreciated.
>
> Kenny
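To make the idea concrete, here is a minimal sketch of the kind of gate
the compilation manager could apply; all type names, fields, and
thresholds below are hypothetical, not existing GCC code:

/* Per-function measures, taken before inlining.  */
struct function_measures
{
  int n_vars;    /* number of variables */
  int n_stmts;   /* number of statements */
  int n_blocks;  /* number of basic blocks */
};

/* Per-pass limits; zero means "essential or linear in this measure,
   never skip because of it".  */
struct pass_limits
{
  int max_vars;
  int max_stmts;
  int max_blocks;
};

/* The compilation manager would skip a pass when any non-zero limit
   is exceeded (possibly after scaling the limits by -O level).  */
static int
skip_pass_p (const struct function_measures *m,
             const struct pass_limits *l)
{
  return (l->max_vars   && m->n_vars   > l->max_vars)
      || (l->max_stmts  && m->n_stmts  > l->max_stmts)
      || (l->max_blocks && m->n_blocks > l->max_blocks);
}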
Re: compiling very large functions.
While I agree with you, I think that there are so many things we are
already trying to address that this one can wait. I think we've been
doing a very good job on large functions too, and I believe that authors
of very large functions are just getting not only what they deserve, but
actually what they expect: large compile times (superlinear).

Not to mention that these huge functions are usually central to the
program. If GCC decided that it is not worth optimizing the
machine-generated bytecode interpreter of GNU Smalltalk, for example, I
might as well rewrite it in assembly (or as a JIT compiler). Same for
interpret.cc in libjava, though it is a tad smaller than GNU Smalltalk's
interpreter.

Unlike the authors of other VMs, I have no problem writing code so that
the *latest* version of GCC will do its best, instead of complaining
that GCC compiles my code worse on every release. So, I am OK with GCC
doing stupid things because of bugs that I/we can fix, but not with GCC
just giving up optimization on code that has always been compiled
perfectly (in one or two minutes for about 30,000 lines of
machine-generated code, despite being chock-full of computed gotos),
that *can* be optimized very well, and that is central to the
performance of a program.

Paolo
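A toy sketch of the shape of such an interpreter may help readers who
have not seen one: the whole dispatch loop is a single function, and the
computed gotos (&&label, a GCC extension) make nearly every block a
potential jump target. The opcodes below are invented for illustration;
this is not GNU Smalltalk's generated code.

#include <stdio.h>

static int
interpret (const unsigned char *pc)
{
  /* One label per opcode; real interpreters have hundreds, which is
     why the resulting function and its CFG are so large.  */
  static void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
  int stack[64], *sp = stack;

#define NEXT goto *dispatch[*pc++]
  NEXT;

op_push: *sp++ = *pc++; NEXT;          /* push immediate operand */
op_add:  sp--; sp[-1] += sp[0]; NEXT;  /* pop two, push sum */
op_halt: return sp[-1];
#undef NEXT
}

int main (void)
{
  const unsigned char code[] = { 0, 2, 0, 3, 1, 2 }; /* push 2, push 3, add, halt */
  printf ("%d\n", interpret (code));   /* prints 5 */
  return 0;
}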
Induction variable optimization
Respected Sir,

I am an M.E. Computer Science student doing a project on induction
variable optimization. I am therefore reading the file
tree-ssa-loop-ivopts.c of gcc-4.0.2 to learn what has been implemented
there.

Is there any other way to find out what has been implemented in
gcc-4.0.2? Can I know the algorithm? Please help me to figure it out.

Thanking you.

Rama Bharti Varshney
id: 2005h103421
Bits-Pilani (Rajasthan)
India
Re: compiling very large functions.
Paolo Bonzini wrote:
>> While I agree with you, I think that there are so many things we are
>> already trying to address that this one can wait. I think we've been
>> doing a very good job on large functions too, and I believe that
>> authors of very large functions are just getting not only what they
>> deserve, but actually what they expect: large compile times
>> (superlinear).
>
> Not to mention that these huge functions are usually central to the
> program. If GCC decided that it is not worth optimizing the
> machine-generated bytecode interpreter of GNU Smalltalk, for example,
> I might as well rewrite it in assembly (or as a JIT compiler). Same
> for interpret.cc in libjava, though it is a tad smaller than GNU
> Smalltalk's interpreter.
>
> Unlike the authors of other VMs, I have no problem writing code so
> that the *latest* version of GCC will do its best, instead of
> complaining that GCC compiles my code worse on every release. So, I
> am OK with GCC doing stupid things because of bugs that I/we can fix,
> but not with GCC just giving up optimization on code that has always
> been compiled perfectly (in one or two minutes for about 30,000 lines
> of machine-generated code, despite being chock-full of computed
> gotos), that *can* be optimized very well, and that is central to the
> performance of a program.
>
> Paolo

I actually think that your Smalltalk example is the exception and not
the rule. I would guess that the vast majority of very large functions
are machine generated simulations where the optimizer most likely
provides little benefit.

In the case of dataflow, reaching defs is much more expensive than
simple liveness, not because there are more bit vectors (there are
exactly the same number) but because there are an order of magnitude
more bits in those bit vectors than in live variables, and it just takes
more time and space to move them around.

The thing is that even as memories get larger, something has to give.
There are and will always be programs that are too large for the most
aggressive techniques, and my proposal is simply a way to gracefully
shed the most expensive techniques as the programs get very large.

The alternative is to just shelve these bugs and tell the submitter not
to use optimization on them. I do not claim to know what the right
approach is.

kenny
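To put rough numbers on the liveness versus reaching-defs comparison:
liveness needs one bit per variable per vector, while reaching defs
needs one bit per definition site, and big machine-generated functions
have far more definitions than variables. The figures below are invented
purely for illustration:

#include <stdio.h>

int main (void)
{
  /* Hypothetical function: 10000 blocks, 5000 variables, 80000
     definitions, two vectors (in/out) per block.  */
  long blocks = 10000, vars = 5000, defs = 80000;
  long live_bytes  = blocks * 2 * (vars / 8);
  long reach_bytes = blocks * 2 * (defs / 8);
  printf ("liveness: %ld MB, reaching defs: %ld MB\n",
          live_bytes >> 20, reach_bytes >> 20);  /* ~11 MB vs ~190 MB */
  return 0;
}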
Re: Induction variable optimization
[EMAIL PROTECTED] wrote on 06/11/2006 03:34:27:
> Respected Sir
>
> I am an M.E. Computer Science student doing a project on induction
> variable optimization.
>
> Therefore I am reading the file tree-ssa-loop-ivopts.c of gcc-4.0.2 to
> learn what has been implemented there.
>
> Is there any other way to find out what has been implemented in
> gcc-4.0.2? Can I know the algorithm? Please help me to figure it out.
>
> Thanking you.
>
> Rama Bharti Varshney
> id: 2005h103421
> Bits-Pilani (Rajasthan)
> India

Maybe the paper "Induction variable analysis with delayed abstractions"
by Sebastian Pop, Georges-André Silber and Albert Cohen can help.
Re: compiling very large functions.
Kenneth Zadeck writes:
> Paolo Bonzini wrote:
> >> While I agree with you, I think that there are so many things we
> >> are already trying to address that this one can wait. I think
> >> we've been doing a very good job on large functions too, and I
> >> believe that authors of very large functions are just getting
> >> not only what they deserve, but actually what they expect: large
> >> compile times (superlinear).
> >
> > Not to mention that these huge functions are usually central to
> > the program. If GCC decided that it is not worth optimizing the
> > machine-generated bytecode interpreter of GNU Smalltalk, for
> > example, I might as well rewrite it in assembly (or as a JIT
> > compiler). Same for interpret.cc in libjava, though it is a tad
> > smaller than GNU Smalltalk's interpreter.
> >
> > Unlike the authors of other VMs, I have no problem writing code
> > so that the *latest* version of GCC will do its best, instead of
> > complaining that GCC compiles my code worse on every release.
> > So, I am OK with GCC doing stupid things because of bugs that
> > I/we can fix, but not with GCC just giving up optimization on
> > code that has always been compiled perfectly (in one or two
> > minutes for about 30,000 lines of machine-generated code, despite
> > being chock-full of computed gotos), that *can* be optimized very
> > well, and that is central to the performance of a program.
>
> I actually think that your Smalltalk example is the exception and
> not the rule.

Mmm, probably. But these interpreters, while uncommon, are important in
the sense that they are executed a great deal. Paolo is surely right: we
don't want to cease to optimize those important programs on which we've
done a good job in the past.

> The thing is that even as memories get larger, something has to
> give. There are and will always be programs that are too large for
> the most aggressive techniques and my proposal is simply a way to
> gracefully shed the most expensive techniques as the programs get
> very large.

Indeed, yes.

> The alternative is to just shelve these bugs and tell the
> submitter not to use optimization on them.

It's hard to argue with your reasoning.

Andrew.
Re: compiling very large functions.
Paolo Bonzini <[EMAIL PROTECTED]> writes:

| Kenneth Zadeck wrote:
| > I think that it is time that we in the GCC community took some time
| > to address the problem of compiling very large functions in a
| > somewhat systematic manner.
|
| While I agree with you, I think that there are so many things we are
| already trying to address that this one can wait. I think we've been
| doing a very good job on large functions too, and I believe that
| authors of very large functions are just getting not only what they
| deserve, but actually what they expect: large compile times
| (superlinear).

You mean GCC developers? :-)

-- Gaby
Re: compiling very large functions.
Paolo Bonzini <[EMAIL PROTECTED]> writes:

[...]

| Unlike the authors of other VMs, I have no problem writing code so
| that the *latest* version of GCC will do its best, instead of
| complaining that GCC compiles my code worse on every release. So, I

You gave some examples, but those are not the only ones that come with
GCC. There are more examples (albeit less impressive) in GCC's own
source code, and their logic is such that I don't expect people to
fiddle with them every release.

-- Gaby
Re: 16 byte alignment hint for sse vectorization
Unfortunately there's no way to specify the alignment attribute of
pointers in GCC - the syntax was allowed in the past but not really
supported correctly, and then entirely disallowed (by this patch
http://gcc.gnu.org/ml/gcc-patches/2005-04/msg02284.html). This issue was
discussed in detail in these threads:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20794
http://gcc.gnu.org/ml/gcc/2005-03/msg00483.html
(and recently came up again also in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827#c56).

The problem is that "We don't yet implement either attributes on array
parameters applying to the array not the pointer, or attributes inside
the [] of the array parameter applying to the pointer. (This is
documented in "Attribute Syntax".)" (from the above thread).

dorit

> Hello,
>
> I have been playing with gcc's new (to me) auto vectorization
> optimizations. I have a particular loop for which I have made external
> provisions to ensure that the data is 16-byte aligned. I have tried
> everything I can think of to give gcc the hint that it is operating on
> aligned data, but still the vectorizer warns that it is operating on
> unaligned data and generates the less efficient MOVLPS/MOVUPS instead
> of MOVAPS.
>
> The code is like this:
>
> #define SSE __attribute__((aligned (16)))
>
> typedef float matrix_t[100][1024];
>
> matrix_t aa SSE, bb SSE, cc SSE;
>
> void calc(float *a, float *b, float *c) {
>   int i, n = 1024;
>   for (i=0; i<n; i++) {
>     a[i] = b[i] / (b[i] + c[i]);
>   }
> }
>
> int main(void) {
>   int i, n = 100;
>   for (i=0; i<n; i++) {
>     calc(aa[i], bb[i], cc[i]);
>   }
> }
>
> gcc rejects it if I specify alignment attributes on the formal
> parameters (obviously it was dumb to even try that), and there does not
> seem to be a way to get the alignment hint to apply to the object
> referenced by the pointer instead of the pointer itself.
>
> In my application it is important that the function doing the
> computations remains abstracted away from the data definitions, as
> there is over 1G of data dynamically arranged and the actual alignment
> provisions are made with mmap().
>
> Does anyone have a suggestion?
>
> Regards,
> Michael James
Re: compiling very large functions.
Kenneth Zadeck wrote:
> I actually think that your Smalltalk example is the exception and not
> the rule. I would guess that the vast majority of very large functions
> are machine generated simulations where the optimizer most likely
> provides little benefit.

Wouldn't it be best to base this discussion on data rather than guesses
as to what "most likely" happens? It seems to me that the first step
here is taking some of these large functions and compiling them with
various optimizations enabled or disabled.
Re: compiling very large functions.
On Nov 5, 2006, at 08:46, Kenneth Zadeck wrote:
> The thing is that even as memories get larger, something has to give.
> There are and will always be programs that are too large for the most
> aggressive techniques and my proposal is simply a way to gracefully
> shed the most expensive techniques as the programs get very large.
>
> The alternative is to just shelve these bugs and tell the submitter
> not to use optimization on them. I do not claim to know what the right
> approach is.

For compiling very large programs, lower optimization levels (-O1, -Os)
already tend to be competitive with -O2. More often than not, -O3 does
not improve results, or even results in slower code. Performance becomes
dominated by how much code can fit in L2, TLB or RAM.

Ideally, we would always compile code that is executed infrequently
using -Os to minimize memory footprint, and always compile code in loops
with many iterations using high optimization levels.

Kenneth's proposal to trigger different optimization strategies in each
function based on certain statistics seems an excellent step toward more
balanced compilations. This not only helps reduce compile time (mostly
through reduced memory usage), but may also improve the generated code.

For the rare program with huge performance-critical functions, we can
either add user-adjustable parameters, new optimization levels, or use
profile feedback to find out about such programs. For most programs
though, more balanced optimization allows the compiler to aggressively
optimize code that matters, and to minimize code size and compile time
for large swaths of code that are uninteresting.

-Geert
Re: compiling very large functions.
On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
> Richard Guenther wrote:
> > On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
> >> Richard Guenther wrote:
> >> > On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
> >> >> I think that it is time that we in the GCC community took some
> >> >> time to address the problem of compiling very large functions in
> >> >> a somewhat systematic manner.
> >> >>
> >> >> GCC has two competing interests here: it needs to be able to
> >> >> provide state of the art optimization for modest sized functions
> >> >> and it needs to be able to properly process very large machine
> >> >> generated functions using reasonable resources.
> >> >>
> >> >> I believe that the default behavior for the compiler should be
> >> >> that certain non essential passes be skipped if a very large
> >> >> function is encountered.
> >> >>
> >> >> There are two problems here:
> >> >>
> >> >> 1) defining the set of optimizations that need to be skipped.
> >> >> 2) defining the set of functions that trigger the special
> >> >> processing.
> >> >>
> >> >> For (1) I would propose that three measures be made of each
> >> >> function. These measures should be made before inlining occurs.
> >> >> The three measures are the number of variables, the number of
> >> >> statements, and the number of basic blocks.
> >> >
> >> > Why before inlining? These three numbers can change quite
> >> > significantly as a function passes through the pass pipeline. So
> >> > we should try to keep them up-to-date to have an accurate
> >> > measurement.
> >> >
> >> I am flexible here. We may want inlining to be able to update the
> >> numbers. However, I think that we should drive the inlining
> >> aggression based on these numbers.
> >
> > Well, for example jump threading and tail duplication can cause
> > these numbers to significantly change. Also CFG instrumentation and
> > PRE can increase the BB count. So we need to deal with cases where
> > an optimization produces an overly large number of basic blocks or
> > instructions (like by throttling those passes properly).
> >
> I lean toward leaving the numbers static even if they do increase as
> time goes by. Otherwise you get two effects: the first optimizations
> get to be run more, and you get weird non-linear step functions where
> small changes in some upstream function affect the downstream ones.

Ok, I guess we can easily flag each function as having
- many BBs
- big BBs
- complex CFG (many edges)
and set these flags at CFG construction time during the lowering phase
(which is after the early inlining pass, I believe).

The number of basic blocks is kept up-to-date during optimization; the
other numbers would need to be re-generated if we want to keep them
up-to-date. But with just using three (or even only one?) flags, we can
easily fit this information in struct function.

I also like the idea of a "hot" function flag to be able to dynamically
switch between optimize_size and !optimize_size in the tree optimizers.
Profile-based inlining already tries to follow that route.

Richard.
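As a sketch of how the three flags above might be represented and set,
consider the following; the names and thresholds are hypothetical, not
actual struct function contents:

/* Hypothetical size-class flags, set once at CFG construction.  */
#define FUNC_MANY_BBS     (1 << 0)  /* many basic blocks */
#define FUNC_BIG_BBS      (1 << 1)  /* very large basic blocks */
#define FUNC_COMPLEX_CFG  (1 << 2)  /* many edges per block */

struct size_class
{
  unsigned int flags;
};

/* Set during lowering, after early inlining; thresholds illustrative. */
static void
classify_function (struct size_class *sc,
                   int n_blocks, int max_bb_insns, int n_edges)
{
  sc->flags = 0;
  if (n_blocks > 5000)
    sc->flags |= FUNC_MANY_BBS;
  if (max_bb_insns > 10000)
    sc->flags |= FUNC_BIG_BBS;
  if (n_blocks > 0 && n_edges / n_blocks > 20)
    sc->flags |= FUNC_COMPLEX_CFG;
}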
Re: compiling very large functions.
On 11/5/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> > I lean toward leaving the numbers static even if they do increase as
> > time goes by. Otherwise you get two effects: the first optimizations
> > get to be run more, and you get weird non-linear step functions
> > where small changes in some upstream function affect the downstream
> > ones.
>
> Ok, I guess we can easily flag each function as having
> - many BBs
> - big BBs
> - complex CFG (many edges)
> and set these flags at CFG construction time during the lowering phase
> (which is after the early inlining pass I believe).

IMHO any CFG-based criteria should be using dynamic numbers, simply
because they are available at all times. Large BBs is a more interesting
one, because in general they don't get smaller during optimizations.

What Kenny suggests here is not new, BTW. I know that gcse already
disables itself on very large functions (see
gcse.c:is_too_expensive()), and probably some other passes do this as
well. A grep for OPT_Wdisabled_optimization *should* show all the places
where we throttle or disable passes, but it appears that warnings have
not been added consistently when someone throttled a pass.

AFAIK not one of the tree optimizers disables itself, but perhaps we
should. The obvious candidates would be the ones that require
recomputation of alias analysis, and the ones that don't update SSA info
on the fly (i.e. require update_ssa, which is a horrible compile-time
hog).

Gr.
Steven
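For reference, the shape of the gcse.c check mentioned above is roughly
the following; this is a from-memory paraphrase with illustrative
thresholds, written standalone here so it compiles outside the GCC tree
(the real code uses GCC's warning machinery and params):

#include <stdio.h>

/* Shaped after gcse.c's is_too_expensive() of that era; thresholds
   are illustrative, not the exact values in any release.  */
static int
is_too_expensive (int n_basic_blocks, int n_edges, const char *pass)
{
  /* Global optimizations on flow graphs with many edges per basic
     block are extremely time consuming.  */
  if (n_basic_blocks > 500 && n_edges / n_basic_blocks >= 20)
    {
      fprintf (stderr,
               "warning: %s disabled: %d basic blocks and "
               "%d edges/basic block\n",
               pass, n_basic_blocks, n_edges / n_basic_blocks);
      return 1;
    }
  return 0;
}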
Re: compiling very large functions.
> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> Richard Guenther wrote:
>>> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>>>> Richard Guenther wrote:
>>>>> On 11/4/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>>>>>> I think that it is time that we in the GCC community took some
>>>>>> time to address the problem of compiling very large functions in
>>>>>> a somewhat systematic manner.
>>>>>>
>>>>>> GCC has two competing interests here: it needs to be able to
>>>>>> provide state of the art optimization for modest sized functions
>>>>>> and it needs to be able to properly process very large machine
>>>>>> generated functions using reasonable resources.
>>>>>>
>>>>>> I believe that the default behavior for the compiler should be
>>>>>> that certain non essential passes be skipped if a very large
>>>>>> function is encountered.
>>>>>>
>>>>>> There are two problems here:
>>>>>>
>>>>>> 1) defining the set of optimizations that need to be skipped.
>>>>>> 2) defining the set of functions that trigger the special
>>>>>> processing.
>>>>>>
>>>>>> For (1) I would propose that three measures be made of each
>>>>>> function. These measures should be made before inlining occurs.
>>>>>> The three measures are the number of variables, the number of
>>>>>> statements, and the number of basic blocks.
>>>>>
>>>>> Why before inlining? These three numbers can change quite
>>>>> significantly as a function passes through the pass pipeline. So
>>>>> we should try to keep them up-to-date to have an accurate
>>>>> measurement.
>>>>>
>>>> I am flexible here. We may want inlining to be able to update the
>>>> numbers. However, I think that we should drive the inlining
>>>> aggression based on these numbers.
>>>
>>> Well, for example jump threading and tail duplication can cause
>>> these numbers to significantly change. Also CFG instrumentation and
>>> PRE can increase the BB count. So we need to deal with cases where
>>> an optimization produces an overly large number of basic blocks or
>>> instructions (like by throttling those passes properly).
>>>
>> I lean toward leaving the numbers static even if they do increase as
>> time goes by. Otherwise you get two effects: the first optimizations
>> get to be run more, and you get weird non-linear step functions where
>> small changes in some upstream function affect the downstream ones.
>
> Ok, I guess we can easily flag each function as having
> - many BBs
> - big BBs
> - complex CFG (many edges)
> and set these flags at CFG construction time during the lowering phase
> (which is after the early inlining pass I believe).
>
> The number of basic blocks is kept up-to-date during optimization, the
> other numbers would need to be re-generated if we want to keep them
> up-to-date.

We definitely need some heuristics to throttle down expensive
optimizations for very huge functions. I am not sure I like the static
flag rather than a per-case approach. The idea of big versus small is
very different for individual optimizations, and the properties change
significantly (i.e. expansion of a min/max function will turn a big BB
into a big CFG), so it will be a lot of fun to set up proper thresholds
here.

I also believe that we need to take care not to have O(n^k) where k is
significantly higher than necessary; most of our current scalability
issues fall into this category, and we have large complexity just out
of little laziness. We are implementing many fancier algorithms in order
to be scalable, so we should try not to undermine them with stupid
mistakes.

To take the specific example (that triggered this discussion) of the
extreme testcase triggering on the new fwprop pass, I think it is quite
different. Analyzing the testcase, there were a number of problems that
were trivial to solve and are fixed in mainline now. The remaining
problems are usually easy too - quadratic removal from singly linked
lists in out-of-SSA and the scheduler. I think those ought to be cured,
since they don't hurt only in such monstrosities.

On the other hand, GVN-PRE and df.c do have algorithmic problems that we
might or might not want to solve. I care less whether we add code to
disable the optimization or come up with a better algorithm, and I am
very happy Daniel considers the second alternative for PRE. The df.c
issue can probably also be solved by either using the FUD graph built by
our SSA algorithm over RTL, or by turning the FUD graph into DU/UD
chains, avoiding the need for the bitmaps. This might be an interesting
task if such big functions turn out to be a common bottleneck.

> But with just using three (or even only one?) flags, we can easily fit
> this information in struct function.
>
> I also like the idea of a "hot" function flag to be able to
> dynamically switch between optimize_size and !optimize_size in the
> tree optimizers. Profile-based inlining already tries to follow that
> route.

See cfun->function_frequency.
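One of the remaining problems mentioned above is quadratic removal from
singly linked lists. For readers wondering why that is quadratic:
unlinking an element requires finding its predecessor by scanning from
the head, so unlinking all n elements one at a time costs O(n^2); a
doubly linked list (or unlinking while already walking the list) makes
each removal O(1). A toy illustration, unrelated to the actual
out-of-SSA code:

struct node
{
  int val;
  struct node *next;
};

/* Unlink VICTIM from the list at *HEAD.  O(n) per call because the
   predecessor must be found by scanning; one such call per element is
   the O(n^2) pattern described above.  VICTIM must be in the list.  */
static void
remove_node (struct node **head, struct node *victim)
{
  struct node **link = head;
  while (*link != victim)
    link = &(*link)->next;
  *link = victim->next;
}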
Re: compiling very large functions.
> AFAIK not one of the tree optimizers disables itself, but perhaps we
> should. The obvious candidates would be the ones that require
> recomputation of alias analysis, and the ones that don't update SSA
> info on the fly (i.e. require update_ssa, which is a horrible compile
> time hog).

Tree alias analysis can partially disable itself though:

  /* If the program has too many call-clobbered variables and/or
     function calls, create .GLOBAL_VAR and use it to model
     call-clobbering semantics at call sites.  This reduces the number
     of virtual operands considerably, improving compile times at the
     expense of lost aliasing precision.  */
  maybe_create_global_var (ai);

We have found this to be quite helpful on gigantic elaboration
procedures generated for Ada packages instantiating gazillions of
generics. We have actually lowered the threshold locally.

-- Eric Botcazou
Re: compiling very large functions.
Eric Botcazou wrote:
>> AFAIK not one of the tree optimizers disables itself, but perhaps we
>> should. The obvious candidates would be the ones that require
>> recomputation of alias analysis, and the ones that don't update SSA
>> info on the fly (i.e. require update_ssa, which is a horrible compile
>> time hog).
>
> Tree alias analysis can partially disable itself though:
>
>   /* If the program has too many call-clobbered variables and/or
>      function calls, create .GLOBAL_VAR and use it to model
>      call-clobbering semantics at call sites.  This reduces the number
>      of virtual operands considerably, improving compile times at the
>      expense of lost aliasing precision.  */
>   maybe_create_global_var (ai);
>
> We have found this to be quite helpful on gigantic elaboration
> procedures generated for Ada packages instantiating gazillions of
> generics. We have actually lowered the threshold locally.

I would like to point out that the central point of my proposal was to
have the compilation manager be the process that manages whether an
optimization is skipped, rather than having each pass make a decision on
its own. If we have a central mechanism, then it is relatively easy to
find some sweet spots. If every pass rolls its own, it is more difficult
to balance.

Kenny
Re: compiling very large functions.
Paolo Bonzini wrote:
> Kenneth Zadeck wrote:
>> I think that it is time that we in the GCC community took some time
>> to address the problem of compiling very large functions in a
>> somewhat systematic manner.
>
> While I agree with you, I think that there are so many things we are
> already trying to address, that this one can wait.

It certainly can, but I see no reason why it should. This is a class of
issues that users run into, and if someone is motivated to work on this
class, then that's great!

I like Kenny's idea of having a uniform set of metrics for size (e.g.,
number of basic blocks, number of variables, etc.) and a limited set of
gating functions, because it will allow us to explain what's going on to
users, and allow users to tune it. For example, if the metric for
disabling a pass (by default) is "# basic blocks > 10", then we can have
a -foptimize-bigger=2 switch to change that to "20". If the gating
condition were instead some arbitrary computation, it would be harder to
implement, and harder to explain.

Certainly, setting the default thresholds reasonably will be
non-trivial. If we can agree on the basic mechanism, though, we could
add thresholding on a pass-by-pass basis.

-- Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
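In code, the tuning model described above is just a per-pass base
threshold scaled by a user-visible multiplier; a minimal sketch (the
flag is the example from the message, the function name is
hypothetical):

/* With -foptimize-bigger=N, a pass whose default cutoff is
   BASE_THRESHOLD basic blocks keeps running up to N times that.  */
static int
pass_enabled_p (int n_basic_blocks, int base_threshold,
                int optimize_bigger)
{
  return n_basic_blocks <= base_threshold * optimize_bigger;
}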
Re: compiling very large functions.
On 11/5/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:
> > AFAIK not one of the tree optimizers disables itself, but perhaps we
> > should. The obvious candidates would be the ones that require
> > recomputation of alias analysis, and the ones that don't update SSA
> > info on the fly (i.e. require update_ssa, which is a horrible
> > compile time hog).
>
> Tree alias analysis can partially disable itself though:
>
>   /* If the program has too many call-clobbered variables and/or
>      function calls, create .GLOBAL_VAR and use it to model
>      call-clobbering semantics at call sites.  This reduces the number
>      of virtual operands considerably, improving compile times at the
>      expense of lost aliasing precision.  */
>   maybe_create_global_var (ai);
>
> We have found this to be quite helpful on gigantic elaboration
> procedures generated for Ada packages instantiating gazillions of
> generics. We have actually lowered the threshold locally.

Heh, I believe you! :-) IMHO we should add an OPT_Wdisabled_optimization
warning there, though.

Gr.
Steven
Re: compiling very large functions.
On 11/5/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
> I would like to point out that the central point of my proposal was to
> have the compilation manager be the process that manages whether an
> optimization is skipped, rather than having each pass make a decision
> on its own. If we have a central mechanism, then it is relatively easy
> to find some sweet spots. If every pass rolls its own, it is more
> difficult to balance.

Hmm, I don't understand this. Why is it harder to find a sweet spot if
every pass decides for itself whether to run or not? I would think that
this decision should be made by each pass individually, because the pass
manager is one abstraction level higher, where it shouldn't have to know
the behavior of each pass.

Gr.
Steven
Re: 16 byte alignment hint for sse vectorization
On Nov 4, 2006, at 11:00 AM, Michael James wrote:
> Does anyone have a suggestion?

#define SSE __attribute__((aligned (16)))

typedef float matrix_sub_t[1024] SSE;
typedef matrix_sub_t matrix_t[100];

matrix_t a, b, c;

void calc(matrix_sub_t * restrict ap,
          matrix_sub_t * restrict bp,
          matrix_sub_t * restrict cp) {
  int i, n = 1024;
  for (i=0; i<n; i++)
    (*ap)[i] = (*bp)[i] / ((*bp)[i] + (*cp)[i]);
}

would seem to be one way to do this that should work. Doesn't seem to
vectorize for me, though; if I read that right, the alignment issue
isn't the problem but rather the data dependency analysis:

t.c:13: note: === vect_analyze_dependences ===
t.c:13: note: not vectorized: can't determine dependence between
(*bp_12)[i_4] and (*ap_16)[i_4]
t.c:13: note: bad data dependence.
Re: compiling very large functions.
Steven Bosscher wrote:
> On 11/5/06, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
>> I would like to point out that the central point of my proposal was
>> to have the compilation manager be the process that manages whether
>> an optimization is skipped, rather than having each pass make a
>> decision on its own. If we have a central mechanism, then it is
>> relatively easy to find some sweet spots. If every pass rolls its
>> own, it is more difficult to balance.
>
> Hmm, I don't understand this. Why is it harder to find a sweet spot
> if every pass decides for itself whether to run or not? I would think
> that this decision should be made by each pass individually, because
> the pass manager is one abstraction level higher, where it shouldn't
> have to know the behavior of each pass.

If every expensive pass decides it is too expensive to run, then none of
them will run, when what you might have wanted is for some but not all
of them to run. Being at a higher level means you can query a pass for a
cost estimate given a certain set of arguments, and then decide whether
or not to disable the pass based on what you know so far about the
entire compilation chain.

Also, passes will often interact with each other, i.e. they are not
orthogonal, and a good way to manage the effects is at a higher level,
as opposed to having individual passes communicating with each other.

Chris
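One possible shape for the query interface described above, where a pass
exports a cost estimate that the manager weighs against a per-function
budget so that *some* expensive passes can still run; every name here is
hypothetical:

struct pass_info
{
  const char *name;
  /* Estimated cost, in abstract units, for a function with the given
     measures.  */
  long (*estimate_cost) (int n_stmts, int n_blocks);
};

/* Hypothetical estimate for a roughly quadratic pass.  */
static long
quadratic_pass_cost (int n_stmts, int n_blocks)
{
  (void) n_blocks;
  return (long) n_stmts * n_stmts;
}

/* The manager skips passes whose estimate exceeds what is left of the
   budget, rather than each pass independently bowing out.  */
static int
should_run_p (const struct pass_info *p, long *budget_left,
              int n_stmts, int n_blocks)
{
  long cost = p->estimate_cost (n_stmts, n_blocks);
  if (cost > *budget_left)
    return 0;
  *budget_left -= cost;
  return 1;
}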
Re: Induction variable optimization
Hello,

> I am an M.E. Computer Science student doing a project on induction
> variable optimization.
>
> Therefore I am reading the file tree-ssa-loop-ivopts.c of gcc-4.0.2 to
> learn what has been implemented there.
>
> Is there any other way to find out what has been implemented in
> gcc-4.0.2? Can I know the algorithm? Please help me to figure it out.

The overview of the algorithm is in the comment at the start of
tree-ssa-loop-ivopts.c.

Zdenek
Re: compiling very large functions.
On 11/5/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:
> > AFAIK not one of the tree optimizers disables itself, but perhaps we
> > should. The obvious candidates would be the ones that require
> > recomputation of alias analysis, and the ones that don't update SSA
> > info on the fly (i.e. require update_ssa, which is a horrible
> > compile time hog).
>
> Tree alias analysis can partially disable itself though:

No, it can't. Tree alias *representation* can :) It is also not really
partial disabling; it's really full disabling in 99% of cases.

>   /* If the program has too many call-clobbered variables and/or
>      function calls, create .GLOBAL_VAR and use it to model
>      call-clobbering semantics at call sites.  This reduces the number
>      of virtual operands considerably, improving compile times at the
>      expense of lost aliasing precision.  */
>   maybe_create_global_var (ai);
>
> We have found this to be quite helpful on gigantic elaboration
> procedures generated for Ada packages instantiating gazillions of
> generics. We have actually lowered the threshold locally.

As I alluded to above, this is disabling the representation of accurate
call clobbering. It still performs the same analysis; it's just not
representing the results the same way. (This is also another one of
those things that makes other places pretty hairy as a result.)

This is, in fact, a side effect of the fact that we currently try to
represent aliasing information in terms of "variables things access"
instead of just saying "we know these things can touch the same part of
the heap". I.e. we should be making memory equivalence classes and using
those as the symbols, instead of variables. Otherwise, we end up saying
"these things can touch the same 30 variables" by listing all 30
variables in vops, instead of just creating a single symbol that
represents 30 variables and using that. (Finding a good set of
equivalence classes to use is, of course, a Hard Problem(TM), which is
why we started by doing it this way.)
Re: compiling very large functions.
> > Tree alias analysis can partially disable itself though:
>
> No, it can't. Tree alias representation can :)

I presume you're thinking of the pass that performs the analysis, while
I was more thinking of the global machinery; my understanding is that
the machinery will not be able to disambiguate memory accesses it would
have been able to, if the limit were not reached.

> It is also not really partial disabling; it's really full disabling in
> 99% of cases.

Partially, because it only affects call-clobbered variables IIUC.

-- Eric Botcazou
Re: 16 byte alignment hint for sse vectorization
Hello Dorit,

Thank you for the list of references. What I gathered from reading them
is that alignment attributes applied to the base element of an array
cause problems for other legitimate uses of this attribute. It is
basically stupid to specify that the base type of an array be aligned,
because this implies that the array elements are scattered in memory.
The only examples I can think of where you might want to do that are
multidimensional arrays where you want the start of each subarray to be
aligned. That is better solved just by changing the length of the inner
array to one that yields the desired alignment.

However, specifying pointer alignment (whether this hint is passed
through the type system or some other way) is an important piece of
information that a programmer may wish to communicate to the optimizer.
The obvious semantics for the hint are that the log2(alignment) least
significant bits of the pointer value are clear. The size of the pointer
base type would be unaffected, so that pointer math can still yield
unaligned accesses. I hope I have not oversimplified the issue at hand
here.

From the point of view of someone passively benefiting from the
optimizer, this issue does not matter so much. However, someone who is
performance tuning his code may care very much about getting this
information through to the vectorizer, and without some provision for a
hint he is stuck either (1) without the optimization or (2) rewriting
the code so that all aligned arrays are passed as pointer-to-struct,
with the array element in the struct specified with an alignment
attribute (see the sketch after the quoted message below). I have not
tested method 2; it seems like a transformation which may work despite
being unaesthetic.

Regards,
Michael James

On 11/5/06, Dorit Nuzman <[EMAIL PROTECTED]> wrote:
> Unfortunately there's no way to specify the alignment attribute of
> pointers in GCC - the syntax was allowed in the past but not really
> supported correctly, and then entirely disallowed (by this patch
> http://gcc.gnu.org/ml/gcc-patches/2005-04/msg02284.html). This issue
> was discussed in detail in these threads:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20794
> http://gcc.gnu.org/ml/gcc/2005-03/msg00483.html
> (and recently came up again also in
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827#c56).
>
> The problem is that "We don't yet implement either attributes on array
> parameters applying to the array not the pointer, or attributes inside
> the [] of the array parameter applying to the pointer. (This is
> documented in "Attribute Syntax".)" (from the above thread).
>
> dorit
>
> > Hello,
> >
> > I have been playing with gcc's new (to me) auto vectorization
> > optimizations. I have a particular loop for which I have made
> > external provisions to ensure that the data is 16-byte aligned. I
> > have tried everything I can think of to give gcc the hint that it is
> > operating on aligned data, but still the vectorizer warns that it is
> > operating on unaligned data and generates the less efficient
> > MOVLPS/MOVUPS instead of MOVAPS.
> >
> > The code is like this:
> >
> > #define SSE __attribute__((aligned (16)))
> >
> > typedef float matrix_t[100][1024];
> >
> > matrix_t aa SSE, bb SSE, cc SSE;
> >
> > void calc(float *a, float *b, float *c) {
> >   int i, n = 1024;
> >   for (i=0; i<n; i++) {
> >     a[i] = b[i] / (b[i] + c[i]);
> >   }
> > }
> >
> > int main(void) {
> >   int i, n = 100;
> >   for (i=0; i<n; i++) {
> >     calc(aa[i], bb[i], cc[i]);
> >   }
> > }
> >
> > gcc rejects it if I specify alignment attributes on the formal
> > parameters (obviously it was dumb to even try that), and there does
> > not seem to be a way to get the alignment hint to apply to the
> > object referenced by the pointer instead of the pointer itself.
> >
> > In my application it is important that the function doing the
> > computations remains abstracted away from the data definitions, as
> > there is over 1G of data dynamically arranged and the actual
> > alignment provisions are made with mmap().
> >
> > Does anyone have a suggestion?
> >
> > Regards,
> > Michael James
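For the record, method (2) might look like the following; this is an
untested sketch of the workaround described above (type and function
names invented here), and whether a given GCC release actually
propagates the alignment to the vectorizer would need to be verified
against that release:

/* Wrap the row in a struct so the alignment attribute attaches to the
   pointed-to type rather than to the pointer itself.  */
typedef struct { float v[1024]; } __attribute__((aligned (16))) row_t;

void calc (row_t *a, const row_t *b, const row_t *c)
{
  int i;
  for (i = 0; i < 1024; i++)
    a->v[i] = b->v[i] / (b->v[i] + c->v[i]);
}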
Re: compiling very large functions.
On 11/5/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:
> > > Tree alias analysis can partially disable itself though:
> >
> > No, it can't. Tree alias representation can :)
>
> I presume you're thinking of the pass that performs the analysis,
> while I was more thinking of the global machinery; my understanding is
> that the machinery will not be able to disambiguate memory accesses it
> would have been able to, if the limit were not reached.

Depends on what you mean by "the machinery". There is no standard API
for using the results of the analysis without using the representation
we provide, but some passes do it anyway through hackery, so yes and no
:)

> > It is also not really partial disabling; it's really full disabling
> > in 99% of cases.
>
> Partially, because it only affects call-clobbered variables IIUC.

It affects all variables that escape or are non-local, which is roughly
all variables that have interesting aliasing properties (i.e. those that
cannot be ascertained trivially).

Anyway, I would certainly not hold up what we do in alias representation
as a good example of proper throttling in the case of large functions.
differences between dg-do compile and dg-do assemble
Dear all,

Although I understand the difference between dg-do compile and dg-do
assemble, I have noticed that there are many testcases that use either
dg-do compile or dg-do assemble and do nothing with the output. Thus, I
would like to know:

Which is faster, {dg-do compile} or {dg-do assemble}?

Is it appropriate to always use the faster one if the testcase just
checks for the presence/absence of warnings and errors?

Cheers,
Manuel.
Re: differences between dg-do compile and dg-do assemble
> Although I understand the difference between dg-do compile and dg-do
> assemble, I have noticed that there are many testcases that use either
> dg-do compile or dg-do assemble and do nothing with the output. Thus,
> I would like to know:
>
> Which is faster, {dg-do compile} or {dg-do assemble}?

It's notionally faster to use `compile' (which will produce a .s file,
rather than assembling to an object file). However, I benchmarked a
change to the test harness last year that skipped assembling test cases
when it was not necessary, and found that it made such a tiny
improvement that it wasn't worth worrying about.

> Is it appropriate to always use the faster one if the testcase just
> checks for the presence/absence of warnings and errors?

The other side of the argument is that by using dg-do assemble, you'll
potentially find a class of compiler bugs that would go unnoticed if you
only compiled.

Cheers,
Ben
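To illustrate the difference, minimal skeletons of the two kinds of test
(each would be its own source file; these are illustrative, not taken
from the testsuite):

/* { dg-do compile } */
/* Stops after generating the .s file; sufficient when only checking
   diagnostics.  */
int f (int x) { return x; }  /* { dg-bogus "uninitialized" } */

/* { dg-do assemble } */
/* Also runs the assembler on the compiler's output, so invalid
   assembly (e.g. from a bad insn template) is caught as well.  */
int g (int x) { return x + 1; }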
Re: compiling very large functions
The gcc developers have been very cooperative over the years in working
to solve problems that I've had in compiling large machine-generated
programs. For example, when gcse was disabled for large flow graphs in
2000, the warn_disabled_optimization flag was added at my suggestion.

As Steven Bosscher hinted at, however, it appears that the disabled
optimization warning has not been used at all beyond its introduction:

% grep -R disabled_optimization *
gcc/.svn/text-base/ChangeLog-2000.svn-base: * toplev.c (warn_disabled_optimization): Declare new warning flag.
gcc/.svn/text-base/ChangeLog-2000.svn-base: * flags.h (warn_disabled_optimization): Add it here.
gcc/.svn/text-base/common.opt.svn-base:Common Var(warn_disabled_optimization)
gcc/.svn/text-base/gcse.c.svn-base: warning (OPT_Wdisabled_optimization,
gcc/.svn/text-base/gcse.c.svn-base: warning (OPT_Wdisabled_optimization,
gcc/ChangeLog-2000: * toplev.c (warn_disabled_optimization): Declare new warning flag.
gcc/ChangeLog-2000: * flags.h (warn_disabled_optimization): Add it here.
gcc/common.opt:Common Var(warn_disabled_optimization)
gcc/gcse.c: warning (OPT_Wdisabled_optimization,
gcc/gcse.c: warning (OPT_Wdisabled_optimization,

A grep for 'PARAM_VALUE.*MAX' may give a more accurate idea of where
optimization passes have been throttled for various reasons.

In reporting runtime problems, my experience has been that if there is
one specific pass whose runtime overwhelms the runtime of all the other
passes, then people respond quickly to deal with it. I've tended to use
a relatively low level of optimization in my programs (-O1
-fschedule-insns2 -fno-math-errno -fno-trapping-math -fwrapv
-fno-strict-aliasing), and even as new tree-ssa passes were added to
-O1, the runtime and space requirements were kept reasonable (perhaps
after a bit of work).

Now I'm finding that more aggressive optimizations can significantly
speed up these codes; I'm also finding that these newly-attempted
optimization passes take up so much memory (especially on darwin G5 for
some reason) that they can't be applied except on very large machines.
(One of my codes took only six minutes, but over 10.1 GB of memory, to
compile; smaller examples are given in PR 29374.) Perhaps these passes
intrinsically require large amounts of memory, or perhaps the algorithms
and data structures used have not yet been examined critically for very
large programs. When I find time I plan to discover which specific
optimizations require large memory; perhaps with this data the passes
involved can be studied more closely to see whether the memory
requirements can be cut.

With that background, I'd like to request that optimization passes not
be throttled back by default, especially in stage 1 of a release cycle.
I fear that this would unduly hide problems that might be solved with a
reasonable amount of effort. It's not clear to me how many new
optimization passes and data structures have been stress-tested on very
large programs; we may find that most problems that appear now can be
fixed if they are examined in isolation with the right input data.

Brad
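For anyone experimenting along these lines, both the warning and the
limits are user-visible; for example (the parameter value here is
arbitrary, and max-gcse-memory is one of the --param knobs of that era):

gcc -O2 -Wdisabled-optimization --param max-gcse-memory=134217728 big.c

-Wdisabled-optimization reports when a pass bows out, and --param lets
you raise the limit that caused it to do so.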
Where is the splitting of MIPS %hi and %lo relocations handled?
I am going to try to fix:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29721

which is a problem where a %lo relocation gets separated from its
corresponding %hi. What is the mechanism that tries to prevent this from
happening, and where is it implemented?

Thanks,
David Daney
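For context, the pair in question looks like this in generated MIPS
assembly (schematically; the register choice is arbitrary):

lui   $2, %hi(symbol)       # upper 16 bits, adjusted for the carry
                            # out of the low half
lw    $3, %lo(symbol)($2)   # lower 16 bits folded into the load

Because the %hi value depends on a carry from the %lo addend, the
assembler and linker must match each R_MIPS_LO16 relocation with a
corresponding R_MIPS_HI16; if the two halves get separated in the wrong
way, that matching can fail, which appears to be the symptom in the PR
above.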
Abt long long support
Hello all,

Looking at the .md file of a back end, is there a way to know whether a
target supports long long? Should I look for patterns with machine mode
DI, or is there some other way?

Thanks in advance for the help.

Regards,
Shafi