collaborative tuning of GCC optimization heuristics
Dear colleagues,

If it's of interest, we have released a new version of our open-source framework to share compiler optimization knowledge across diverse workloads and hardware. We would like to thank all the volunteers who ran this framework and shared some results for GCC 4.9 .. 6.0 in the public repository here: http://cTuning.org/crowdtuning-results-gcc

Here is a brief note on how this framework for crowdtuning compiler optimization heuristics works (for more details, please see https://github.com/ctuning/ck/wiki/Crowdsource_Experiments): you just install a small Android app (https://play.google.com/store/apps/details?id=openscience.crowdsource.experiments) or the Python-based Collective Knowledge framework (http://github.com/ctuning/ck). This program sends system properties to a public server. The server compiles a random shared workload using some flag combinations that have been found to work well on similar machines, as well as some new random ones. The client executes the compiled workload several times to account for variability, and sends the results back to the server.

If a combination of compiler flags is found that improves performance over the combinations found so far, it gets reduced (by removing flags that do not affect performance) and uploaded to a public repository. Importantly, if a combination significantly degrades performance for a particular workload, it gets recorded as well. This potentially points to a problem with the optimization heuristics for a particular target, which may be worth investigating and improving.

At the moment, only global GCC compiler flags are exposed for collaborative optimization. Longer term, it could be useful to cover finer-grain transformation decisions (vectorization, unrolling, etc.) via the plugin interface. Please note that this is a prototype framework and much more can be done! Please get in touch if you are interested in learning more or contributing!

Take care,
Grigori

=
Grigori Fursin, CTO, dividiti, UK
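[Editor's note: to make the reduction step concrete, here is a minimal Python sketch of the idea described above. It is not taken from the Collective Knowledge code base; "build_and_run" is a hypothetical helper that compiles the shared workload with the given flags and returns its median run time.]

  # Minimal sketch of the flag-reduction step; NOT the actual CK implementation.
  # "build_and_run" is a hypothetical callable: it compiles the shared workload
  # with the given list of flags and returns the median execution time.
  def reduce_flags(flags, build_and_run, tolerance=0.02):
      """Drop flags whose removal does not hurt performance beyond a tolerance."""
      baseline = build_and_run(flags)
      reduced = list(flags)
      for flag in list(reduced):
          trial = [f for f in reduced if f != flag]
          # Keep the flag only if removing it slows the workload down by
          # more than the tolerance (e.g. 2%).
          if build_and_run(trial) <= baseline * (1.0 + tolerance):
              reduced = trial
      return reduced

  # Hypothetical usage with a combination found by random exploration:
  # best = reduce_flags(["-O3", "-funroll-loops", "-fno-ivopts"], build_and_run)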
Re: collaborative tuning of GCC optimization heuristics
On Sat, Mar 5, 2016 at 9:13 AM, Grigori Fursin wrote:
> If it's of interest, we have released a new version of our open-source
> framework to share compiler optimization knowledge across diverse
> workloads and hardware.
> [...]

Thanks for creating and sharing this interesting framework.

I think a central issue is the "random shared workload", because the optimal optimizations and optimization pipeline are application-dependent. The proposed changes to the heuristics may benefit the particular set of workloads that the framework tests, but why are those workloads, and those particular implementations of the workloads, representative of applications of interest to end users of GCC? GCC is tuned for an arbitrary set of workloads, but why are the workloads from cTuning any better?

Thanks,
David
Re: Implementing TI mode (128-bit) and the 2nd pipeline for the MIPS R5900
On 02/27/2016 01:38 AM, Woon yung Liu wrote:
> I've given up on trying to implement MMI support for this target because
> I couldn't get the larger-than-normal GPR sizes to work nicely with the
> GCC internals (registers sometimes get split due to the defined word size,
> or the stuff in expr.c will just suffer from assertion failures).

[ Apologies for assumptions being made here, since I can't find an
instruction set reference for the r5900 anymore. ]

You probably don't want to be using TImode for MMI support anyway, since, in the broader context, this instruction set extension is about SIMD. Thus e.g. V16QImode and V8HImode might be more appropriate.

> It seems like the RTL patterns are not unique according to their names,
> but the inputs/outputs.

Correct.

> Is there a way to force GCC to use a specific pattern (i.e.
> "r5900_qword_store" and "r5900_qword_load")? I don't want to add the
> lq/sq instructions to mips_output_move because it will allow lq/sq to be
> used for stuff that isn't supported (i.e. loading TI-mode data types into
> a register for arithmetic operations that don't exist).

You can't. For a given set of inputs, one must provide all of the valid ways that one can perform the operation as alternatives. Thus for the TImode move, currently defined as

  (define_insn "*movti"
    [(set (match_operand:TI 0 "nonimmediate_operand" "=d,d,d,m,*a,*a,*d")
          (match_operand:TI 1 "move_operand" "d,i,m,dJ,*J,*d,*a"))]

one would have to add additional alternatives for the lq and sq instructions (and probably a register-register alternative as well, e.g. por d,s,s).

You must use the set_attr section to describe when the alternatives that you add are valid. The "enabled" attribute controls this. Looking at the mips port, it would appear that adding to "move_type" would be best.

It is of course simpler if the patterns that you want to add do not overlap with existing patterns. Thus if you stick to the vector modes, you have less overlap than if you describe MMI as using TImode.

r~
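[Editor's note: for illustration, here is a minimal, untested machine-description sketch of the non-overlapping route suggested above, i.e. describing the MMI move on a vector mode instead of TImode. The "TARGET_MMI" condition, the output templates, and the attribute values are assumptions for illustration, not code from the MIPS port.]

  ;; Untested sketch only -- not from the MIPS port.  "TARGET_MMI" is an
  ;; assumed target macro; the output templates and attribute values are
  ;; illustrative.  Using a vector mode keeps this pattern from overlapping
  ;; with the existing "*movti" alternatives.
  (define_insn "*mmi_movv16qi"
    [(set (match_operand:V16QI 0 "nonimmediate_operand" "=d,d,m")
          (match_operand:V16QI 1 "move_operand" "d,m,d"))]
    "TARGET_MMI"
    "@
     por\t%0,%1,%1
     lq\t%0,%1
     sq\t%1,%0"
    [(set_attr "type" "arith,load,store")
     (set_attr "mode" "TI")])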