Hi Yuanjie, Liang, et al.,

This email is about further GSoC'09 developments for plugins, generic function cloning, fine-grain optimizations and program instrumentation this summer. Now that the basic infrastructure is available, I would like to agree on further developments based on the feedback I got during the last 3 weeks so that we can extend the projects quickly.

Though this email primarily concerns Yuanjie and Liang, I am sending it to all the colleagues who are involved in the project or have been interested in it at some point, as well as to the GCC and cTuning mailing lists, just to make everyone aware of the developments. This is a long email, so if you are not interested in these projects, please skip it ...

1) Originally we planned to use the stable GCC 4.4.0 with plugin/ICI support for GSoC (already prepared). However, considering that GCC 4.5 will have plugin support and extended function cloning capabilities, we should eventually move all the developments to the trunk. Zbigniew mentioned that he will fully synchronize ICI with the current trunk within 2 weeks, so we can start working on GCC 4.4.0 (with plugins and ICI) until then (plugins shouldn't change much, but some gluing with the new GCC will be required) and then sync with GCC 4.5 + synced ICI.

2) We need to prepare a plugin that uses an XML library (libxml2, for example - http://xmlsoft.org) and records basic information about the compilation flow. I suggest that we record the following info per function for now (we can use the filename gcc_compilation_flow.<function name>.xml, for example):

* GCC version
* Plugin version which has been used to record the info
* File name
* Function name
** Within the inter-procedural stage, we can use the function name #IP# and, besides IP passes, also provide some global info such as which optimization flags/parameters have been used
* Function start line (source) and end line, and other currently available ICI features from http://ctuning.org/wiki/index.php/CTools:ICI:List_of_features
* Function-specific optimization or code generation flags (if applicable - I think Mike Meissner's patch that enables function-specific flags has been included in GCC 4.5)
* Passes
** Available fine-grain optimizations within passes

Except for fine-grain optimizations, all this information should already be available, and there are 2 ICI plugins (test1, test2) that show how to get it ...

We should record this info per function to avoid large files for large projects, since often we may want to control only a few functions. This can help with memory and CPU utilization when using libxml ...
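
As a rough illustration of the per-function record described above, here is a minimal C sketch. It is not ICI or plugin code: the struct, the XML element names and all helper functions are hypothetical, and it writes the XML with plain stdio instead of libxml2 only to keep the example free of dependencies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: fields mirror the list above, not an actual ICI API. */
struct flow_record {
    const char *gcc_version;
    const char *plugin_version;
    const char *file_name;
    const char *function_name;   /* "#IP#" could be used for the IP stage */
    int start_line;
    int end_line;
    const char *const *passes;   /* names of the passes that were run */
    size_t n_passes;
};

/* Build "gcc_compilation_flow.<function name>.xml" into buf.
   Returns 0 on success, -1 if the name is unusable or does not fit. */
int flow_record_path(const char *function_name, char *buf, size_t size)
{
    if (function_name[0] == '\0' || strchr(function_name, '/') != NULL)
        return -1;
    int n = snprintf(buf, size, "gcc_compilation_flow.%s.xml", function_name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write text with the three characters that must be escaped in XML content. */
void xml_put_text(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '&': fputs("&amp;", out); break;
        case '<': fputs("&lt;", out); break;
        case '>': fputs("&gt;", out); break;
        default:  fputc(*s, out); break;
        }
    }
}

/* Write one <tag>text</tag> line. */
void xml_put_element(FILE *out, const char *tag, const char *text)
{
    fprintf(out, "  <%s>", tag);
    xml_put_text(out, text);
    fprintf(out, "</%s>\n", tag);
}

/* Write the whole record; element names are assumptions for illustration. */
int flow_record_write(FILE *out, const struct flow_record *r)
{
    assert(out != NULL && r != NULL);
    fputs("<?xml version=\"1.0\"?>\n<compilation_flow>\n", out);
    xml_put_element(out, "gcc_version", r->gcc_version);
    xml_put_element(out, "plugin_version", r->plugin_version);
    xml_put_element(out, "file_name", r->file_name);
    xml_put_element(out, "function_name", r->function_name);
    fprintf(out, "  <start_line>%d</start_line>\n", r->start_line);
    fprintf(out, "  <end_line>%d</end_line>\n", r->end_line);
    for (size_t i = 0; i < r->n_passes; i++)
        xml_put_element(out, "pass", r->passes[i]);
    fputs("</compilation_flow>\n", out);
    return ferror(out) ? -1 : 0;
}
```

A plugin could call flow_record_path() once per function to pick the output file and then flow_record_write() after the last pass for that function has run. A real implementation would more likely build the document with libxml2, as suggested above.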

We should be able to control which functions to process using either a command line argument with a list of functions or an environment variable (which we can later convert into a command line argument). If the list is empty, all the functions are processed.

3) When we want to perform function cloning or use fine-grain optimization/instrumentation, we can use the same XML files created during the record stage (or prepare them manually/automatically using external tools) and add additional fields. We will need to perform function cloning using a new IP pass (as described in the GCC Summit presentation by Honza Hubicka). We can provide info about which functions to clone in the XML file for an IP cloning pass, i.e. something like:

  <pass>generic_cloning
    <external_libraries_for_adaptation>libadapt, other libraries if needed such as hardware counters monitoring
    <function>foo
      <clones>2
      <clone_name_extension>_clone
      <adaptation_function>gcc_adapt
    <function>boo
      <clones>3
      <clone_name_extension>_clone
      <adaptation_function>

Here the adaptation function (gcc_adapt) will be called before the clone and will select which clone to use based on either the machine description, monitoring of hardware counters or dataset features, to enable online dynamic optimization for statically compiled programs, etc.

When we create clones, we need to make the following substitution in the code:

  foo{ /* before cloning */
    ...
  }

  foo{ /* after cloning */
    switch (gcc_adapt(function_number)) {
      case 1: foo_clone1(..); break;
      case 2: foo_clone2(..); break;
      default: /* original code */ ...
    }
  }

When the generic_cloning pass is invoked, it will communicate with a plugin, asking for all the necessary information to clone one function. The plugin will send an "End" instruction when all the functions are processed so that compilation can continue ...

We need to decide how to number functions (so that the selection is fast) and how to aggregate this info if we compile projects with multiple functions.
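
To make the substitution above concrete, here is a small hand-written C sketch of what the code could look like after cloning. Everything in it is an assumption for illustration: the gcc_adapt signature, the FOO_ID numbering, the adapt_choice stub and the call counters are hypothetical, and the two clones are written by hand to stand in for differently optimized copies that the pass would generate.

```c
#include <assert.h>

enum { FOO_ID = 0, NUM_FUNCTIONS = 1 };  /* hypothetical function numbering */

int adapt_choice = 0;    /* stub state: which clone gcc_adapt selects */
int variant_calls[3];    /* [0] original body, [1] clone 1, [2] clone 2 */

/* Stub adaptation routine. A real one would look at the machine
   description, hardware counters or dataset features instead. */
int gcc_adapt(int function_number)
{
    assert(function_number >= 0 && function_number < NUM_FUNCTIONS);
    return adapt_choice;
}

/* Clone 1: same semantics as the original, plain loop. */
int foo_clone1(int n)
{
    variant_calls[1]++;
    int s = 0;
    for (int i = 1; i <= n; i++)
        s += i;
    return s;
}

/* Clone 2: same semantics, loop unrolled by hand with factor 2. */
int foo_clone2(int n)
{
    variant_calls[2]++;
    int s = 0;
    int i = 1;
    for (; i + 1 <= n; i += 2)
        s += i + (i + 1);
    if (i <= n)          /* leftover iteration when n is odd */
        s += i;
    return s;
}

/* foo after cloning: dispatch to a clone or fall through to the original. */
int foo(int n)
{
    switch (gcc_adapt(FOO_ID)) {
    case 1:
        return foo_clone1(n);
    case 2:
        return foo_clone2(n);
    default: {
        /* original code: sum of 1..n */
        variant_calls[0]++;
        int s = 0;
        for (int i = 1; i <= n; i++)
            s += i;
        return s;
    }
    }
}
```

All three variants return the same value for the same input, so the caller of foo() never needs to know which one ran. A small dense integer per function, as with FOO_ID here, is one possible answer to the numbering question, since it lets the adaptation routine index a table directly.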

Also, we can have a mode where we skip the function number, in case we adapt for different architectures and compile clones with different -msse2, -msse3 flags ...

After cloning is done, all the cloned functions should appear in the recorded XML file. We can then optimize those clones using different flags or different passes, or by changing fine-grain optimizations.

We can use OProfile or gprof to monitor performance; however, we may also need instrumentation capabilities to add calls to external time/hardware counters monitoring routines before and after a call to a function. We can provide this info within XML as well, i.e.

  <function>foo
    <add_function_call_before_func>_timer1
    <add_function_call_after_func>_timer2

As for fine-grain optimizations, I suggest that we start with unrolling, vectorization and blocking, since those optimizations do not have good heuristics yet, so we can try to help tune them automatically. Again, after an associated pass we should record info about the loop (nest) where this optimization happened and which parameter has been used - we will start adding more info about features preceding the optimization decision:

  <function>foo
    <pass>unroll
      <loop>1
        <unroll_factor>4
      <loop>2
        <unroll_factor>1
  <function>boo
    <pass>graphite (?)
      <loop>1
        <blocking_factor>64
  ... etc.

I don't know if this is clear or not, and I will be happy to elaborate more, so comments are welcome!

Yuanjie and Liang, we can discuss further developments tomorrow during a conf-call ... We need to prepare the first prototypes reasonably quickly so that we can see if there are potential problems and how to solve them ... I will be helping with testing and evaluation ...

Cheers,
Grigori