Hi Yuanjie, Liang, et al.,

This email is about further GSoC'09 developments for plugins, generic function
cloning, fine-grain optimizations and program instrumentation this summer.
Now that the basic infrastructure is available, I would like to agree on
further developments, based on the feedback I got during the last 3 weeks,
so that we can extend the projects quickly. Though this email primarily
concerns Yuanjie and Liang, I am sending it to all the colleagues who are
involved in the project or have been interested at some point, as well as to
the GCC and cTuning mailing lists, just to make everyone aware of the
developments. This is a long email, so if you are not interested in these
projects, please skip it ...

1) Originally we planned to use stable GCC 4.4.0 with plugin/ICI support for
GSoC (already prepared). However, considering that GCC 4.5 will have plugin
support and extended function cloning capabilities, we should eventually move
all the developments to the trunk. Zbigniew mentioned that he will fully
synchronize ICI with the current trunk within 2 weeks, so we can start working
on GCC 4.4.0 (with plugins and ICI) until then and afterwards move to
GCC 4.5 + synced ICI. The plugins shouldn't change much, but some gluing with
the new GCC will be required.

2) We need to prepare a plugin that uses an XML library (libxml2, for example:
http://xmlsoft.org) and records basic information about the compilation flow.
I suggest that we record the following info per function for now (we can use
the filename gcc_compilation_flow.<function name>.xml, for example):
* GCC version
* Version of the plugin that has been used to record the info
* File name
* Function name
** Within the inter-procedural stage, we can call the function name #IP#
and, besides IP passes, also provide some global info such as which
optimization flags/parameters have been used
* Function start line (source) and end line, and other currently available ICI
features from http://ctuning.org/wiki/index.php/CTools:ICI:List_of_features
* Function-specific optimization or code generation flags (if applicable; I
think Mike Meissner's patch that enables function-specific flags has been
included in GCC 4.5)
* Passes
** Available fine-grain optimizations within passes

Except for fine-grain optimizations, all this information should already be
available, and there are 2 ICI plugins (test1, test2) that show how to get
it ...
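To make the record format concrete, here is a minimal sketch of how one
per-function record could be serialized. It is only an illustration: it uses
plain string formatting as a stand-in for libxml2, and the struct, the element
names (function_record, start_line, etc.) and the helper names are all
assumptions, not part of ICI or of any agreed format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record; the fields mirror the list above. */
struct func_record {
    const char *gcc_version;
    const char *plugin_version;
    const char *file_name;
    const char *function_name;  /* "#IP#" for the inter-procedural stage */
    int start_line;
    int end_line;
    const char **passes;        /* names of the passes that ran */
    size_t n_passes;
};

/* Bounded output buffer; `overflow` is set once something does not fit. */
struct xbuf { char *data; size_t len, cap; int overflow; };

static void xputs(struct xbuf *b, const char *s)
{
    size_t n = strlen(s);
    if (b->overflow || b->len + n + 1 > b->cap) { b->overflow = 1; return; }
    memcpy(b->data + b->len, s, n + 1);   /* copies the terminating NUL too */
    b->len += n;
}

/* Write <tag>text</tag>, escaping the XML metacharacters &, < and >. */
static void xelem(struct xbuf *b, const char *tag, const char *text)
{
    xputs(b, "  <"); xputs(b, tag); xputs(b, ">");
    for (; *text; text++) {
        char one[2] = { *text, '\0' };
        xputs(b, *text == '&' ? "&amp;" : *text == '<' ? "&lt;" :
                 *text == '>' ? "&gt;" : one);
    }
    xputs(b, "</"); xputs(b, tag); xputs(b, ">\n");
}

static void xelem_int(struct xbuf *b, const char *tag, int value)
{
    char num[32];
    snprintf(num, sizeof num, "%d", value);
    xelem(b, tag, num);
}

/* Format one record into `out`; returns 0 on success, -1 if it does not fit. */
int format_function_record(char *out, size_t cap, const struct func_record *r)
{
    struct xbuf b = { out, 0, cap, 0 };
    size_t i;

    xputs(&b, "<function_record>\n");
    xelem(&b, "gcc_version", r->gcc_version);
    xelem(&b, "plugin_version", r->plugin_version);
    xelem(&b, "file_name", r->file_name);
    xelem(&b, "function_name", r->function_name);
    xelem_int(&b, "start_line", r->start_line);
    xelem_int(&b, "end_line", r->end_line);
    for (i = 0; i < r->n_passes; i++)
        xelem(&b, "pass", r->passes[i]);
    xputs(&b, "</function_record>\n");
    return b.overflow ? -1 : 0;
}
```

The resulting text could then be written to
gcc_compilation_flow.<function name>.xml. A real plugin would more likely
build the tree with libxml2 itself; escaping is shown here because function
names (C++ operators, for example) may contain XML metacharacters.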

We should record this info per function to avoid large files for large
projects, since often we may want to control only a few functions. This can
help with memory and CPU utilization when using libxml ...

We should be able to control which functions to process using either a command
line argument with a list of functions or an environment variable (which we
can later convert into a command line argument). If the list is empty, all the
functions are processed.
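The selection rule could be sketched as follows. This is only an illustration
under assumptions: the list is taken to be comma-separated, and the
environment variable name ICI_FUNCTIONS and both function names are
hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if `name` appears in the comma-separated `list`, 0 otherwise.
   An empty or missing list selects every function. */
int function_selected(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    if (list == NULL || *list == '\0')
        return 1;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Compare whole entries only, so "foo" does not match "foobar". */
        if (len == n && strncmp(p, name, n) == 0)
            return 1;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}

/* Same check, with the list taken from a (hypothetical) environment
   variable; an unset variable means "process everything". */
int function_selected_env(const char *name)
{
    return function_selected(getenv("ICI_FUNCTIONS"), name);
}
```

With ICI_FUNCTIONS=foo,boo only foo and boo would be processed; with the
variable unset or empty, every function would be.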

3) When we want to perform function cloning or use fine-grain
optimization/instrumentation, we can take the same XML files created during
the record stage (or prepare them manually/automatically using external tools)
and add additional fields.
We will need to perform function cloning using a new IP pass (as described in
the GCC Summit presentation by Honza Hubicka).

We can provide info about which functions to clone in the XML file for an IP
cloning pass, i.e. something like (closing tags omitted for brevity):
<pass>generic_cloning
 <external_libraries_for_adaptation>libadapt (plus other libraries if needed,
 such as hardware counters monitoring)
 <function>foo
  <clones>2
  <clone_name_extension>_clone
  <adaptation_function>gcc_adapt
 <function>boo
  <clones>3
  <clone_name_extension>_clone
  <adaptation_function>

The adaptation function (gcc_adapt here) will be called before the clone and
will select which clone to use based on either the machine description,
monitoring of hardware counters or dataset features, to enable online dynamic
optimization for statically compiled programs, etc.

Basically, when we create clones, we need to make the following substitution
in the code:

foo{
/* before cloning */
...
}

foo{
/* after cloning */

switch (gcc_adapt(function_number)) {
case 1: foo_clone1(..); break;
case 2: foo_clone2(..); break;
default:
/* original code */
...
}
}
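For illustration, the substitution above could look like this in compilable C.
This is a sketch under assumptions, not the output of an actual cloning pass:
the selection table, set_clone, FOO_NUMBER and the bodies of the clones are
hypothetical, and the clones are written by hand here only to stand in for
versions of foo compiled with different optimizations.

```c
#define MAX_FUNCTIONS 16
#define FOO_NUMBER 0            /* hypothetical number assigned to foo */

/* Hypothetical selection table: 0 keeps the original code. */
static int selection[MAX_FUNCTIONS];

void set_clone(int function_number, int clone)
{
    if (function_number >= 0 && function_number < MAX_FUNCTIONS)
        selection[function_number] = clone;
}

/* Stand-in for the adaptation function; a real one might consult the
   machine description, hardware counters or dataset features. */
int gcc_adapt(int function_number)
{
    if (function_number < 0 || function_number >= MAX_FUNCTIONS)
        return 0;
    return selection[function_number];
}

/* Clone 1: same semantics as foo, loop unrolled by 2. */
static long foo_clone1(const int *a, int n)
{
    long s = 0;
    int i = 0;
    for (; i + 1 < n; i += 2)
        s += a[i] + a[i + 1];
    if (i < n)
        s += a[i];
    return s;
}

/* Clone 2: same semantics as foo, iterating backwards. */
static long foo_clone2(const int *a, int n)
{
    long s = 0;
    int i;
    for (i = n - 1; i >= 0; i--)
        s += a[i];
    return s;
}

/* foo after cloning: dispatch first, fall back to the original code. */
long foo(const int *a, int n)
{
    long s = 0;
    int i;

    switch (gcc_adapt(FOO_NUMBER)) {
    case 1: return foo_clone1(a, n);
    case 2: return foo_clone2(a, n);
    default:
        /* original code */
        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }
}
```

Whatever gcc_adapt returns, the caller sees the same result; unknown clone
numbers fall through to the original code, which keeps the program correct
even if the selection data is stale.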

When the generic_cloning pass is invoked, it will communicate with a plugin,
asking each time for all the information necessary to clone one function. The
plugin will send an "End" instruction when all the functions are processed so
that compilation can continue ...
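The exchange between the pass and the plugin could be sketched like this. All
names here (clone_request, plugin_state, plugin_next_request,
generic_cloning_pass) are hypothetical, a return value of 0 plays the role of
the "End" instruction, and the actual cloning is replaced by a printout.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical description of what to do for one function. */
struct clone_request {
    const char *function;       /* e.g. "foo" */
    int clones;                 /* number of clones to create */
    const char *extension;      /* e.g. "_clone" */
    const char *adapt_function; /* e.g. "gcc_adapt" */
};

/* Hypothetical plugin state: requests read from the XML file. */
struct plugin_state {
    const struct clone_request *requests;
    size_t count;
    size_t next;
};

/* Plugin side: hand out one request per call; 0 means "End". */
int plugin_next_request(struct plugin_state *st, struct clone_request *out)
{
    if (st->next >= st->count)
        return 0;
    *out = st->requests[st->next++];
    return 1;
}

/* Pass side: keep asking until "End". Returns the number of clones that
   would be created; a real pass would clone the function here. */
int generic_cloning_pass(struct plugin_state *st)
{
    struct clone_request req;
    int total = 0;

    while (plugin_next_request(st, &req)) {
        for (int i = 1; i <= req.clones; i++)
            printf("clone %s -> %s%s%d (selected by %s)\n",
                   req.function, req.function, req.extension, i,
                   req.adapt_function);
        total += req.clones;
    }
    return total;
}
```

Pulling one request at a time keeps the pass independent of how the plugin
stores its data (XML file, command line, etc.).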

We need to decide how to number functions (so that the selection is fast)
and how to aggregate this info if we compile projects with multiple functions.
Also, we can have a mode where we skip the function number, in case we adapt
for different architectures and compile clones with different flags such as
-msse2, -msse3 ...

After cloning is done, all the cloned functions should appear in the recorded
XML file. We can then optimize those clones using different flags or different
passes, or by changing fine-grain optimizations.

We can use OProfile or gprof to monitor performance; however, we may also need
instrumentation capabilities to add calls to external time/hardware counter
monitoring routines before and after a call to a function. We can provide this
info within the XML as well, i.e.
<function>foo
<add_function_call_before_func>_timer1
<add_function_call_after_func>_timer2
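The effect of such instrumentation on a call site could be sketched as
follows. The routines are named ici_timer1/ici_timer2 here rather than
_timer1/_timer2 because file-scope identifiers with a leading underscore are
reserved in C; those names, the accessors and the wrapper
call_foo_instrumented are assumptions, and clock() only stands in for
whatever time or hardware counter source is used.

```c
#include <time.h>

static clock_t t_start;
static double elapsed_seconds;   /* accumulated CPU time inside foo */
static unsigned long calls;      /* number of instrumented calls */

/* Routine inserted before the call. */
void ici_timer1(void)
{
    t_start = clock();
}

/* Routine inserted after the call. */
void ici_timer2(void)
{
    elapsed_seconds += (double)(clock() - t_start) / CLOCKS_PER_SEC;
    calls++;
}

unsigned long timer_calls(void)   { return calls; }
double        timer_seconds(void) { return elapsed_seconds; }

/* The function being monitored: sum of 0 .. n-1. */
long foo(long n)
{
    long s = 0;
    long i;
    for (i = 0; i < n; i++)
        s += i;
    return s;
}

/* What a call to foo would look like after instrumentation. */
long call_foo_instrumented(long n)
{
    long r;
    ici_timer1();
    r = foo(n);
    ici_timer2();
    return r;
}
```

This simple version is not reentrant (a single start time is kept), so
recursive or multi-threaded code would need per-call or per-thread state.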

As for fine-grain optimizations, I suggest that we start with unrolling,
vectorization and blocking, since those optimizations do not have good
heuristics yet, so we can try to help tune them automatically. Again, after an
associated pass we should record info about the loop (nest) where the
optimization happened and which parameter has been used (later we will start
adding more info about the features preceding the optimization decision):
<function>foo
 <pass>unroll
  <loop>1
  <unroll_factor>4
  <loop>2
  <unroll_factor>1
<function>boo
 <pass>graphite (?)
  <loop>1
  <blocking_factor>64
...
etc.
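As a reminder of what the recorded parameter means, here is a hand-written
illustration of a loop before and after unrolling with unroll_factor 4. The
function names are hypothetical, and GCC's own unroller works on its internal
representation, so this only shows the source-level equivalent.

```c
/* Original loop (equivalent to unroll_factor 1). */
void scale(double *a, int n, double k)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] *= k;
}

/* Same loop with unroll_factor 4: four iterations per trip, followed by
   a remainder loop for the last n % 4 elements. */
void scale_unroll4(double *a, int n, double k)
{
    int i = 0;
    for (; i + 3 < n; i += 4) {
        a[i]     *= k;
        a[i + 1] *= k;
        a[i + 2] *= k;
        a[i + 3] *= k;
    }
    for (; i < n; i++)
        a[i] *= k;
}
```

Both versions perform the same multiplications on each element, so they give
identical results; only the loop overhead and scheduling freedom differ,
which is what automatic tuning of the factor would try to exploit.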

I don't know whether all of this is clear, and I will be happy to elaborate,
so comments are welcome!
Yuanjie and Liang, we can discuss further developments tomorrow during a
conf-call ...

We need to prepare the first prototypes reasonably quickly so that we can see
whether there are potential problems and how to solve them ... I will be
helping with testing and evaluation ...

Cheers,
Grigori

