Re: Automatic Parallelization & Graphite - future plans

Cupertino Miranda Thu, 19 Mar 2009 08:23:42 -0700

Hello everyone,

I have attached the patch Albert Cohen was referring to.

Middle-end selection is performed by marking the regions of the source code that should be compiled for a specific ISA, using pragmas such as:

#pragma target <target_name>
The selection can be reset with a bare pragma:
#pragma target
which makes the code that follows compile for all the ISAs again.

A new parameter to cc1 (-ftarget=<target_name>) was created to specify which regions should be compiled.
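
To illustrate how the pragma regions and -ftarget interact, here is a minimal sketch in C of the selection rule as described above. It is a model of the semantics only, not code from the patch: the names region_state, keep_line and count_kept are hypothetical, the target names "spu" and "ppu" are assumed for the example, and it assumes that unmarked code is kept for every target.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the region selection; not taken from the patch.
   `current` holds the target named by the last "#pragma target" line,
   or an empty string when the code applies to all ISAs. */
struct region_state {
    char current[32];
};

/* Returns 1 if `line` would be kept when compiling with
   -ftarget=<selected>, 0 if it would be dropped.
   Pragma lines only update the state and are never kept. */
static int keep_line(struct region_state *st, const char *line,
                     const char *selected)
{
    static const char prefix[] = "#pragma target";
    const size_t n = sizeof prefix - 1;

    assert(st && line && selected);

    if (strncmp(line, prefix, n) == 0 &&
        (line[n] == '\0' || line[n] == ' ')) {
        const char *name = line + n;
        while (*name == ' ')
            name++;
        /* An empty name is the bare pragma: reset to "all ISAs". */
        snprintf(st->current, sizeof st->current, "%s", name);
        return 0;
    }
    /* Assumption: unmarked code is compiled for every target. */
    return st->current[0] == '\0' || strcmp(st->current, selected) == 0;
}

/* Counts the lines of an n-line source kept for the given target. */
static int count_kept(const char *const *src, size_t n,
                      const char *selected)
{
    struct region_state st = { "" };
    int kept = 0;
    for (size_t i = 0; i < n; i++)
        kept += keep_line(&st, src[i], selected);
    return kept;
}

/* Sample single-source input; target names are assumed. */
static const char *const sample[] = {
    "int shared_counter;",        /* all ISAs */
    "#pragma target spu",
    "void kernel(void) { }",      /* spu only */
    "#pragma target ppu",
    "void control(void) { }",     /* ppu only */
    "#pragma target",
    "int shared_helper(void);",   /* all ISAs again */
};
```

In this model, count_kept(sample, 7, "spu") keeps three lines: the two shared declarations and the spu-only function. A target that no region names keeps only the two shared declarations.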

For the purpose of compiling for Cell, we created a script that plays the role of the driver: it uses the two compiled versions of cc1 (PPU/SPU) and links everything together with the IBM tools and the PPU gcc driver.

Apart from the pragmas, there are a few hacks in the patch that make the output usable. Please keep in mind that it was done as a proof of concept, so it is still unstable and outdated with respect to current trunk. ;-)


Regards,
Cupertino Miranda

single_source_gcc.patch
Description: Binary data




On Mar 18, 2009, at 11:56 PM, Albert Cohen wrote:

Steven Bosscher wrote:
On Wed, Mar 18, 2009 at 8:17 PM, Albert Cohen <albert.co...@inria.fr> wrote:
Antoniu Pop wrote:
(...)
The multiple backends compilation is not directly related, so you
should use a separate branch. It makes sense to go in that direction.
Indeed.
Work has been going on for years in this direction, but it has never
gone very far.
There has been some work in the area, using different approaches. I've been
involved in one attempt, for the Cell, with Cupertino Miranda in CC.
Cupertino: could you send the URL for the documentation on your experiments,
and the (old) patch to GCC and the (old) Cell SDK for that purpose?
What approach was taken in these experiments?
Cupertino will send you the documentation and reference to the old patch sent on the gcc-patches list.
In brief, there is no hotswapping, just attributes to let the MIDDLE-END decide, right before expanding to RTL, which part of the trees to keep and which to drop. Selection is done at the function level, and at the variable declaration level.
You still need multiple runs of cc1, but they could be hidden behind a single run of the driver. It may not be a good idea, though, since different optimization flags may be relevant for the different backends (and even for the different functions, but this is a distinct issue).
The point, for the Cell, was to perform whole-program analysis across ISA boundaries, e.g., looking at IPCP or specialization and inlining. Another example is being able to assess the profitability of a transformation on a function that compiles to target X but internally depends on (calls) a function on target Y. You would have to assess the side-effects and cost, and pass static and dynamic profile data across the boundaries again. This was only a target direction; we did not do anything there.
We struggled a lot with the data types and APIs, which differ widely and inconsistently between the Cell PPE and SPE... as if IBM did not think until very late that single-source-multiple-backend compilation was relevant!
Albert
