Hello,

Described here is the future plan for automatic parallelization in GCC.
The current autopar pass is based on the GOMP infrastructure; it distributes the iterations of a loop among several threads (the number is specified by the user) if it determines that the iterations are independent. The only cross-iteration dependence allowed is a reduction, which is handled as a special case (a small sketch of such a loop appears at the end of this mail). The pass was initially contributed to GCC 4.3 by Zdenek Dvorak and Sebastian Pop.

With the integration of Graphite (http://gcc.gnu.org/wiki/Graphite) into GCC 4.4, a strong loop nest analysis and transformation engine was introduced, and using the polyhedral model to expose loop parallelism in GCC has become feasible and relevant. Our prospective goal is to incrementally integrate autopar and Graphite. As in autopar, we will initially focus on synchronization-free parallelization.

The first step, as we see it, is to teach Graphite that parallel code needs to be produced. This means that Graphite will recognize simple parallel loops (using SCoP detection and data dependence analysis) and pass that information on. The information to be conveyed expresses that a loop is parallelizable, and may also include more detailed annotations, e.g., the shared/private variables.

There are two possible models for the code generation:

1. Graphite annotates parallel loops and passes that information all the way through CLOOG to the current autopar code generator, which produces the parallel, GOMP-based code.

2. Graphite annotates the parallel loops and CLOOG itself is responsible for generating the parallel code.

A point to note here is that scalars/reductions are currently not handled in Graphite. In the first model, where Graphite calls autopar's code generation, scalars can be handled: after Graphite finishes its analysis, it calls autopar's reduction analysis, and only then is the code generation invoked (provided, of course, that the scalar analysis determines that the loop is still parallelizable).

Once the first step is accomplished, the following steps will focus on teaching Graphite to find loop transformations (such as skewing, interchange, etc.) that expose coarse-grain, synchronization-free parallelism; a small interchange example is also sketched at the end of this mail. This work will rely heavily on the polyhedral data dependence and transformation infrastructures. We have not yet determined which algorithms/techniques we are going to use for this part.

Having synchronization-free parallelization integrated in Graphite will set the ground for handling parallelism that requires a small amount of synchronization.

This is a rough view of our planned work on autopar in GCC. Please feel free to ask questions or comment.

Thanks,
Razya
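
P.S. To make the description of the current autopar pass a bit more concrete, here is a minimal sketch of the kind of loop it handles today. The function and variable names are made up for illustration, and a and b are assumed not to alias:

    double
    sum_and_scale (double *a, double *b, int n)
    {
      double sum = 0.0;
      int i;

      for (i = 0; i < n; i++)
        {
          b[i] = 2.0 * a[i];  /* independent across iterations (a, b assumed not to alias) */
          sum += a[i];        /* reduction: the only cross-iteration dependence */
        }

      return sum;
    }

With -ftree-parallelize-loops=N, and assuming the alias analysis can prove that a and b do not overlap, the iterations of the i loop are distributed among N threads, and the sum reduction is recognized and handled as the special case mentioned above.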
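
To illustrate the kind of loop transformation mentioned for the later steps, here is a small interchange sketch. The function names and bounds are again made up; it only shows the effect of the transformation, not the algorithm that will select it:

    #define N 1024
    #define M 1024

    /* As written, the outer i-loop carries the dependence
       a[i][j] <- a[i-1][j], so only the inner j-loop is parallel
       (fine-grain parallelism).  */
    void
    sweep (double a[N][M])
    {
      int i, j;

      for (i = 1; i < N; i++)
        for (j = 0; j < M; j++)
          a[i][j] = a[i-1][j] + 1.0;
    }

    /* After interchange, the j-loop is outermost and its iterations are
       independent, so each column can be processed by a separate thread
       with no synchronization inside the loop nest (coarse-grain,
       synchronization-free parallelism).  */
    void
    sweep_interchanged (double a[N][M])
    {
      int i, j;

      for (j = 0; j < M; j++)
        for (i = 1; i < N; i++)
          a[i][j] = a[i-1][j] + 1.0;
    }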