I have started working on connecting Dmitry's OpenMP parser to
the middle-end so that we can start generating the basic runtime
calls, which Richard should be posting soon. With any luck, we
should have some basic functionality in a few weeks.

Initially, we will be outlining parallel sections into their own
functions. This is mostly for implementation convenience.
However, long term we are better off incorporating parallel
markers into the IL so that we can do a better job analyzing and
optimizing.

It may be marginally quicker to launch threads that execute the
same body of code, because that avoids the argument-passing
overhead for shared variables and the memory indirection in the
launched functions. But mostly, I'm interested in the IL
elements for optimization and analysis. Launching multiple
threads on the same function body may give us more headaches than
it's worth at the moment.
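
To make the overhead mentioned above concrete, here is a hand-written sketch of what outlining does to a parallel region. It is not what the compiler will emit. The names shared_data, outlined_body, fake_launch and scale_array are made up for the example, and fake_launch simply calls the body once per thread id, sequentially, in place of a real runtime.

```c
#include <stddef.h>

/* Hypothetical record holding the variables shared with the region. */
struct shared_data {
    int *a;        /* shared array */
    size_t n;      /* number of elements */
    int scale;     /* shared scalar, only read inside the region */
};

/* Outlined body: every access to a shared variable goes through DATA,
   which is the extra memory indirection mentioned above. */
static void outlined_body(void *data, int thread_id, int num_threads)
{
    struct shared_data *d = data;
    size_t i;

    for (i = (size_t)thread_id; i < d->n; i += (size_t)num_threads)
        d->a[i] *= d->scale;
}

/* Stand-in for the runtime: runs the body once per thread id, in order. */
static void fake_launch(void (*fn)(void *, int, int), void *data,
                        int num_threads)
{
    int t;

    for (t = 0; t < num_threads; t++)
        fn(data, t, num_threads);
}

/* The original function after outlining: pack the shared variables
   into a record (the argument-passing overhead) and launch the body. */
void scale_array(int *a, size_t n, int scale)
{
    struct shared_data d = { a, n, scale };

    fake_launch(outlined_body, &d, 4);
}
```

If the threads ran the original function body instead, the record and the pointer dereferences in outlined_body would not be needed, which is the marginal gain discussed above.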

Essentially, we will have an IL expression for every OpenMP
pragma. These expressions are GENERIC and the gimplifier work is
mostly in the bodies. With few exceptions, the standard already
requires the controlling predicates and clauses to be in more or
less GIMPLE form.

The lowering will, for now, just create a new function and
replace the block of code along the lines of tree-nested.c.
However, in the future, the parallel sections will be
single-entry single-exit regions in the CFG with the controlling
GOMP_PARALLEL_... expression as the entry block and a latch-like
exit block. The parallel region building can be modeled after
the loop structure, but there isn't as much nesting, so it
shouldn't be too complex. As an aside, we do need CFG region
building and the ability to have the optimizers work on
sub-regions (currently being worked on, as I understand).

In fact, even if we don't end up launching threads on the same
function body, we can keep the parallel region inside the
function throughout the optimizers and outline it at a later
point (before RTL, perhaps).

Some runtime library calls (synchronization mostly) ought to be
recognizable as such by the optimizers. I am not sure whether to
define them as builtins, provide an attribute or make them IL
expressions. Any suggestions/ideas?

The IL constructs mostly mirror their #pragma counterparts. Take
these as a design draft: I have only started working on the
implementation, so I expect the design to evolve as I implement
things. There may also be several hidden assumptions that I
expect to become embarrassingly obvious in a few weeks. Names
prefixed with "g_" below mean "the gimplified form of ...".

Parallel regions
----------------

#pragma omp parallel [clause1 ... clauseN]
------------------------------------------

    GENERIC
        GOMP_PARALLEL <parallel_clauses data_clauses, body>

    GIMPLE
        GOMP_PARALLEL <g_parallel_clauses g_data_clauses, L1, L2>
        L1:
          g_body
        L2:

#pragma omp for [clause1 ... clauseN]
-------------------------------------

    GENERIC
        GOMP_FOR <for_clauses data_clauses nowait_clause, init-expr,
                  incr-expr, body>

    GIMPLE
        GOMP_FOR <g_for_clauses g_data_clauses nowait_clause, init-expr,
                  incr-expr, L1, L2>
        L1:
          g_body
        L2:

    Both INIT-EXPR and INCR-EXPR are required to be in GIMPLE
    form by the standard already, so there's little that needs
    to be done there. Keeping them in the header itself
    makes it easy to reference them later when we're generating
    code.
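
For reference, this is the kind of source loop the expression above is meant to represent. It is only a source-level illustration, with a made-up function name (sum_squares). The loop header is in the restricted form the standard requires, so the init and increment expressions are already simple. The pragma is guarded with _OPENMP so that the code builds and gives the same answer when OpenMP is not enabled.

```c
/* Sum of the squares of 0 .. n-1, written as a work-shared loop. */
long sum_squares(int n)
{
    long total = 0;
    int i;

    /* init-expr: i = 0    test: i < n    incr-expr: i++ */
#ifdef _OPENMP
    #pragma omp parallel for reduction(+:total)
#endif
    for (i = 0; i < n; i++)
        total += (long)i * i;

    return total;
}
```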

#pragma omp sections [clause1 ... clauseN]
------------------------------------------

    GENERIC
        GOMP_SECTIONS <data_clauses nowait_clause, body>

    GIMPLE
        GOMP_SECTIONS <g_data_clauses nowait_clause, L1, L2>
        L1:
          g_body
        L2:

#pragma omp section
-------------------

    GENERIC
        GOMP_SECTION <body>

    GIMPLE
        GOMP_SECTION <L1, L2>
        L1:
          g_body
        L2:

#pragma omp single [clause1 ... clauseN]
----------------------------------------

    GENERIC
        GOMP_SINGLE <data_clauses nowait_clause, body>

    GIMPLE
        GOMP_SINGLE <g_data_clauses nowait_clause, L1, L2>
        L1:
          g_body
        L2:

#pragma omp master
------------------

    GENERIC
        GOMP_MASTER <body>

    GIMPLE
        GOMP_MASTER <L1, L2>
        L1:
          g_body
        L2:

#pragma omp critical [name]
---------------------------

    GENERIC
        GOMP_CRITICAL <name, body>

    GIMPLE
        GOMP_CRITICAL <name, L1, L2>
        L1:
          g_body
        L2:

    Here, NAME is something the runtime needs to recognize. It will
    essentially be the name of the lock to use when emitting the
    appropriate lock call.

#pragma omp barrier
-------------------

    GENERIC
    GIMPLE
        GOMP_BARRIER

#pragma omp atomic
------------------

    GENERIC
    GIMPLE
        GOMP_ATOMIC <expression-statement>

    The standard is sufficiently strict that we don't need additional
    gimplification here. EXPRESSION-STATEMENT can only be of the form
    'VAR binop= EXPR', where EXPR must be of scalar type. At the
    moment, it's not absolutely clear to me whether EXPR needs to be a
    GIMPLE RHS already or whether it could be more complex. It
    certainly can't reference VAR.
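
A small source-level example of the 'VAR binop= EXPR' shape, for illustration only. The function name count_even is made up. EXPR here is a plain scalar temporary that does not mention VAR, which is the simplest case; whether more complex right-hand sides need extra gimplification is the open question noted above. The pragmas are guarded with _OPENMP so the code also builds serially.

```c
/* Count the even values in v[0 .. n-1]. */
int count_even(const int *v, int n)
{
    int count = 0;
    int i;

#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (i = 0; i < n; i++) {
        /* EXPR: a scalar that does not reference count. */
        int is_even = (v[i] % 2 == 0);

        /* VAR binop= EXPR, with VAR = count and binop = '+'. */
#ifdef _OPENMP
        #pragma omp atomic
#endif
        count += is_even;
    }

    return count;
}
```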

#pragma omp flush (var-list)
----------------------------

    GENERIC
    GIMPLE
        GOMP_FLUSH <var-list>

#pragma omp ordered
-------------------

    GENERIC
        GOMP_ORDERED <body>

    GIMPLE
        GOMP_ORDERED <L1, L2>
        L1:
          g_body
        L2:

#pragma omp threadprivate
-------------------------

    This will just set an attribute in each affected _DECL.
    Accessible with GOMP_THREADPRIVATE.

for_clauses
-----------

* CLAUSE    ordered

  GENERIC   A boolean field in GOMP_FOR. Accessible with
            GOMP_ORDERED.

  GIMPLE    Same.

* CLAUSE    schedule (kind, expr)

  GENERIC   A structure inside GOMP_FOR. Accessible with
            GOMP_SCHEDULE:

                enum schedule_kind {
                    GOMP_SCHED_STATIC,
                    GOMP_SCHED_DYNAMIC,
                    GOMP_SCHED_GUIDED,
                    GOMP_SCHED_RUNTIME } kind;
                tree expr;

  GIMPLE    Same, with EXPR in GIMPLE form as per FE rules.
            If missing, it defaults to INTEGER_ONE_NODE for
            GOMP_SCHED_DYNAMIC and GOMP_SCHED_GUIDED. It
            defaults to iteration-space / num-threads for
            GOMP_SCHED_STATIC, and for GOMP_SCHED_RUNTIME we
            emit getenv calls to read it from the environment.
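
The defaulting rules for a missing EXPR can be sketched as a small helper. This is only an illustration under a few assumptions: default_chunk is a made-up name, it works on plain integers instead of trees, it rounds the static chunk up so that every iteration is covered (the note above only says iteration-space / num-threads), and it reads the chunk from the standard OMP_SCHEDULE variable ("kind[,chunk]") at run time instead of emitting the getenv calls into the generated code.

```c
#include <stdlib.h>
#include <string.h>

enum schedule_kind {
    GOMP_SCHED_STATIC,
    GOMP_SCHED_DYNAMIC,
    GOMP_SCHED_GUIDED,
    GOMP_SCHED_RUNTIME
};

/* Hypothetical helper: chunk size to use when the schedule clause has
   no EXPR.  NUM_THREADS must be positive.  Returns 0 for
   GOMP_SCHED_RUNTIME when the environment does not supply a chunk. */
long default_chunk(enum schedule_kind kind, long iterations, long num_threads)
{
    switch (kind) {
    case GOMP_SCHED_DYNAMIC:
    case GOMP_SCHED_GUIDED:
        return 1;

    case GOMP_SCHED_STATIC:
        /* Assumption: round up so that no iteration is left over. */
        return (iterations + num_threads - 1) / num_threads;

    case GOMP_SCHED_RUNTIME: {
        /* OMP_SCHEDULE has the form "kind[,chunk]". */
        const char *env = getenv("OMP_SCHEDULE");
        const char *comma = env ? strchr(env, ',') : NULL;

        return comma ? strtol(comma + 1, NULL, 10) : 0;
    }
    }

    return 0;
}
```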

nowait_clause
-------------

* CLAUSE    nowait

  GENERIC   A boolean field in GOMP_FOR. Accessible with
            GOMP_NOWAIT.

  GIMPLE    Same.

parallel_clauses
----------------

* CLAUSE    if (expr)

  GENERIC   GOMP_IF <expr>

  GIMPLE    if (g_expr) goto L1; else goto L2;
            L1:
              GOMP_PARALLEL <g_parallel_clauses, L2, L3>
            L2:
              g_body
            L3:

* CLAUSE    num_threads (expr)

  GENERIC   A tree field in the GOMP_PARALLEL expression
            accessed with GOMP_NUM_THREADS.

  GIMPLE    Same, with EXPR gimplified as per FE rules.
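
The control flow of the lowered if clause can be mimicked in plain C, purely to show which path reaches the body. In this sketch run_region is a made-up name, setting the team size stands in for the GOMP_PARALLEL header, and a sequential loop stands in for g_body being run by each thread. When the condition is false the header is skipped and the body runs once, serially.

```c
/* Returns how many times the region body was executed. */
int run_region(int cond, int num_threads)
{
    int team = 1;       /* serial unless the parallel header is reached */
    int executed = 0;
    int t;

    if (cond) goto L1; else goto L2;
L1:
    /* Stand-in for GOMP_PARALLEL <g_parallel_clauses, L2, L3>. */
    team = num_threads;
L2:
    /* Stand-in for g_body, run once per member of the team. */
    for (t = 0; t < team; t++)
        executed++;
    /* L3: end of the region. */
    return executed;
}
```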

data_clauses
------------

* CLAUSE    private (variable_list)
            copyprivate (variable_list)
            firstprivate (variable_list)
            lastprivate (variable_list)
            shared (variable_list)
            copyin (variable_list)

  GENERIC   These are fields in the GOMP_PARALLEL expression.
            Accessed with:

                GOMP_PRIVATE
                GOMP_FIRSTPRIVATE
                GOMP_SHARED
                GOMP_COPYIN

  GIMPLE    Same, with variable_list gimplified as per FE
            rules.

* CLAUSE    default (shared | none)

  GENERIC   This is a boolean field in the GOMP_PARALLEL
            expression.

  GIMPLE    Same.

* CLAUSE    reduction (operator : variable_list)

  GENERIC   A structure inside GOMP_PARALLEL with two fields:

                enum tree_code operator -> PLUS_EXPR,
                                           MULT_EXPR,
                                           MINUS_EXPR,
                                           BIT_AND_EXPR,
                                           BIT_XOR_EXPR,
                                           BIT_IOR_EXPR,
                                           AND_EXPR,
                                           OR_EXPR
                tree variable_list

  GIMPLE    Same, with variable_list gimplified as per FE
            rules.
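
To show what the operator field of the reduction structure ends up driving, here is a sketch of the two operator-dependent pieces for an int variable: the initial value of each thread's private copy and the way partial results are combined, as the OpenMP C/C++ specification defines them. The enum reduction_op and both function names are made-up stand-ins for the tree codes listed above, not GCC identifiers.

```c
/* Hypothetical stand-ins for the tree codes listed above. */
enum reduction_op {
    RED_PLUS, RED_MULT, RED_MINUS,
    RED_BIT_AND, RED_BIT_XOR, RED_BIT_IOR,
    RED_AND, RED_OR
};

/* Initial value of each thread's private copy of the variable. */
int reduction_init(enum reduction_op op)
{
    switch (op) {
    case RED_MULT:
    case RED_AND:
        return 1;
    case RED_BIT_AND:
        return ~0;
    default:            /* +, -, ^, | and || all start from 0 */
        return 0;
    }
}

/* Fold one thread's partial result into the accumulated value. */
int reduction_combine(enum reduction_op op, int acc, int part)
{
    switch (op) {
    case RED_PLUS:
    case RED_MINUS:   return acc + part;  /* '-' partials are added */
    case RED_MULT:    return acc * part;
    case RED_BIT_AND: return acc & part;
    case RED_BIT_XOR: return acc ^ part;
    case RED_BIT_IOR: return acc | part;
    case RED_AND:     return acc && part;
    case RED_OR:      return acc || part;
    }

    return acc;
}
```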
Diego.