http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60661
Bug ID: 60661
Summary: DO CONCURRENT with MASK: Avoid using a temporary for the mask
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: burnus at gcc dot gnu.org
CC: tkoenig at gcc dot gnu.org
Currently, gfortran generates a temporary for the mask, as shown below. The
question is whether the temporary can be avoided by moving the mask expression
into the loop.
I think that usually works, but not always. It works when:
a) The variable in the mask does not occur on the LHS of an assignment or as an
intent([in]out) argument of a pure subroutine
b) The variable only occurs with the same array index as later in the body
of the DO CONCURRENT loop
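
To illustrate why the conditions above matter, here is a minimal C sketch of a case that violates both of them. The function names and the loop body are hypothetical and not taken from gfortran: the body writes the mask variable at index i+1 while the mask expression reads it at index i, so evaluating the mask inside the loop gives a different result than evaluating it up front into a temporary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical loop body: counts a hit for element i and clears the mask
   entry of the NEXT element, i.e. it writes the mask variable at a
   different index than the one the mask expression reads. */
static void body(bool *mask, int *hits, size_t i, size_t n)
{
    hits[i] += 1;
    if (i + 1 < n)
        mask[i + 1] = false;
}

/* Mask copied into a temporary before the loop runs. */
void run_with_temporary(bool *mask, int *hits, size_t n)
{
    bool *masktmp = malloc(n * sizeof *masktmp);
    if (masktmp == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        masktmp[i] = mask[i];
    for (size_t i = 0; i < n; i++)
        if (masktmp[i])
            body(mask, hits, i, n);
    free(masktmp);
}

/* Mask evaluated inside the loop, without a temporary. */
void run_inline(bool *mask, int *hits, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!mask[i])
            continue;
        body(mask, hits, i, n);
    }
}
```

With four elements and an all-true mask, the temporary version runs the body for every element, while the inline version skips every second element because the previous iteration has already cleared its mask entry. If the body only touched mask[i], or did not write the mask at all, both versions would agree.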
I am not sure whether something with FORALL prevents this optimization.
I think the simplest fix would be to transform
  DO CONCURRENT(i=1:n, mask(i))
    ...
to
  DO CONCURRENT(i=1:n)
    IF (.not. mask(i)) CYCLE
    ...
in the front-end (FE) optimization pass.
"7.2.4.2.3 Evaluation of the mask expression
The scalar-mask-expr, if any, is evaluated for each combination of index-name
values. If there is no scalar-mask-expr, it is as if it appeared with the value
true. The index-name variables may be primaries in the scalar-mask-expr. The set
of active combinations of index-name values is the subset of all possible
combinations (7.2.4.2.2) for which the scalar-mask-expr has the value true."
C736 (R752) The scalar-mask-expr shall be scalar and of type logical.
C737 (R752) Any procedure referenced in the scalar-mask-expr, including one
referenced by a defined operation, shall be a pure procedure (12.7).
    forall (i=start:end:stride; maskexpr)
      e<i> = f<i>
      g<i> = h<i>
    end forall
(where e,f,g,h<i> are arbitrary expressions possibly involving i)
Translates to:
    count = ((end + 1 - start) / stride)
    masktmp(:) = maskexpr(:)

    maskindex = 0;
    for (i = start; i <= end; i += stride)
      {
        if (masktmp[maskindex++])
          e<i> = f<i>
      }
    maskindex = 0;
    for (i = start; i <= end; i += stride)
      {
        if (masktmp[maskindex++])
          g<i> = h<i>
      }
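
The scheme above can be made concrete with a small runnable C sketch. Everything specific here is an assumption for illustration: the function name, the mask a[i] > 0, and the two assignments b[i] = a[i] * 2 and c[i] = b[i] + 1 stand in for maskexpr, e<i> = f<i> and g<i> = h<i>. The sketch also assumes a positive stride and counts the iterations directly instead of using a closed-form count.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Sketch of:
     forall (i = start:end:stride, a[i] > 0)
       b[i] = a[i] * 2
       c[i] = b[i] + 1
     end forall
   Assumes stride > 0 and that every visited index is valid for a, b and c.
   Returns the number of iterations, or -1 if allocation fails. */
int forall_two_statements(const int *a, int *b, int *c,
                          int start, int end, int stride)
{
    /* Count the iterations directly to size the mask temporary. */
    int count = 0;
    for (int i = start; i <= end; i += stride)
        count++;

    bool *masktmp = malloc((size_t)(count > 0 ? count : 1) * sizeof *masktmp);
    if (masktmp == NULL)
        return -1;

    /* Evaluate the mask once, before any assignment is executed. */
    int maskindex = 0;
    for (int i = start; i <= end; i += stride)
        masktmp[maskindex++] = a[i] > 0;

    /* First assignment, walking the temporary linearly. */
    maskindex = 0;
    for (int i = start; i <= end; i += stride) {
        if (masktmp[maskindex++])
            b[i] = a[i] * 2;
    }

    /* Second assignment, reusing the same temporary in the same order. */
    maskindex = 0;
    for (int i = start; i <= end; i += stride) {
        if (masktmp[maskindex++])
            c[i] = b[i] + 1;
    }

    free(masktmp);
    return count;
}
```

Because both loops visit the indices in the same order, a single linear maskindex is enough to pair each iteration with its saved mask value. The temporary is what a DO CONCURRENT with an in-loop mask test would avoid.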