On 15/7/14 3:00 PM, Jakub Jelinek wrote:
> On Tue, Jul 14, 2015 at 01:46:04PM +0800, Chung-Lin Tang wrote:
>> this patch provides a 'bool independent' field in struct loop, which
>> will be switched on by an "independent" clause in a #pragma acc loop 
>> directive.
>> I assume you'll be wiring it to the kernels parloops pass in a followup 
>> patch.
>>
>> Note: there are already a few other similar fields in struct loop, namely
>> 'safelen' and 'can_be_parallel', used by OMP simd safelen and GRAPHITE 
>> respectively.
>> The intention and/or setting of these fields are all a bit different, so I've
>> decided to add a new bool for OpenACC.
> 
> How is it different though?  Can you cite exact definition of the
> independent clause vs. safelen (set to INT_MAX)?
> The OpenMP definition is:
> "A SIMD loop has logical iterations numbered 0,1,...,N-1 where N is the
> number of loop iterations, and the logical numbering denotes the sequence in 
> which the iterations would
> be executed if the associated loop(s) were executed with no SIMD 
> instructions. If the safelen
> clause is used then no two iterations executed concurrently with SIMD 
> instructions can have a
> greater distance in the logical iteration space than its value."
> ...
> "Lexical forward dependencies in the iterations of the
> original loop must be preserved within each SIMD chunk."

The wording of OpenACC independent is more simple:
"... the independent clause tells the implementation that the iterations of 
this loop
are data-independent with respect to each other." -- OpenACC spec 2.7.9

I would say this implies even more relaxed conditions than OpenMP simd safelen,
essentially saying that the compiler doesn't even need dependence analysis; just
assume independence of iterations.

> So e.g. safelen >= 32 means for PTX you can safely implement it by
> running up to 32 consecutive iterations by all threads in the warp
> (assuming code that for some reason must be run by a single thread
> (e.g. calls to functions that are marked so that they expect to be run
> by the first thread in a warp initially) is run sequentially by increasing
> iterator), but it doesn't mean the iterations have no dependencies in between
> them whatsoever (see the above note about lexical forward dependencies),
> so you can't parallelize it by assigning different iterations to different
> threads outside of warp (or pthread_create created threads).

> So if OpenACC independent means there are no dependencies in between
> iterations, the OpenMP counterpart here is #pragma omp for simd schedule 
> (auto)
> or #pragma omp distribute parallel for simd schedule (auto).

schedule(auto) appears to correspond to the OpenACC 'auto' clause, or
what is implied in a kernels compute construct, but I'm not sure it implies
no dependencies between iterations?

Putting aside the semantic issues, as of currently safelen>0 turns on a certain 
amount of
vectorization code that we are not currently using (and not likely at all for 
nvptx).
Right now, we're just trying to pass the new flag to a kernels tree-parloops 
based pass.

Maybe this can all be reconciled later in a more precise way, e.g. have flags 
that correspond
specifically to phases of internal compiler passes (and selected by needs of 
the accel target),
instead of ones that are "sort of" associated with high-level language features.

Chung-Lin

Reply via email to