On 9/9/2013 9:37 AM, Tobias Burnus wrote:
Dear all,

sometimes it can be useful to annotate loops for better vectorization,
which is rather independent from parallelization.

For vectorization, GCC has [0]:
a) Cilk Plus's  #pragma simd  [1]
b) OpenMP 4.0's #pragma omp simd [2]

Those require -fcilkplus and -fopenmp, respectively, and activate much
more. The question is whether it makes sense to provide a means to ask
the compiler for SIMD vectorization without enabling all the other things
of Cilk Plus/OpenMP. What's your opinion?

[If one provides it, the question is whether it is always on or not,
which syntax/semantics it uses [e.g. just the one of Cilk or OpenMP]
and what to do with conflicting pragmas which can occur in this case.]
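
For concreteness, here is a minimal sketch of such an annotation in C, using the OpenMP 4.0 spelling. The function name saxpy and its signature are made up for illustration; they come from neither specification nor from this thread. A compiler that does not enable the pragma will typically just ignore it (possibly with a warning), so the code builds as ordinary C either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): y[i] += a * x[i].
   The pragma tells the compiler that the loop iterations may be
   executed with SIMD instructions, so it need not prove on its own
   that x and y do not overlap. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

The Cilk Plus form would put #pragma simd in the same position; which spelling a standalone option should accept is exactly the open question above.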


Side remark: For vectorization, the widely supported #pragma ivdep,
vector, novector can also be useful, even if they are less formally
defined. "ivdep" seems to be one of the more useful ones, whose
semantics one can map to a safelen of infinity in OpenMP's semantics
[i.e. loop->safelen = INT_MAX].

Tobias

[0] The trunk currently has only some initial middle-end support.
OpenMP's omp simd is in the gomp-4_0-branch; Cilk Plus's simd has been
submitted for the trunk at
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01626.html
[1] http://www.cilkplus.org/download#open-specification
[2] http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf

ifort/icc have a separate option, -openmp-simd, for activating omp simd directives without enabling the rest of OpenMP. In the previous release, activating both OpenMP parallel and omp simd required both options (-openmp -openmp-simd). In the new "SP1" release last week, -openmp implies -openmp-simd. Last time I checked, turning off the options did not make the compiler accept but ignore all omp simd directives, as I personally thought would be desirable. A few cases are active regardless of the compile-line options, but many will be rejected without matching options.

Current Intel implementations of safelen will fail to vectorize and give notice if the value is set unnecessarily large. It's been agreed that increasing the safelen value beyond the optimum level should not turn off vectorization. safelen(32) is optimum for several float/single precision cases in the Intel(r) Xeon Phi(tm) cross compiler; needless to say, safelen(8) is sufficient for 128-bit SSE2.
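
A minimal sketch of where safelen matters, assuming a loop whose only loop-carried dependence is at a distance of 16 iterations. The function name shift_add and the constant DIST are hypothetical and chosen only for this illustration. Because each iteration reads a value written 16 iterations earlier, processing up to 8 consecutive iterations at once cannot violate the dependence, and safelen(8) states that bound.

```c
#include <assert.h>
#include <stddef.h>

enum { DIST = 16 };  /* dependence distance, assumed for this sketch */

/* Hypothetical helper (illustration only): a[i] depends on a[i - DIST].
   safelen(8) promises that iterations no more than 8 apart may run
   together in SIMD lanes; 8 < DIST, so the dependence is respected. */
void shift_add(size_t n, float *a)
{
    assert(a != NULL);
#pragma omp simd safelen(8)
    for (size_t i = DIST; i < n; i++)
        a[i] = a[i - DIST] + 1.0f;
}
```

The results are the same whether or not the pragma is honored; the clause only bounds how many iterations the compiler may combine.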

I pulled down an update of the gcc gomp-4_0-branch yesterday and see, in the not-yet-working additions to the gcc testsuite, what appears to be a move toward adding more Cilk Plus clauses to omp simd, such as firstprivate and lastprivate (which are accepted but apparently ignored in the Intel omp simd implementation). I'll be discussing in a meeting later today my effort to publish material including discussion of OpenMP 4.0 implementations.

--
Tim Prince
