On 9/9/2013 9:37 AM, Tobias Burnus wrote:
Dear all,
sometimes it can be useful to annotate loops for better vectorization,
which is rather independent from parallelization.
For vectorization, GCC has [0]:
a) Cilk Plus's #pragma simd [1]
b) OpenMP 4.0's #pragma omp simd [2]
Those require -fcilkplus and -fopenmp, respectively, and activate much
more. The question is whether it makes sense to provide a means to ask
the compiler for SIMD vectorization without enabling all the other things
of Cilk Plus/OpenMP. What's your opinion?
[If one provides it, the question is whether it is always on or not,
which syntax/semantics it uses [e.g. just the one of Cilk or OpenMP]
and what to do with conflicting pragmas which can occur in this case.]
Side remark: For vectorization, the widely supported #pragma ivdep,
vector, novector can also be useful, even if they are less formally
defined. "ivdep" seems to be one of the more useful ones, whose
semantics one can map to a safelen of infinity in OpenMP's semantics
[i.e. loop->safelen = INT_MAX].
Tobias
[0] The trunk currently has only some initial middle-end support.
OpenMP's omp simd is in the gomp-4_0-branch; Cilk Plus's simd has been
submitted for the trunk at
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01626.html
[1] http://www.cilkplus.org/download#open-specification
[2] http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
ifort/icc have a separate option -openmp-simd for the purpose of
activating omp simd directives without invoking OpenMP. In the previous
release, in order to activate both OpenMP parallel and omp simd, both
options were required (-openmp -openmp-simd). In the new "SP1" release
last week, -openmp implies -openmp-simd. Last time I checked, turning
off the options did not cause the compiler to accept but ignore all omp
simd directives, as I personally thought would be desirable. A few
cases are active regardless of compile line option, but many will be
rejected without matching options.
Current Intel implementations of safelen will fail to vectorize, and
report it, if the value is set unnecessarily large. It's been agreed that
increasing the safelen value beyond the optimum level should not turn
off vectorization. safelen(32) is optimum for several float/single
precision cases in the Intel(r) Xeon Phi(tm) cross compiler; needless to
say, safelen(8) is sufficient for 128-bit SSE2.
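
To make the safelen discussion concrete, here is a minimal sketch in C. The
function name shifted_add and the dependence distance of 8 are made up for
illustration; they are not taken from any compiler's testsuite. The loop
carries a dependence from iteration i-8 to iteration i, so it is only safe to
run up to 8 consecutive iterations together, which is what safelen(8) is
meant to tell the compiler. Without any OpenMP option the pragma is simply
ignored and the loop runs serially with the same result.

```c
/* Hypothetical example: a[i] depends on a[i - 8], so at most 8
   consecutive iterations may be executed together in SIMD lanes. */
void shifted_add(float *a, const float *b, int n)
{
    /* safelen(8): the programmer asserts that iterations less than
       8 apart carry no dependence on each other. */
    #pragma omp simd safelen(8)
    for (int i = 8; i < n; ++i)
        a[i] = a[i - 8] + b[i];
}
```

A larger safelen value would be a false promise for this loop; a smaller one
is still correct but may restrict the vector length the compiler can choose.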
I pulled down an update of gcc gomp-4_0-branch yesterday and see that
the not-yet-working additions to the gcc testsuite there appear to move
toward adding more cilkplus clauses to omp simd, such as firstprivate
and lastprivate (which are accepted but apparently ignored in the Intel
omp simd implementation).
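
For reference, a small sketch of what a lastprivate clause on omp simd is
supposed to mean. The function last_scaled and its arguments are hypothetical,
chosen only for illustration. Each SIMD lane works on its own copy of t, and
after the loop t holds the value from the sequentially last iteration. The
code gives the same answer when the pragma is ignored, so it can be used to
check whether an implementation honors the clause.

```c
/* Hypothetical example: scale x into y and return the last scaled value. */
float last_scaled(const float *x, float *y, int n, float s)
{
    float t = 0.0f;

    /* lastprivate(t): t is private inside the loop, and the value
       assigned in the final iteration is copied back afterwards. */
    #pragma omp simd lastprivate(t)
    for (int i = 0; i < n; ++i) {
        t = s * x[i];
        y[i] = t;
    }
    return t;
}
```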
I'll be discussing in a meeting later today my effort to publish
material including discussion of OpenMP 4.0 implementations.
--
Tim Prince