On Sat, Apr 27, 2013 at 12:30:28PM -0500, Aldy Hernandez wrote:
> >>"The syntax and semantics of the various simd-openmp-data-clauses
> >>are detailed in the OpenMP specification.
> >>(http://www.openmp.org/mp-documents/spec30.pdf, Section 2.9.3)."
> >>
> >>Balaji, can you verify which is correct?  For that matter, which
> >>are the official specs from which we should be basing this work?
> >
> >Privatization clause makes a variable private for the simd lane.  In
> >general, I would follow the spec.  If you have further questions,
> >please feel free to ask.
>
> Ok, so the Cilk Plus 1.1 spec is incorrectly pointing to the OpenMP
> 3.0 spec, because the OpenMP 3.0 spec has the private clause being
> task/thread private.  Since the OpenMP 4.0rc2 explicitly says that
> the private clause is for the SIMD lane (as you've stated), can we
> assume that when the Cilk Plus 1.1 spec mentions OpenMP, it is
> talking about the OpenMP 4.0 spec?
One way we could implement the SIMD private/lastprivate/reduction vars,
and for Cilk+ also the firstprivate ones, might be:

- query the target for the maximum possible vectorization factor for
  the loop (and min that with simdlen, if any); let's call it MAXVF

For, say,

  struct S { S (); ~S (); int x; };
  ...
  int a, b;
  S s;
  #pragma omp simd private (a, s) reduction (+:b)
  for (int i = 0; i < N; i++)
    {
      foo (&a, &s);
      b += a;
    }

we'd then emit something like:

  int a_[MAXVF], b_[MAXVF];
  S s_[MAXVF];
  for (tmp = 0; tmp < __builtin_omp.simd_vf (simd_uid); tmp++)
    {
      b_[tmp] = 0;
      S::S (&s_[tmp]);
    }
  # loop simd_uid with safelen(MAXVF)
  for (i = 0; i < N; i++)
    {
      tmp = __builtin_omp.simd_lane (simd_uid);
      foo (&a_[tmp], &s_[tmp]);
      b_[tmp] += a_[tmp];
    }
  for (tmp = 0; tmp < __builtin_omp.simd_vf (simd_uid); tmp++)
    {
      S::~S (&s_[tmp]);
      b += b_[tmp];
    }

where simd_uid would be some, say, integer constant, unique to the simd
loop (at least unique within the same function, and perhaps
inlining/LTO would need to remap it).  The loop's simd_uid would be
stored by the ompexp pass into the loop structure.

Then the vectorizer (ideally, with -fopenmp or -fcilk+ we'd enable
vectorization by default unless it is explicitly disabled through
-fno-tree-vectorize, though then only for the explicit simd loops)
would treat arrays indexed by __builtin_omp.simd_lane (simd_uid)
specially (the dot in the name is just to make it impossible for users
to use it; alternatively the arrays could be marked with some special
hidden attribute or something).  It would allow promoting them to just
vector vars if they are not addressable, etc., and would record the
chosen vectorization factor in the loop structure.
__builtin_omp.simd_vf would then expand to the vectorization factor and
__builtin_omp.simd_lane to the number of the lane.

If vectorization couldn't be performed on some loop,
__builtin_omp.simd_vf would just be folded into 1 and
__builtin_omp.simd_lane into 0, say by some ompsimd pass run soon after
the vectorization.

Thoughts on this?
Or do you see a better IL representation of this stuff from the omp
expansion until vectorization?  I mean, e.g. for floating point or user
defined reductions it might be important in what order they are
performed (unless -ffast-math for the former).

	Jakub
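
For illustration only, the lowering above can be simulated by hand in
plain C++.  This is a rough sketch, not what any pass actually emits:
VF, the i % VF lane mapping, the body of foo, S::live and all the
function names are assumptions made up for the example (in the proposal
the lane and VF come from builtins resolved after vectorization).  The
second half shows why the order of a floating point reduction matters:
summing per lane and then combining can give a different result than
summing in loop order.

```cpp
#include <cassert>

constexpr int VF = 4;   // assumed VF; stand-in for __builtin_omp.simd_vf
constexpr int N = 10;   // assumed trip count

struct S {
    static int live;    // hypothetical counter of live S objects
    int x = 0;
    S() { ++live; }
    ~S() { --live; }
};
int S::live = 0;

// Hypothetical body for foo: writes the private 'a', touches private 's'.
static void foo(int *a, S *s) { *a = 3; s->x += 1; }

// The original loop, run sequentially with a single a, b and s.
static int scalar_loop() {
    int a, b = 0;
    S s;
    for (int i = 0; i < N; i++) { foo(&a, &s); b += a; }
    return b;
}

// Hand-written version of the lowered form: one copy of each privatized
// variable per lane, and a final loop merging the partial reductions.
static int lowered_loop() {
    int b = 0;
    {
        int a_[VF], b_[VF];
        S s_[VF];                        // constructors run once per lane copy
        for (int tmp = 0; tmp < VF; tmp++) b_[tmp] = 0;
        for (int i = 0; i < N; i++) {
            int tmp = i % VF;            // stand-in for __builtin_omp.simd_lane
            foo(&a_[tmp], &s_[tmp]);
            b_[tmp] += a_[tmp];
        }
        for (int tmp = 0; tmp < VF; tmp++) b += b_[tmp];
    }                                    // destructors of s_[] run here
    return b;
}

// Float reduction performed strictly in loop order.
static float sum_in_order(const float *v, int n) {
    float b = 0.0f;
    for (int i = 0; i < n; i++) b += v[i];
    return b;
}

// Float reduction with per-lane partial sums; assumes vf <= 16.
static float sum_by_lane(const float *v, int n, int vf) {
    float part[16] = {0.0f};
    for (int i = 0; i < n; i++) part[i % vf] += v[i];
    float b = 0.0f;
    for (int t = 0; t < vf; t++) b += part[t];
    return b;
}
```

With these definitions scalar_loop () and lowered_loop () return the
same value, because integer addition can be reassociated freely.  For
the input {1e8f, 1.0f, -1e8f, 1.0f} and vf = 2, sum_in_order and
sum_by_lane disagree, since in the in-order sum the first 1.0f is
absorbed by 1e8f, while in the per-lane sum it is not.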