On Sat, 2007-10-20 at 22:32 +0400, Tomash Brechko wrote:
> I'm not sure what OpenMP spec says about default data scope (too lazy
> to read through), but it seems that examples from
> http://kallipolis.com/openmp/2.html assume default(private), while GCC
> GOMP defaults to shared.  In your case,
>
>   #pragma omp parallel for shared(A, row, col)
>   for (i = k+1; i<SIZE; i++) {
>     for (j = k+1; j<SIZE; j++) {
>       A[i][j] = A[i][j] - row[i] * col[j];
>     }
>   }
>
> '#pragma omp for' makes 'i' private implicitly (it couldn't be
> otherwise), but 'j' is still shared.

Good job!! Dang, so used to C++ and other languages where
the control variable is localised. Haha .. but not in my
own language Felix.

> I just tried your original case,
> not only it is slow, but it also produces different results with and
> without OpenMP (just try to print any elem of 'A').  Adding
> 'private(j)' (or defining 'j' inside the outer loop) will fix the
> case.
>
> It would be nice if someone would post the measurement for the fixed
> case, my machine has only HT, and I experience slowdown for this
> example (but still it runs much faster then before the fix).

Now I get:

#threads  Real   User   Sys
1         1.052  1.043  0.009
2         0.866  1.582  0.026

This is a much better result, 50% speedup (30% less time used).
I only have a dual core at the moment (without HT), be nice
to see the result for a quad!

BTW: I also tried this variation in C++:

  #pragma omp parallel for shared(A, row, col)
  for (i = k+1; i<SIZE; i++) {
    for (int j = k+1; j<SIZE; j++) {  ///<-----------------
      A[i][j] = A[i][j] - row[i] * col[j];
    }
  }

which works with the same timings as the C with 'private(j)'.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
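
For reference, here is a rough sketch of the C variant with 'private(j)' discussed above, together with a serial copy of the same loop so the two results can be compared. It is only an illustration under assumptions: the value of SIZE, the fill values, and the helper names fill(), update_serial() and count_mismatches() are made up for this example and do not come from the original test program. Without -fopenmp the pragma is simply ignored and the code still builds.

```c
#include <assert.h>

#define SIZE 64   /* assumed value; the real SIZE is not shown in this thread */

static double A[SIZE][SIZE], B[SIZE][SIZE], row[SIZE], col[SIZE];

/* Hypothetical helper: give A and B identical, deterministic contents. */
static void fill(void)
{
    int i, j;
    for (i = 0; i < SIZE; i++) {
        row[i] = 0.5 * i;
        col[i] = 0.25 * i + 1.0;
        for (j = 0; j < SIZE; j++)
            A[i][j] = B[i][j] = (double)(i + j);
    }
}

/* The loop from the thread, with 'j' made private explicitly.
 * 'i' is private automatically because it is the parallel loop variable. */
static void update_parallel(int k)
{
    int i, j;
    assert(k >= -1 && k < SIZE);
#pragma omp parallel for shared(A, row, col) private(j)
    for (i = k + 1; i < SIZE; i++) {
        for (j = k + 1; j < SIZE; j++) {
            A[i][j] = A[i][j] - row[i] * col[j];
        }
    }
}

/* Same update done serially on B, used as the reference result. */
static void update_serial(int k)
{
    int i, j;
    for (i = k + 1; i < SIZE; i++)
        for (j = k + 1; j < SIZE; j++)
            B[i][j] = B[i][j] - row[i] * col[j];
}

/* Run both versions from the same input and count differing elements. */
static int count_mismatches(int k)
{
    int i, j, bad = 0;
    fill();
    update_parallel(k);
    update_serial(k);
    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++)
            if (A[i][j] != B[i][j])
                bad++;
    return bad;
}
```

With 'private(j)' in place every element is computed by exactly one thread using the same arithmetic as the serial loop, so count_mismatches(k) should come back as 0. Dropping the private(j) clause makes 'j' shared again under GCC, which is the race described above.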