Re: gomp slowness

skaller Sat, 20 Oct 2007 17:37:08 -0700

On Sat, 2007-10-20 at 22:32 +0400, Tomash Brechko wrote:
> I'm not sure what OpenMP spec says about default data scope (too lazy
> to read through), but it seems that examples from
> http://kallipolis.com/openmp/2.html assume default(private), while GCC
> GOMP defaults to shared.  In your case,
> 
>   #pragma omp parallel for shared(A, row, col)
>     for (i = k+1; i<SIZE; i++) {
>       for (j = k+1; j<SIZE; j++) {
>           A[i][j] = A[i][j] - row[i] * col[j];
>       }
>     }
> 
> '#pragma omp for' makes 'i' private implicitly (it couldn't be
> otherwise), but 'j' is still shared.  

Good job!!

Dang, I'm so used to C++ and other languages where the control
variable is localised. Haha .. but not in my own language, Felix.



> I just tried your original case,
> not only is it slow, but it also produces different results with and
> without OpenMP (just try to print any elem of 'A').  Adding
> 'private(j)' (or defining 'j' inside the outer loop) will fix the
> case.
> 
> It would be nice if someone would post the measurement for the fixed
> case; my machine has only HT, and I experience slowdown for this
> example (but still it runs much faster than before the fix).

Now I get:

  #threads   Real   User   Sys
      1      1.052  1.043  0.009
      2      0.866  1.582  0.026

This is a much better result: about a 1.2x speedup (roughly 18% less
wall-clock time). I only have a dual core at the moment (without HT);
it would be nice to see the result for a quad!
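
For anyone who wants to redo the arithmetic on the Real column, here is a
minimal sketch of the usual definitions (speedup, time saved, parallel
efficiency). The function names are made up for illustration and are not
part of OpenMP or GOMP; the only inputs taken from this thread are the two
Real times and the thread count.

```c
#include <stdio.h>

/* Speedup: serial wall-clock time divided by parallel wall-clock time. */
static double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

/* Percentage of wall-clock time saved relative to the serial run. */
static double percent_saved(double t_serial, double t_parallel) {
    return 100.0 * (t_serial - t_parallel) / t_serial;
}

/* Parallel efficiency: speedup per thread (1.0 would be perfect scaling). */
static double efficiency(double t_serial, double t_parallel, int threads) {
    return speedup(t_serial, t_parallel) / threads;
}

/* Print all three figures for one pair of measurements. */
static void report(double t_serial, double t_parallel, int threads) {
    printf("speedup    %.3f\n", speedup(t_serial, t_parallel));
    printf("time saved %.1f%%\n", percent_saved(t_serial, t_parallel));
    printf("efficiency %.3f\n", efficiency(t_serial, t_parallel, threads));
}
```

Calling report(1.052, 0.866, 2) applies this to the table above. Note that
the User column going up with two threads is expected: it sums CPU time
over both threads.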

BTW: I also tried this variation in C++:

  #pragma omp parallel for shared(A, row, col)
    for (i = k+1; i<SIZE; i++) {
      for (int j = k+1; j<SIZE; j++) {   // <-- 'j' declared here, so each thread gets its own copy
          A[i][j] = A[i][j] - row[i] * col[j];
      }
    }

which gives the same timings as the C version with 'private(j)'.
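
For reference, a minimal sketch of both fixes side by side in C99, checked
against a plain serial loop. SIZE, the test data and all function names are
assumptions made for this sketch (the thread does not give the original
program); only the loop body and the pragmas follow the code quoted above.

```c
#include <string.h>

#define SIZE 64   /* assumed; the thread never states the real SIZE */

static double row[SIZE], col[SIZE];   /* shared, only read inside the loops */

/* Fill M, row and col with made-up but deterministic test data. */
static void init(double M[SIZE][SIZE]) {
    for (int i = 0; i < SIZE; i++) {
        row[i] = 0.5 * i;
        col[i] = 0.25 * i;
        for (int j = 0; j < SIZE; j++)
            M[i][j] = i + j;
    }
}

/* Reference: plain serial update, no OpenMP at all. */
static void update_serial(double M[SIZE][SIZE], int k) {
    int i, j;
    for (i = k + 1; i < SIZE; i++)
        for (j = k + 1; j < SIZE; j++)
            M[i][j] = M[i][j] - row[i] * col[j];
}

/* Fix 1: j is declared outside the loops, so it is listed as private. */
static void update_private_clause(double M[SIZE][SIZE], int k) {
    int i, j;
#pragma omp parallel for private(j)
    for (i = k + 1; i < SIZE; i++)
        for (j = k + 1; j < SIZE; j++)
            M[i][j] = M[i][j] - row[i] * col[j];
}

/* Fix 2: j is declared inside the parallel loop, so it is private already. */
static void update_local_decl(double M[SIZE][SIZE], int k) {
    int i;
#pragma omp parallel for
    for (i = k + 1; i < SIZE; i++)
        for (int j = k + 1; j < SIZE; j++)
            M[i][j] = M[i][j] - row[i] * col[j];
}

/* Returns 1 if both fixed variants reproduce the serial result exactly. */
static int fixes_match_serial(int k) {
    static double S[SIZE][SIZE], P[SIZE][SIZE], L[SIZE][SIZE];
    init(S);
    init(P);
    init(L);
    update_serial(S, k);
    update_private_clause(P, k);
    update_local_decl(L, k);
    return memcmp(S, P, sizeof S) == 0 && memcmp(S, L, sizeof S) == 0;
}
```

Build with 'gcc -std=c99 -fopenmp'; without -fopenmp GCC ignores the
pragmas and all three functions simply run serially. Each element of M is
written by exactly one iteration of the outer loop, which is why the
results can be compared bit for bit.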


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
