On 17.02.2011 16:31, Matthieu Brucher wrote:

It may also be the size of the chunks OMP uses. You can/should specify it in the OMP pragma so that it is a multiple of the cache line size or something close.

Matthieu
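
The chunk size goes in the schedule clause of the parallel-for pragma. A minimal sketch of where that knob sits (not from Sebastian's original code; it assumes 64-byte cache lines and 8-byte doubles, i.e. 8 doubles per cache line):

  #include <math.h>

  /* assumption: 64-byte cache lines, 8-byte doubles */
  #define CACHE_LINE 64
  #define CHUNK ((int)(CACHE_LINE / sizeof(double)))   /* 8 */

  void pairwise_dist(const double *a_ps, int na, int nx1,
                     const double *b_ps, int nb, int nx2,
                     double *dist)
  {
      int i, j;
      double ax, ay, dif_x, dif_y;

      /* static schedule with an explicit chunk: each thread is handed
         blocks of CHUNK consecutive values of i instead of an
         implementation-defined split */
      #pragma omp parallel for schedule(static, CHUNK) \
              private(j, ax, ay, dif_x, dif_y)
      for (i = 0; i < na; i++) {
          ax = a_ps[i*nx1];
          ay = a_ps[i*nx1+1];
          for (j = 0; j < nb; j++) {
              dif_x = ax - b_ps[j*nx2];
              dif_y = ay - b_ps[j*nx2+1];
              dist[i*nb+j] = sqrt(dif_x*dif_x + dif_y*dif_y);
          }
      }
  }

Whether this actually removes the sharing depends on nb and on the alignment of dist; the sketch only shows where the chunk size is specified.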

Also beware of "false sharing" among the threads. When one processor updates the array "dist" in Sebastian's code, the cache line is dirtied for the other processors:

  #pragma omp parallel for private(j, i,ax,ay, dif_x, dif_y)
  for(i=0;i<na;i++){
     ax=a_ps[i*nx1];
     ay=a_ps[i*nx1+1];
     for(j=0;j<nb;j++) {
         dif_x = ax - b_ps[j*nx2];
         dif_y = ay - b_ps[j*nx2+1];

         /* update shared memory */

          dist[i*nb+j] = sqrt(dif_x*dif_x+dif_y*dif_y);  /* index into the na-by-nb matrix */

         /* ... and poof the cache is dirty */

     }
  }

Whenever this happens, the processors must stop whatever they are doing to resynchronize their cache lines. "False sharing" can therefore work as an "invisible GIL" inside OpenMP code: the processors can appear to run in syrup, and there is excessive traffic on the memory bus.

This is also why MPI programs often scale better than OpenMP programs, despite the IPC overhead.

A good rule when working with OpenMP is to let each thread write to its own private data arrays, and to share only read-only arrays.

One can, for example, use OpenMP's "reduction" clause to achieve this: initialize the array dist with zeros and put reduction(+:dist) in the OpenMP pragma line.
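
A reduction over a whole C array needs an array-section reduction, which not every compiler supports, but the same idea is easy to spell out by hand. A hedged sketch (not from the original code): each thread fills a private, zero-initialized copy of dist, and the copies are added into the shared array afterwards, so no thread writes shared memory inside the hot loop.

  #include <math.h>
  #include <stdlib.h>
  #include <string.h>

  void pairwise_dist_reduce(const double *a_ps, int na, int nx1,
                            const double *b_ps, int nb, int nx2,
                            double *dist)
  {
      memset(dist, 0, (size_t)na * nb * sizeof(double));

      #pragma omp parallel
      {
          /* private, zero-initialized copy of the output */
          double *local = calloc((size_t)na * nb, sizeof(double));
          int i, j, k;

          #pragma omp for nowait
          for (i = 0; i < na; i++) {
              double ax = a_ps[i*nx1];
              double ay = a_ps[i*nx1+1];
              for (j = 0; j < nb; j++) {
                  double dif_x = ax - b_ps[j*nx2];
                  double dif_y = ay - b_ps[j*nx2+1];
                  local[i*nb+j] = sqrt(dif_x*dif_x + dif_y*dif_y);
              }
          }

          /* the "+" part of the reduction: fold the private copies
             into the shared array, one thread at a time */
          #pragma omp critical
          for (k = 0; k < na * nb; k++)
              dist[k] += local[k];

          free(local);
      }
  }

Each element of dist is computed by exactly one thread here, so the sum just moves each thread's rows into place; the point is only that the threads stay out of the shared dist while they are computing.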

Sturla