Write misses are an indication that data had to be imported into L1 before it could be written. I don't know if valgrind can give an indication of false sharing, unfortunately. That's why I suggested using a chunk size that is a multiple of the cache line, so that false sharing does not occur.
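Something along these lines is what I meant by the chunk size (an untested sketch; I'm assuming a 64-byte cache line, i.e. 8 doubles per line, reading the dist layout as i*nb+j, and the function wrapper is only for illustration):

#include <math.h>

void pairwise_dist(const double *a_ps, const double *b_ps, double *dist,
                   int na, int nb, int nx1, int nx2)
{
    int i, j;
    double ax, ay, dif_x, dif_y;

    /* Chunks of 8 rows: each chunk writes 8*nb doubles, i.e. a whole
       number of 64-byte cache lines, so two threads never write into
       the same line (fully effective if dist is 64-byte aligned). */
    #pragma omp parallel for schedule(static, 8) \
            private(j, ax, ay, dif_x, dif_y)
    for (i = 0; i < na; i++) {
        ax = a_ps[i*nx1];
        ay = a_ps[i*nx1+1];
        for (j = 0; j < nb; j++) {
            dif_x = ax - b_ps[j*nx2];
            dif_y = ay - b_ps[j*nx2+1];
            dist[i*nb + j] = sqrt(dif_x*dif_x + dif_y*dif_y);
        }
    }
}

Any chunk whose write span is a whole number of cache lines would do; 8 is just the smallest one for doubles on a 64-byte line.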
Matthieu

2011/2/19 Sebastian Haase <seb.ha...@gmail.com>

> Thanks a lot. Very informative. I guess what you say about "cache line
> is dirtied" is related to the info I got with valgrind (see my email
> in this thread: L1 Data Write Miss 3636).
> Can one assume that the cache line is always a few megabytes?
>
> Thanks,
> Sebastian
>
> On Sat, Feb 19, 2011 at 12:40 AM, Sturla Molden <stu...@molden.no> wrote:
> > On 17.02.2011 16:31, Matthieu Brucher wrote:
> >
> > It may also be the sizes of the chunks OMP uses. You can/should
> > specify them in the OMP pragma so that it is a multiple of the cache
> > line size or something close.
> >
> > Matthieu
> >
> > Also beware of "false sharing" among the threads. When one processor
> > updates the array "dist" in Sebastian's code, the cache line is
> > dirtied for the other processors:
> >
> > #pragma omp parallel for private(j, i, ax, ay, dif_x, dif_y)
> > for (i = 0; i < na; i++) {
> >     ax = a_ps[i*nx1];
> >     ay = a_ps[i*nx1+1];
> >     for (j = 0; j < nb; j++) {
> >         dif_x = ax - b_ps[j*nx2];
> >         dif_y = ay - b_ps[j*nx2+1];
> >
> >         /* update shared memory */
> >
> >         dist[2*i+j] = sqrt(dif_x*dif_x + dif_y*dif_y);
> >
> >         /* ... and poof, the cache is dirty */
> >     }
> > }
> >
> > Whenever this happens, the processors must stop whatever they are
> > doing to resynchronize their cache lines. "False sharing" can
> > therefore work as an "invisible GIL" inside OpenMP code. The
> > processors can appear to run in syrup, and there is excessive traffic
> > on the memory bus.
> >
> > This is also why MPI programs often scale better than OpenMP
> > programs, despite the IPC overhead.
> >
> > A piece of advice when working with OpenMP is to let each thread
> > write to private data arrays, and only share read-only arrays.
> >
> > One can e.g. use OpenMP's "reduction" pragma to achieve this:
> > initialize the array dist with zeros, and use reduction(+:dist) in
> > the OpenMP pragma line.
> >
> > Sturla

--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
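PS: to make Sturla's "private data arrays" advice concrete, here is another untested sketch (same assumed signature and i*nb+j layout as above). As far as I know, reduction(+:dist) over a whole C array needs compiler support that is not universally available, so this is the portable variant of the same idea: each thread computes into its own buffer and touches the shared array only once.

#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

void pairwise_dist_private(const double *a_ps, const double *b_ps,
                           double *dist, int na, int nb,
                           int nx1, int nx2)
{
    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();

        /* split the rows of dist into one contiguous block per thread */
        int chunk = (na + nthreads - 1) / nthreads;
        int start = tid * chunk;
        int stop  = start + chunk < na ? start + chunk : na;

        if (start < stop) {
            /* each thread fills its own private buffer ...
               (error checking omitted in this sketch) */
            double *local = malloc((size_t)(stop - start) * nb * sizeof(double));
            int i, j;
            for (i = start; i < stop; i++) {
                double ax = a_ps[i*nx1];
                double ay = a_ps[i*nx1+1];
                for (j = 0; j < nb; j++) {
                    double dif_x = ax - b_ps[j*nx2];
                    double dif_y = ay - b_ps[j*nx2+1];
                    local[(i - start)*nb + j] =
                        sqrt(dif_x*dif_x + dif_y*dif_y);
                }
            }
            /* ... and writes it to the shared array in one contiguous
               copy, so false sharing is confined to the block edges
               instead of happening on every inner-loop iteration */
            memcpy(dist + (size_t)start * nb, local,
                   (size_t)(stop - start) * nb * sizeof(double));
            free(local);
        }
    }
}

The memcpy still writes to shared memory, but only once per block, which is the point of keeping the hot loop in private storage.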