on the web, http://course1.winona.edu/nmoore/clouds_example_OpenMP_f90.html
On Wed, Nov 26, 2008 at 1:22 PM, Bill Broadley <[EMAIL PROTECTED]> wrote: > I'd be happy to take a look, but email formatting of f90 sometimes seems to > cause issues. Could you send it as an attachment or put it on a webpage? > > Nathan Moore wrote: > > After the help last week on openmp, I got inspired and bought a dual-quad > > opteron machine for the department to show 8-way scaling for my students > > ("Hey, its much cheaper than something new in the optics lab", my dept. > > chair laughed). > > > > I've been working on said machine over the past few days and found > something > > really weird in an OpenMP example program I descrobed to the list. > > > > The machine is a dual-proc AMD Opteron 2350, Tyan n3600T (S2937) > mainboard, > > w/ 8GB ram. Initially, I installed the i386 version of Scientific Linux > > 5.2, but then realized that only half of the RAM was usable, and > > re-installed SL5.2 x86_64 this morning. > > > > The example program is appended to the end of this email. Again, it is a > > 2-d finite-difference solution to the laplace equation (the context being > to > > "predict" lightning strikes based on the potential between the ground and > > some clouds overhead). > > > > The program scales beautifully up to OMP_NUM_THREADS~6 or 7, but when I > set > > the number of threads to 8, something weird happens, and instead of > taking > > the normal 241 iterations to converge, the program converges after 1 > step. > > This reeks of numerical instability to me, but my programming experience > in > > x86_64 is very limited. > > > > I'm using gfortran, with the simple compile string, > > gfortran clouds_example_OpenMP.f90 -m64 -fopenmp > > > > Any insight into what obvious mistake I'm making would be wonderful! > > > > The stability of the output seems erratic to me. Sometimes when > > OMP_NUM_THREADS=7 the result converges normally after 241 iterations and > at > > other times, the result converges after 1 iteration (indicating some sort > of > > problem with hardware???) > > > > This didn't occur yesterday when the machine was running SL5.2, i386. > > > > > > > > > > Simulation Output: > > > > [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=1 > > [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS > > [EMAIL PROTECTED] clouds]$ ./a.out > > Hello World from thread 0 > > There are 1 threads > > ... > > convergence criteria is \Delta V < 0.250000003725290 > > iterations necessary, 241 > > initialization time, 0 > > simulation time, 57 > > > > > > > > > > [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=2 > > [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS > > [EMAIL PROTECTED] clouds]$ ./a.out > > Hello World from thread 0 > > Hello World from thread 1 > > There are 2 threads > > ... > > convergence criteria is \Delta V < 0.250000003725290 > > iterations necessary, 241 > > initialization time, 0 > > simulation time, 28 > > > > > > > > > > [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=4 > > [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS > > [EMAIL PROTECTED] clouds]$ ./a.out > > Hello World from thread 3 > > Hello World from thread 1 > > Hello World from thread 0 > > Hello World from thread 2 > > There are 4 threads > > ... > > convergence criteria is \Delta V < 0.250000003725290 > > iterations necessary, 241 > > initialization time, 0 > > simulation time, 14 > > > > > > > > > > [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=8 > > [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS > > [EMAIL PROTECTED] clouds]$ ./a.out > > Hello World from thread 2 > > ... > > convergence criteria is \Delta V < 0.250000003725290 > > iterations necessary, 1 > > initialization time, 0 > > simulation time, 0 > > > > Code listing: > > > > [EMAIL PROTECTED] clouds]$ cat clouds_example_OpenMP.f90 > > ! > > ! > > use omp_lib > > ! > > IMPLICIT NONE > > integer,parameter::Nx=2000 > > integer,parameter::Ny=2000 > > real*8 v(Nx,Ny), dv(Nx,Ny) > > integer boundary(Nx,Ny) > > integer i,j,converged,i1,i2 > > integer t0,t1,t2 > > real*8 convergence_v, v_cloud, v_ground, max_dv > > real*8 bump_P,old_v > > real*8 Lx,Ly,dx,dy,v_y > > ! > > real*8 lightning_rod_center, lightning_rod_height > > ! > > real*8 house_center, house_height, house_width > > integer num_iterations > > ! > > integer:: id, nthreads > > !$omp parallel private(id) > > id = omp_get_thread_num() > > write (*,*) 'Hello World from thread', id > > !$omp barrier > > if ( id == 0 ) then > > nthreads = omp_get_num_threads() > > write (*,*) 'There are', nthreads, 'threads' > > end if > > !$omp end parallel > > > > ! initial time > > t0 = secnds(0.0) > > > > dx =0.1d0 ! differential lengths, m > > dy =0.1d0 > > Lx = Nx*dx ! system sizes, m > > Ly = Ny*dy > > > > print *,"\nSimulation has bounds:\n\tX: 0,",Lx,"\n\tY: 0,",Ly > > print *,"\tNx = ",Nx,"\n\tNy = ",Ny > > print *,"\tdx = ",dx,"\n\tdy = ",dy > > > > v_cloud = -10000.d0 ! volts > > v_ground = 0.d0 > > > > ! initialize the the boundary conditions > > ! > > ! first, set the solution function (v), to look like a > > ! parallel-plate capacitor > > ! > > ! note that there is one large parallel section and several > > ! parallel do's inside that region > > !$OMP PARALLEL > > ! > > !$OMP DO > > !$OMP& SHARED(Nx,Ny,boundary,v_cloud,v_ground,Ly,dy,v) > > !$OMP& PRIVATE(i,j) > > do j=1,Ny > > do i=1,Nx > > boundary(i,j)=0 > > v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly) > > end do > > end do > > !$OMP END DO > > ! > > !$OMP DO > > !$OMP& SHARED(Nx,Ny,boundary) > > !$OMP& PRIVATE(i) > > do i=1,Nx > > boundary(i,1)=1 ! we need to ensure that the edges of > > boundary(i,Ny)=1 ! the domain are held as boundary > > end do > > !$OMP END DO > > ! > > !$OMP DO > > !$OMP& SHARED(boundary,Nx) > > !$OMP& PRIVATE(j) > > do j=1,Ny > > boundary(1,j)=1 > > boundary(Nx,j)=1 > > end do > > !$OMP END DO > > !$OMP END PARALLEL > > > > > > ! set up an interesting feature on the lower boundary > > ! do this in parallel with SECTIONS directive > > ! > > !$OMP PARALLEL > > !$OMP& SHARED(v,boundary,Nx,Ny,dx,dy,Lx,Ly,lightning_rod_height) > > !$OMP& > PRIVATE(lightning_rod_center,house_center,house_height,house_width)) > > !$OMP SECTIONS > > > > !$OMP SECTION > > ! Lightning_rod > > lightning_rod_center = Lx*0.6d0 > > lightning_rod_height = 5.0d0 > > call > > > create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) > > > > !$OMP SECTION > > lightning_rod_center = Lx*0.5d0 > > call > > > create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) > > > > !$OMP SECTION > > lightning_rod_center = Lx*0.7d0 > > call > > > create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) > > > > !$OMP SECTION > > ! house > > house_center = 0.4d0*Lx > > house_height = 5.0d0 > > house_width = 5.0d0 > > call > > > create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary) > > > > !$OMP END SECTIONS > > !$OMP END PARALLEL > > > > ! initialization done > > t1 = secnds(0.0) > > > > > > > > > > > > ! main solution iteration > > ! > > ! repeat the recursion relation until the maximum change > > ! from update to update is less than a convergence epsilon, > > convergence_v = (0.05)*dabs(v_ground-v_cloud)/(1.d0*Ny) > > print *,"\nconvergence criteria is \Delta V < ",convergence_v > > num_iterations = 0 > > > > ! > > ! iteration implemented with a goto or a do-while > > converged=0 > > do while( converged .eq. 0) > > > > converged = 1 > > num_iterations = num_iterations + 1 > > !$OMP PARALLEL > > !$OMP DO > > !$OMP& PRIVATE(i,j) > > !$OMP& SHARED(Ny,Nx,dv,v,boundary)) > > do j=2,(Ny-1) > > do i=2,(Nx-1) > > dv(i,j) = > > 0.25d0*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1)) - v(i,j) > > dv(i,j) = dv(i,j)*(1.d0-DFLOAT(boundary(i,j))) > > end do > > end do > > !$OMP END DO > > > > max_dv = 0.d0 > > !$OMP DO > > !$OMP& PRIVATE(i,j) > > !$OMP& SHARED(NX,NY,dv,v)) > > !$OMP& REDUCTION(MAX:max_dv) > > do j=2,(Ny-1) > > do i=2,(Nx-1) > > v(i,j) = v(i,j) + dv(i,j) > > if(dv(i,j) .gt. max_dv) then > > max_dv = dv(i,j) > > endif > > end do > > end do > > !$OMP END DO > > !$OMP END PARALLEL > > > > if(max_dv .ge. convergence_v) then > > converged = 0 > > endif > > > > end do > > > > > > > > > > > > > > > > ! simulation finished > > t2 = secnds(0.0) > > > > print *," iterations necessary, ",num_iterations > > print *," initialization time, ",t1-t0 > > print *," simulation time, ",t2-t1 > > > > > > open(unit=10,file="v_output.dat") > > write(10,*) "# x\ty\tv(x,y)" > > do j=1,Ny > > !do i=1,Nx > > ! skipping the full array print to save execution time > > ! the printed data file is normally sent to gnuplot w/ splot > > i=10 > > write (10,*) i*dx,j*dy,v(i,j) > > !enddo > > write (10,*)" " > > end do > > close(10) > > > > > > stop > > end > > > > > > > > > > subroutine > > > create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) > > IMPLICIT NONE > > integer Nx, Ny,j,boundary(Nx,Ny) > > integer j_limit > > integer index_lightning_rod_center > > real*8 v(Nx,Ny) > > real*8 lightning_rod_center,lightning_rod_height > > real*8 dx, dy, v_ground > > > > index_lightning_rod_center = lightning_rod_center/dx > > j_limit = lightning_rod_height/dy > > do j=1,j_limit > > v(index_lightning_rod_center,j) = v_ground > > boundary(index_lightning_rod_center,j) = 1 > > end do > > > > print *,"Created a lightning rod of height ",lightning_rod_height > > print *,"\ty_index ",j_limit > > print *,"\tx-position ",lightning_rod_center > > print *,"\tx_index ",index_lightning_rod_center > > > > > > end subroutine > > > > > > > > > > > > > > > > subroutine > > > create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary) > > IMPLICIT NONE > > integer Nx, Ny, boundary(Nx,Ny) > > real*8 v(Nx,Ny) > > real*8 v_ground, dx, dy > > integer i,j,i_limit,j_limit, index_house_center > > real*8 house_center,house_height,house_width > > > > index_house_center = house_center/dx > > i_limit = 0.5d0*house_width/dx > > j_limit = house_height/dy > > do j=1,j_limit > > do i=(index_house_center-i_limit),(index_house_center+i_limit) > > v(i,j) = v_ground > > boundary(i,j) = 1 > > end do > > end do > > > > print *,"Created a house of height ",house_height > > print *,"\ty_index ",j_limit > > print *,"\twidth ",house_width > > print *,"\thouse bounds: > > ",index_house_center-i_limit,index_house_center+i_limit > > > > end subroutine > > > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- - - - - - - - - - - - - - - - - - - - - - Nathan Moore Assistant Professor, Physics Winona State University AIM: nmoorewsu - - - - - - - - - - - - - - - - - - - - -
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf