After the help last week on openmp, I got inspired and bought a dual-quad opteron machine for the department to show 8-way scaling for my students ("Hey, its much cheaper than something new in the optics lab", my dept. chair laughed).
I've been working on said machine over the past few days and found something really weird in an OpenMP example program I descrobed to the list. The machine is a dual-proc AMD Opteron 2350, Tyan n3600T (S2937) mainboard, w/ 8GB ram. Initially, I installed the i386 version of Scientific Linux 5.2, but then realized that only half of the RAM was usable, and re-installed SL5.2 x86_64 this morning. The example program is appended to the end of this email. Again, it is a 2-d finite-difference solution to the laplace equation (the context being to "predict" lightning strikes based on the potential between the ground and some clouds overhead). The program scales beautifully up to OMP_NUM_THREADS~6 or 7, but when I set the number of threads to 8, something weird happens, and instead of taking the normal 241 iterations to converge, the program converges after 1 step. This reeks of numerical instability to me, but my programming experience in x86_64 is very limited. I'm using gfortran, with the simple compile string, gfortran clouds_example_OpenMP.f90 -m64 -fopenmp Any insight into what obvious mistake I'm making would be wonderful! The stability of the output seems erratic to me. Sometimes when OMP_NUM_THREADS=7 the result converges normally after 241 iterations and at other times, the result converges after 1 iteration (indicating some sort of problem with hardware???) This didn't occur yesterday when the machine was running SL5.2, i386. Simulation Output: [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=1 [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS [EMAIL PROTECTED] clouds]$ ./a.out Hello World from thread 0 There are 1 threads ... convergence criteria is \Delta V < 0.250000003725290 iterations necessary, 241 initialization time, 0 simulation time, 57 [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=2 [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS [EMAIL PROTECTED] clouds]$ ./a.out Hello World from thread 0 Hello World from thread 1 There are 2 threads ... convergence criteria is \Delta V < 0.250000003725290 iterations necessary, 241 initialization time, 0 simulation time, 28 [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=4 [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS [EMAIL PROTECTED] clouds]$ ./a.out Hello World from thread 3 Hello World from thread 1 Hello World from thread 0 Hello World from thread 2 There are 4 threads ... convergence criteria is \Delta V < 0.250000003725290 iterations necessary, 241 initialization time, 0 simulation time, 14 [EMAIL PROTECTED] clouds]$ OMP_NUM_THREADS=8 [EMAIL PROTECTED] clouds]$ export OMP_NUM_THREADS [EMAIL PROTECTED] clouds]$ ./a.out Hello World from thread 2 ... convergence criteria is \Delta V < 0.250000003725290 iterations necessary, 1 initialization time, 0 simulation time, 0 Code listing: [EMAIL PROTECTED] clouds]$ cat clouds_example_OpenMP.f90 ! ! use omp_lib ! IMPLICIT NONE integer,parameter::Nx=2000 integer,parameter::Ny=2000 real*8 v(Nx,Ny), dv(Nx,Ny) integer boundary(Nx,Ny) integer i,j,converged,i1,i2 integer t0,t1,t2 real*8 convergence_v, v_cloud, v_ground, max_dv real*8 bump_P,old_v real*8 Lx,Ly,dx,dy,v_y ! real*8 lightning_rod_center, lightning_rod_height ! real*8 house_center, house_height, house_width integer num_iterations ! integer:: id, nthreads !$omp parallel private(id) id = omp_get_thread_num() write (*,*) 'Hello World from thread', id !$omp barrier if ( id == 0 ) then nthreads = omp_get_num_threads() write (*,*) 'There are', nthreads, 'threads' end if !$omp end parallel ! initial time t0 = secnds(0.0) dx =0.1d0 ! differential lengths, m dy =0.1d0 Lx = Nx*dx ! system sizes, m Ly = Ny*dy print *,"\nSimulation has bounds:\n\tX: 0,",Lx,"\n\tY: 0,",Ly print *,"\tNx = ",Nx,"\n\tNy = ",Ny print *,"\tdx = ",dx,"\n\tdy = ",dy v_cloud = -10000.d0 ! volts v_ground = 0.d0 ! initialize the the boundary conditions ! ! first, set the solution function (v), to look like a ! parallel-plate capacitor ! ! note that there is one large parallel section and several ! parallel do's inside that region !$OMP PARALLEL ! !$OMP DO !$OMP& SHARED(Nx,Ny,boundary,v_cloud,v_ground,Ly,dy,v) !$OMP& PRIVATE(i,j) do j=1,Ny do i=1,Nx boundary(i,j)=0 v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly) end do end do !$OMP END DO ! !$OMP DO !$OMP& SHARED(Nx,Ny,boundary) !$OMP& PRIVATE(i) do i=1,Nx boundary(i,1)=1 ! we need to ensure that the edges of boundary(i,Ny)=1 ! the domain are held as boundary end do !$OMP END DO ! !$OMP DO !$OMP& SHARED(boundary,Nx) !$OMP& PRIVATE(j) do j=1,Ny boundary(1,j)=1 boundary(Nx,j)=1 end do !$OMP END DO !$OMP END PARALLEL ! set up an interesting feature on the lower boundary ! do this in parallel with SECTIONS directive ! !$OMP PARALLEL !$OMP& SHARED(v,boundary,Nx,Ny,dx,dy,Lx,Ly,lightning_rod_height) !$OMP& PRIVATE(lightning_rod_center,house_center,house_height,house_width)) !$OMP SECTIONS !$OMP SECTION ! Lightning_rod lightning_rod_center = Lx*0.6d0 lightning_rod_height = 5.0d0 call create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) !$OMP SECTION lightning_rod_center = Lx*0.5d0 call create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) !$OMP SECTION lightning_rod_center = Lx*0.7d0 call create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) !$OMP SECTION ! house house_center = 0.4d0*Lx house_height = 5.0d0 house_width = 5.0d0 call create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary) !$OMP END SECTIONS !$OMP END PARALLEL ! initialization done t1 = secnds(0.0) ! main solution iteration ! ! repeat the recursion relation until the maximum change ! from update to update is less than a convergence epsilon, convergence_v = (0.05)*dabs(v_ground-v_cloud)/(1.d0*Ny) print *,"\nconvergence criteria is \Delta V < ",convergence_v num_iterations = 0 ! ! iteration implemented with a goto or a do-while converged=0 do while( converged .eq. 0) converged = 1 num_iterations = num_iterations + 1 !$OMP PARALLEL !$OMP DO !$OMP& PRIVATE(i,j) !$OMP& SHARED(Ny,Nx,dv,v,boundary)) do j=2,(Ny-1) do i=2,(Nx-1) dv(i,j) = 0.25d0*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1)) - v(i,j) dv(i,j) = dv(i,j)*(1.d0-DFLOAT(boundary(i,j))) end do end do !$OMP END DO max_dv = 0.d0 !$OMP DO !$OMP& PRIVATE(i,j) !$OMP& SHARED(NX,NY,dv,v)) !$OMP& REDUCTION(MAX:max_dv) do j=2,(Ny-1) do i=2,(Nx-1) v(i,j) = v(i,j) + dv(i,j) if(dv(i,j) .gt. max_dv) then max_dv = dv(i,j) endif end do end do !$OMP END DO !$OMP END PARALLEL if(max_dv .ge. convergence_v) then converged = 0 endif end do ! simulation finished t2 = secnds(0.0) print *," iterations necessary, ",num_iterations print *," initialization time, ",t1-t0 print *," simulation time, ",t2-t1 open(unit=10,file="v_output.dat") write(10,*) "# x\ty\tv(x,y)" do j=1,Ny !do i=1,Nx ! skipping the full array print to save execution time ! the printed data file is normally sent to gnuplot w/ splot i=10 write (10,*) i*dx,j*dy,v(i,j) !enddo write (10,*)" " end do close(10) stop end subroutine create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary) IMPLICIT NONE integer Nx, Ny,j,boundary(Nx,Ny) integer j_limit integer index_lightning_rod_center real*8 v(Nx,Ny) real*8 lightning_rod_center,lightning_rod_height real*8 dx, dy, v_ground index_lightning_rod_center = lightning_rod_center/dx j_limit = lightning_rod_height/dy do j=1,j_limit v(index_lightning_rod_center,j) = v_ground boundary(index_lightning_rod_center,j) = 1 end do print *,"Created a lightning rod of height ",lightning_rod_height print *,"\ty_index ",j_limit print *,"\tx-position ",lightning_rod_center print *,"\tx_index ",index_lightning_rod_center end subroutine subroutine create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary) IMPLICIT NONE integer Nx, Ny, boundary(Nx,Ny) real*8 v(Nx,Ny) real*8 v_ground, dx, dy integer i,j,i_limit,j_limit, index_house_center real*8 house_center,house_height,house_width index_house_center = house_center/dx i_limit = 0.5d0*house_width/dx j_limit = house_height/dy do j=1,j_limit do i=(index_house_center-i_limit),(index_house_center+i_limit) v(i,j) = v_ground boundary(i,j) = 1 end do end do print *,"Created a house of height ",house_height print *,"\ty_index ",j_limit print *,"\twidth ",house_width print *,"\thouse bounds: ",index_house_center-i_limit,index_house_center+i_limit end subroutine -- - - - - - - - - - - - - - - - - - - - - - Nathan Moore Assistant Professor, Physics Winona State University AIM: nmoorewsu - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - - - Nathan Moore Assistant Professor, Physics Winona State University AIM: nmoorewsu - - - - - - - - - - - - - - - - - - - - -
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf