Thank you all for the answers. I recently joined a group whose code has been running on the CPU for some time, and we have started trying to run it on the GPU to see whether it speeds up processing. I'm going to discuss the points you've raised with the group here.
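For reference, here is a minimal sketch of the GPU run with the solver options Barry suggested (GMRES with right preconditioning). It assumes the `em_` options prefix from the original command also applies to `-ksp_type` and `-ksp_pc_side`, and the default values for `ntasks` and `executable` are hypothetical placeholders, not values from the thread. The preconditioner options are left unchanged so that only the Krylov method and preconditioning side differ from the original GPU run.

```shell
#!/bin/sh
# Hypothetical defaults; the thread does not give the real values.
ntasks="${ntasks:-1}"
executable="${executable:-em_solver}"

# GPU matrix/vector options, copied from the original GPU run.
gpu_opts="-A_mat_type aijcusparse -P_mat_type aijcusparse -vec_type cuda -use_gpu_aware_mpi 0"

# Barry's suggestion (-ksp_type gmres -ksp_pc_side right), with the em_ prefix assumed.
ksp_opts="-em_ksp_monitor_true_residual -em_ksp_type gmres -em_ksp_pc_side right"

# Preconditioner options, unchanged from the original runs.
pc_opts="-em_pc_type bjacobi -em_sub_pc_type ilu -em_sub_pc_factor_levels 3 -em_sub_pc_factor_fill 6"

# Print the full command so it can be inspected before running.
echo "mpirun -np $ntasks ./$executable $gpu_opts $ksp_opts $pc_opts"

# Run only when both mpirun and the executable are actually available.
if [ -x "./$executable" ] && command -v mpirun >/dev/null 2>&1; then
  mpirun -np "$ntasks" "./$executable" $gpu_opts $ksp_opts $pc_opts < ./Parameters.inp
fi
```

As far as I understand, with right preconditioning GMRES works with the true (unpreconditioned) residual, so the monitor output of this run should be easier to compare against the true residual norms of the CPU run.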
Thank you very much!

On Thu, Aug 31, 2023 at 00:02, Barry Smith <bsm...@petsc.dev> wrote:

> Yikes, sorry I missed that the first run was CPU and the second GPU.
>
> The run on the CPU is indicative of a very bad preconditioner. It
> doesn't really converge. When the true residual norm jumps by a factor of
> 10^3 at the first iteration, this means the ILU preconditioner is just not
> appropriate or reasonable. The "convergence" of the preconditioned residual
> norm is meaningless.
>
>> >   0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01
>> >   1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04
>> >   2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03
>> >   3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02
>> >   4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02
>> >   5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01
>> >   6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01
>> >   7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01
>
> I won't worry about the GPU behavior (it is just due to slightly
> different numerical computations on the GPU and not surprising.)
>
> You need to use a different preconditioner, even on the CPU.
>
> On Aug 30, 2023, at 9:51 PM, Junchao Zhang <junchao.zh...@gmail.com> wrote:
>
> On Wed, Aug 30, 2023 at 8:46 PM Barry Smith <bsm...@petsc.dev> wrote:
>
>> What convergence do you get without the GPU matrix and vector
>> operations?
>
> Barry, that was in the original email
>
>> Can you try the GPU run with -ksp_type gmres -ksp_pc_side right ?
>>
>> For certain problems, ILU can produce catastrophically bad
>> preconditioners.
>> Barry
>>
>> > On Aug 30, 2023, at 4:41 PM, Ramoni Z. Sedano Azevedo <ramoni.zsed...@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > I'm executing a code in Fortran using PETSc with MPI via CPU and I would like to execute it using GPU.
>> > PETSc is configured as follows:
>> > ./configure \
>> >   --prefix=${PWD}/installdir \
>> >   --with-fortran \
>> >   --with-fortran-kernels=true \
>> >   --with-cuda \
>> >   --download-fblaslapack \
>> >   --with-scalar-type=complex \
>> >   --with-precision=double \
>> >   --with-debugging=0 \
>> >   --with-x=0 \
>> >   --with-gnu-compilers=1 \
>> >   --with-cc=mpicc \
>> >   --with-cxx=mpicxx \
>> >   --with-fc=mpif90 \
>> >   --with-make-exec=make
>> >
>> > The parameters for using MPI on CPU are:
>> > mpirun -np $ntasks ./${executable} \
>> >   -A_mat_type mpiaij \
>> >   -P_mat_type mpiaij \
>> >   -em_ksp_monitor_true_residual \
>> >   -em_ksp_type bcgs \
>> >   -em_pc_type bjacobi \
>> >   -em_sub_pc_type ilu \
>> >   -em_sub_pc_factor_levels 3 \
>> >   -em_sub_pc_factor_fill 6 \
>> >   < ./Parameters.inp
>> >
>> > Code output:
>> > Solving for Hz fields
>> > bnorm 3.7727507818834821E-005
>> > xnorm 2.3407405211699372E-016
>> > Residual norms for em_ solve.
>> >   0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01
>> >   1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04
>> >   2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03
>> >   3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02
>> >   4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02
>> >   5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01
>> >   6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01
>> >   7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01
>> >   8 KSP preconditioned resid norm 3.800488678870e-16 true resid norm 9.015738978063e-04 ||r(i)||/||b|| 2.389699054959e+01
>> >   9 KSP preconditioned resid norm 2.498733213989e-16 true resid norm 7.194509577987e-04 ||r(i)||/||b|| 1.906966559396e+01
>> >  10 KSP preconditioned resid norm 1.563017112250e-16 true resid norm 5.055208317846e-04 ||r(i)||/||b|| 1.339926385310e+01
>> >  11 KSP preconditioned resid norm 8.733803057628e-17 true resid norm 3.171941303660e-04 ||r(i)||/||b|| 8.407502872682e+00
>> >  12 KSP preconditioned resid norm 4.907010803529e-17 true resid norm 1.868311755294e-04 ||r(i)||/||b|| 4.952120782177e+00
>> >  13 KSP preconditioned resid norm 2.214070343700e-17 true resid norm 8.760421740830e-05 ||r(i)||/||b|| 2.322025028236e+00
>> >  14 KSP preconditioned resid norm 1.333171674446e-17 true resid norm 5.984548368534e-05 ||r(i)||/||b|| 1.586255948119e+00
>> >  15 KSP preconditioned resid norm 7.696778066646e-18 true resid norm 3.786809196913e-05 ||r(i)||/||b|| 1.003726303656e+00
>> >  16 KSP preconditioned resid norm 3.863008301366e-18 true resid norm 1.284864871601e-05 ||r(i)||/||b|| 3.405644702988e-01
>> >  17 KSP preconditioned resid norm 2.061402843494e-18 true resid norm 1.054741071688e-05 ||r(i)||/||b|| 2.795681805311e-01
>> >  18 KSP preconditioned resid norm 1.062033155108e-18 true resid norm 3.992776343462e-06 ||r(i)||/||b|| 1.058319664960e-01
>> > converged reason 2
>> > total number of relaxations 18
>> > ========================================
>> >
>> > The parameters for GPU usage are:
>> > mpirun -np $ntasks ./${executable} \
>> >   -A_mat_type aijcusparse \
>> >   -P_mat_type aijcusparse \
>> >   -vec_type cuda \
>> >   -use_gpu_aware_mpi 0 \
>> >   -em_ksp_monitor_true_residual \
>> >   -em_ksp_type bcgs \
>> >   -em_pc_type bjacobi \
>> >   -em_sub_pc_type ilu \
>> >   -em_sub_pc_factor_levels 3 \
>> >   -em_sub_pc_factor_fill 6 \
>> >   < ./Parameters.inp
>> >
>> > Code output:
>> > Solving for Hz fields
>> > bnorm 3.7727507818834821E-005
>> > xnorm 2.3407405211699372E-016
>> > Residual norms for em_ solve.
>> >   0 KSP preconditioned resid norm 1.236220954395e-08 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00
>> >   1 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00
>> > converged reason 3
>> > total number of relaxations 1
>> > ========================================
>> >
>> > Clearly the code running on GPU is not converging correctly.
>> > Has anyone experienced this problem?
>> >
>> > Sincerely,
>> > Ramoni Z. S. Azevedo