Dear Barry, thanks for offering to look at this. I added the options you suggested, but they only produced empty files, so I saved the files manually. At the link https://drive.google.com/file/d/17jqLJaMyWSuAe6XSVeXzXnGsM_R2DXCy/view?usp=sharing you can find a folder with: the Poisson matrix, the Ampere matrix, a miniapp that reads the matrices and calls the solver, the petscrc file, and the Slurm log that reproduces the error. I look forward to hearing from you.
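A minimal sketch of such a reproducer (assuming a binary matrix file named poisson.bin and the "poisson_" options prefix; both are placeholders rather than the actual names used in the folder) could look like:

#include <petscksp.h>
/* Sketch: load a matrix saved in PETSc binary format and solve A x = b with a
   KSP configured entirely from the options database (e.g. -options_file petscrc).
   "poisson.bin" and the "poisson_" prefix are placeholders. */
int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "poisson.bin", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);   /* e.g. -mat_type aijcusparse to run on the GPU */
  ierr = MatLoad(A, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);         /* or VecLoad() the saved right-hand side */

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(ksp, "poisson_");CHKERRQ(ierr);  /* picks up the -poisson_* options */
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}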
Thanks again,
Nicola

On Wed, Aug 12, 2020 at 11:15 AM Barry Smith <bsm...@petsc.dev> wrote:

>   Interesting, we don't see crashes in GAMG. Could you run with
>
>   -ksp_view_mat binary -ksp_view_rhs binary
>
>   This will create a file called binaryoutput. You can email it to petsc-ma...@mcs.anl.gov or, if it is too large for email, post it somewhere and email the link. From this we could possibly recreate the crash to see what is going wrong.
>
>   Barry
>
>   7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7)
>
> On Aug 12, 2020, at 3:18 AM, nicola varini <nicola.var...@gmail.com> wrote:
>
> Dear all, following the suggestions I resubmitted the simulation with the petscrc below. However, I get the following error:
> ========
> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data
> 7364 [339]PETSC ERROR: xGEQRF error
> 7365 [339]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020
> 7367 [339]PETSC ERROR: /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 by nvarini Wed Aug 12 10:06:15 2020
> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 --with-scalapack=1 --with-cxx=CC --with-debugging=0 --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 --with-cuda-c=nvcc --with-cxxlib-autodetect=0 --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include --with-cxx=CC --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include
> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7)
> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
> ========
>
> I tried other pc_gamg_type values, but they fail as well.
>
> #PETSc Option Table entries:
> -ampere_dm_mat_type aijcusparse
> -ampere_dm_vec_type cuda
> -ampere_ksp_atol 1e-15
> -ampere_ksp_initial_guess_nonzero yes
> -ampere_ksp_reuse_preconditioner yes
> -ampere_ksp_rtol 1e-7
> -ampere_ksp_type dgmres
> -ampere_mg_levels_esteig_ksp_max_it 10
> -ampere_mg_levels_esteig_ksp_type cg
> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
> -ampere_mg_levels_ksp_type chebyshev
> -ampere_mg_levels_pc_type jacobi
> -ampere_pc_gamg_agg_nsmooths 1
> -ampere_pc_gamg_coarse_eq_limit 10
> -ampere_pc_gamg_reuse_interpolation true
> -ampere_pc_gamg_square_graph 1
> -ampere_pc_gamg_threshold 0.05
> -ampere_pc_gamg_threshold_scale .0
> -ampere_pc_gamg_type agg
> -ampere_pc_type gamg
> -dm_mat_type aijcusparse
> -dm_vec_type cuda
> -log_view
> -poisson_dm_mat_type aijcusparse
> -poisson_dm_vec_type cuda
> -poisson_ksp_atol 1e-15
> -poisson_ksp_initial_guess_nonzero yes
> -poisson_ksp_reuse_preconditioner yes
> -poisson_ksp_rtol 1e-7
> -poisson_ksp_type dgmres
> -poisson_log_view
> -poisson_mg_levels_esteig_ksp_max_it 10
> -poisson_mg_levels_esteig_ksp_type cg
> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
> -poisson_mg_levels_ksp_max_it 1
> -poisson_mg_levels_ksp_type chebyshev
> -poisson_mg_levels_pc_type jacobi
> -poisson_pc_gamg_agg_nsmooths 1
> -poisson_pc_gamg_coarse_eq_limit 10
> -poisson_pc_gamg_reuse_interpolation true
> -poisson_pc_gamg_square_graph 1
> -poisson_pc_gamg_threshold 0.05
> -poisson_pc_gamg_threshold_scale .0
> -poisson_pc_gamg_type agg
> -poisson_pc_type gamg
> -use_mat_nearnullspace true
> #End of PETSc Option Table entries
>
> Regards,
>
> Nicola
>
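(Side note on the -use_mat_nearnullspace true entry above: for GAMG on a scalar problem, the near null space attached to the operator is typically just the constant vector. A minimal sketch of how that is usually done with the PETSc API (an assumption, not the application's actual code) is:

  MatNullSpace nearnull;
  /* attach the constant vector as the near null space GAMG uses when building
     its aggregates and prolongator; A is the assembled operator */
  ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nearnull);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);
)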
> On Tue, Aug 4, 2020 at 5:57 PM Mark Adams <mfad...@lbl.gov> wrote:
>
>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>
>>> Nicola,
>>>
>>> You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on "GPU".
>>> For a full AMG GPU, you can use PCGAMG, with Chebyshev smoothers and Jacobi preconditioning. Mark can help you out with the specific command line options.
>>> When it works properly, everything related to PC application is offloaded to the GPU, and you should expect to get the well-known and branded 10x (maybe more) speedup one expects from GPUs during KSPSolve.
>>>
>> The speedup depends on the machine, but on SUMMIT, comparing enough CPUs to saturate the memory bus against all 6 GPUs, the speedup is a function of problem subdomain size. I saw 10x at about 100K equations/process.
>>
>>> Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. Yours is not even optimized for proper GPU usage yet.
>>> Also, any specific reason why you are using dgmres and fgmres?
>>>
>>> PETSc has not been designed with multi-threading in mind. You can achieve "overlap" of the two solves by splitting the communicator. But then you need communications to let the two solutions talk to each other.
>>>
>>> Thanks
>>> Stefano
>>>
>>> On Aug 4, 2020, at 12:04 PM, nicola varini <nicola.var...@gmail.com> wrote:
>>>
>>> Dear all, thanks for your replies. The reason why I asked whether it is possible to overlap the Poisson and Ampere solves is that they take roughly the same amount of time. Please find attached the profiling logs for CPU only and GPU only.
>>> Of course it is possible to split the MPI communicator and run each solver on a different subcommunicator; however, this would involve more communication.
>>> Has anyone ever tried to run two solvers with hyperthreading?
>>> Thanks
>>>
>>> On Sun, Aug 2, 2020 at 2:09 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>>> I suspect that the Poisson and Ampere's law solves are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafety and threads, and I think it could/should work, but this mode is never used by anyone.
>>>>
>>>> That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. In the best case scenario you get a speedup of 2x. That is a strict upper bound, but you will never come close to it. Your hardware has some balance of CPU to GPU processing rate, and your application has a balance of volume of work between your two solves; these have to match to get close to a 2x speedup. To be concrete, from what little I can guess about your application, let's assume that the cost of each of these two solves is about the same (e.g., Laplacians on your domain, the best case scenario). But GPU machines these days are configured to have roughly 1-10% of their capacity in the CPUs, which gives you an upper bound of about 10% speedup. That is noise. Upshot: unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see close to a 2x speedup. Your time is better spent elsewhere.
>>>>
>>>> Mark
>>>>
>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown <j...@jedbrown.org> wrote:
>>>>
>>>>> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten).
>>>>>
>>>>> In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard.
>>>>>
>>>>> Barry Smith <bsm...@petsc.dev> writes:
>>>>>
>>>>> > Nicola,
>>>>> >
>>>>> >   This is not really viable or practical at this time with PETSc. It is not impossible, but it requires careful coding with threads; another possibility is to use one half of the virtual GPUs for each solve, which is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revisiting this idea in the future.
>>>>> >
>>>>> >   Barry
>>>>> >
>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini <nicola.var...@gmail.com> wrote:
>>>>> >>
>>>>> >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA.
>>>>> >> I have a machine where each node has one P100 and one Haswell.
>>>>> >> I have to solve the Poisson and Ampere equations at each time step, and I am using a 2D DMDA for each of them. Would it be possible to compute the Poisson and Ampere equations at the same time, one on the CPU and the other on the GPU?
>>>>> >>
>>>>> >> Thanks
>>>>>
>>> <out_gpu><out_nogpu>
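For reference, the communicator split that Stefano and Jed describe in the quoted thread would look roughly like the sketch below. This is only an illustration under stated assumptions: the split point (last rank versus the rest), the grid sizes, and the DMDA arguments are placeholders, not a tested setup for this application.

#include <petscdmda.h>
/* Sketch: split PETSC_COMM_WORLD so that most ranks work on one subproblem
   while the remaining rank(s) drive the other; each group creates its own
   DMDA on its subcommunicator. All sizes below are placeholders. */
int main(int argc, char **argv)
{
  MPI_Comm       subcomm;
  PetscMPIInt    rank, size, color;
  DM             da;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);

  /* e.g. the last rank handles the "GPU" subproblem, all others the "CPU" one */
  color = (rank == size - 1) ? 1 : 0;
  ierr = MPI_Comm_split(PETSC_COMM_WORLD, color, rank, &subcomm);CHKERRQ(ierr);

  /* each group builds its own 2D DMDA (placeholder 128x128 grid, 1 dof, stencil width 1) */
  ierr = DMDACreate2d(subcomm, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_STAR,
                      128, 128, PETSC_DECIDE, PETSC_DECIDE, 1, 1, NULL, NULL, &da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);   /* -dm_mat_type aijcusparse / -dm_vec_type cuda per group */
  ierr = DMSetUp(da);CHKERRQ(ierr);

  /* ... assemble and solve this group's subproblem on subcomm; any coupling data
     between the two groups has to be exchanged explicitly over PETSC_COMM_WORLD ... */

  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}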