You can try hypre. If that fails, then there is likely a problem with your system. You can also run with -info, grep the output for GAMG, and send me that output; I can see if I notice anything funny.

If this is just a Laplacian with a stable discretization and no crazy material parameters, then stretched grids are about the only thing that can hurt the solver. Do both of your solves fail in a similar way?

On the CPU you can try this with large subdomains, ideally in serial:

    -ampere_mg_levels_ksp_type richardson
    -ampere_mg_levels_pc_type sor

And check that there are no unused options with -options_left. GAMG can fail with bad eigen estimates, but these parameters look fine.
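As a concrete sketch of those two test runs (the launcher, rank count, and executable name are placeholders, not from this thread):

    # keep only the GAMG-related lines of the verbose -info output
    srun -n 64 ./your_app -info 2>&1 | grep GAMG > gamg_info.log

    # CPU smoother test with large subdomains (ideally serial);
    # -options_left reports any options that were never used
    ./your_app -ampere_mg_levels_ksp_type richardson \
               -ampere_mg_levels_pc_type sor -options_left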
On Fri, Aug 14, 2020 at 5:01 AM nicola varini <nicola.var...@gmail.com> wrote:

> Dear Barry, yes, it gives the same problems.
>
> Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith <bsm...@petsc.dev> ha scritto:
>
>> Does the same thing work (with GAMG) if you run the same problem on the same machine with the same number of MPI ranks, but with a new PETSC_ARCH that does NOT use the GPUs?
>>
>> Barry
>>
>> Ideally one gets almost identical convergence with CPUs or GPUs (same problem, same machine), but a bug or a numerical change "might" affect this.
>>
>> On Aug 13, 2020, at 10:28 AM, nicola varini <nicola.var...@gmail.com> wrote:
>>
>> Dear Barry, you are right. The Cray argument checking is incorrect. It does work with --download-fblaslapack.
>> However, it does fail to converge. Is there anything obviously wrong with my petscrc? Anything else I am missing?
>>
>> Thanks
>>
>> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith <bsm...@petsc.dev> ha scritto:
>>
>>> The QR is always done on the CPU; we don't have generic calls to BLAS/LAPACK go to the GPU currently.
>>>
>>> The error message is:
>>>
>>>   On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7)
>>>
>>> Argument 7 is &LWORK, which is defined by
>>>
>>>   PetscBLASInt LWORK=N*bs;
>>>
>>> where N=nSAvec is the column block size of the new P.
>>>
>>> Presumably this is a huge run with many processes, so using the debugger is not practical?
>>>
>>> We need to see what these variables are: N, bs, nSAvec. Perhaps nSAvec is zero, which could easily upset LAPACK.
>>>
>>> The crudest thing would be to just put a print statement in the code before the LAPACK call, or, if it is called many times, to add an error check that generates an error if any of these three values is 0 (or negative).
>>>
>>> Barry
>>>
>>> It is not impossible that the Cray argument checking is incorrect and the value passed in is fine. You can check this by using --download-fblaslapack and seeing if the same or some other error comes up.
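A minimal sketch of the check Barry describes (illustrative only, not the actual PETSc source; SETERRQ3 is the PETSc 3.13-era error macro, and N, bs, nSAvec are the variables named above), placed in formProl0() in src/ksp/pc/impls/gamg/agg.c just before the xGEQRF call:

    /* sketch: fail loudly if any size that feeds LWORK = N*bs is bad */
    if (N <= 0 || bs <= 0 || nSAvec <= 0) {
      SETERRQ3(PETSC_COMM_SELF, PETSC_ERR_PLIB,
               "Invalid sizes before xGEQRF: N=%D bs=%D nSAvec=%D",
               (PetscInt)N, (PetscInt)bs, (PetscInt)nSAvec);
    }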
>>>
>>> On Aug 12, 2020, at 7:19 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>> Can you reproduce this on the CPU?
>>> The QR factorization seems to be failing. That could be from bad data or a bad GPU QR.
>>>
>>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini <nicola.var...@gmail.com> wrote:
>>>
>>>> Dear all, following the suggestions, I resubmitted the simulation with the petscrc below.
>>>> However, I get the following error:
>>>> ========
>>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data
>>>> 7364 [339]PETSC ERROR: xGEQRF error
>>>> 7365 [339]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020
>>>> 7367 [339]PETSC ERROR: /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 by nvarini Wed Aug 12 10:06:15 2020
>>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 --with-scalapack=1 --with-cxx=CC --with-debugging=0 --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 --with-cuda-c=nvcc --with-cxxlib-autodetect=0 --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include --with-cxx=CC --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include
>>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
>>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
>>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7)
>>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>> ========
>>>>
>>>> I did try other pc_gamg_type values, but they fail as well.
>>>> #PETSc Option Table entries:
>>>> -ampere_dm_mat_type aijcusparse
>>>> -ampere_dm_vec_type cuda
>>>> -ampere_ksp_atol 1e-15
>>>> -ampere_ksp_initial_guess_nonzero yes
>>>> -ampere_ksp_reuse_preconditioner yes
>>>> -ampere_ksp_rtol 1e-7
>>>> -ampere_ksp_type dgmres
>>>> -ampere_mg_levels_esteig_ksp_max_it 10
>>>> -ampere_mg_levels_esteig_ksp_type cg
>>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
>>>> -ampere_mg_levels_ksp_type chebyshev
>>>> -ampere_mg_levels_pc_type jacobi
>>>> -ampere_pc_gamg_agg_nsmooths 1
>>>> -ampere_pc_gamg_coarse_eq_limit 10
>>>> -ampere_pc_gamg_reuse_interpolation true
>>>> -ampere_pc_gamg_square_graph 1
>>>> -ampere_pc_gamg_threshold 0.05
>>>> -ampere_pc_gamg_threshold_scale .0
>>>> -ampere_pc_gamg_type agg
>>>> -ampere_pc_type gamg
>>>> -dm_mat_type aijcusparse
>>>> -dm_vec_type cuda
>>>> -log_view
>>>> -poisson_dm_mat_type aijcusparse
>>>> -poisson_dm_vec_type cuda
>>>> -poisson_ksp_atol 1e-15
>>>> -poisson_ksp_initial_guess_nonzero yes
>>>> -poisson_ksp_reuse_preconditioner yes
>>>> -poisson_ksp_rtol 1e-7
>>>> -poisson_ksp_type dgmres
>>>> -poisson_log_view
>>>> -poisson_mg_levels_esteig_ksp_max_it 10
>>>> -poisson_mg_levels_esteig_ksp_type cg
>>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
>>>> -poisson_mg_levels_ksp_max_it 1
>>>> -poisson_mg_levels_ksp_type chebyshev
>>>> -poisson_mg_levels_pc_type jacobi
>>>> -poisson_pc_gamg_agg_nsmooths 1
>>>> -poisson_pc_gamg_coarse_eq_limit 10
>>>> -poisson_pc_gamg_reuse_interpolation true
>>>> -poisson_pc_gamg_square_graph 1
>>>> -poisson_pc_gamg_threshold 0.05
>>>> -poisson_pc_gamg_threshold_scale .0
>>>> -poisson_pc_gamg_type agg
>>>> -poisson_pc_type gamg
>>>> -use_mat_nearnullspace true
>>>> #End of PETSc Option Table entries
>>>>
>>>> Regards,
>>>>
>>>> Nicola
>>>>
>>>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams <mfad...@lbl.gov> ha scritto:
>>>>
>>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>>>>
>>>>>> Nicola,
>>>>>>
>>>>>> You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on the "GPU".
>>>>>> For full AMG on the GPU, you can use PCGAMG with Chebyshev smoothers and Jacobi preconditioning. Mark can help you out with the specific command-line options.
>>>>>> When it works properly, everything related to the PC application is offloaded to the GPU, and you should expect the well-known and branded 10x (maybe more) speedup one expects from GPUs during KSPSolve.
>>>>>
>>>>> The speedup depends on the machine, but on SUMMIT, using enough CPUs to saturate the memory bus versus all 6 GPUs, the speedup is a function of problem subdomain size. I saw 10x at about 100K equations per process.
>>>>>
>>>>>> Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. Yours is not even optimized for proper GPU usage yet.
>>>>>> Also, is there any specific reason why you are using dgmres and fgmres?
>>>>>>
>>>>>> PETSc has not been designed with multi-threading in mind. You can achieve "overlap" of the two solves by splitting the communicator, but then you need communication to let the two solutions talk to each other.
>>>>>>
>>>>>> Thanks
>>>>>> Stefano
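Distilled from Stefano's description, a minimal set of options for an all-GPU GAMG solve (shown without a prefix; in this thread they carry the poisson_/ampere_ prefixes, as in the option table above):

    -pc_type gamg
    -mg_levels_ksp_type chebyshev
    -mg_levels_pc_type jacobi
    -dm_mat_type aijcusparse
    -dm_vec_type cuda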
>>>>>>
>>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>
>>>>>> Dear all, thanks for your replies. The reason I asked whether it is possible to overlap the Poisson and Ampere solves is that they take roughly the same amount of time. Please find attached the profiling logs for the CPU-only and GPU-only runs.
>>>>>> Of course it is possible to split the MPI communicator and run each solver on a different subcommunicator; however, this would involve more communication.
>>>>>> Has anyone ever tried to run two solvers with hyperthreading?
>>>>>> Thanks
>>>>>>
>>>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams <mfad...@lbl.gov> ha scritto:
>>>>>>
>>>>>>> I suspect that the Poisson and Ampere's law solves are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafety and threads, and I think it could/should work, but this mode is never used by anyone.
>>>>>>>
>>>>>>> That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. In the best-case scenario you get a speedup of 2x. That is a strict upper bound, and you will never come close to it. Your hardware has some balance of CPU to GPU processing rate, and your application has some balance of work volume between your two solves; both ratios have to be 1:1 to get close to a 2x speedup. To be concrete, from what little I can guess about your application, let's assume that the cost of each of these two solves is about the same (e.g., Laplacians on your domain, the best-case scenario). But GPU machines these days are configured with roughly 1-10% of their capacity in the CPUs, which gives you an upper bound of about a 10% speedup. That is noise. Upshot: unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see anything close to a 2x speedup. Your time is better spent elsewhere.
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown <j...@jedbrown.org> wrote:
>>>>>>>
>>>>>>>> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on the GPU and others on the host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten).
>>>>>>>>
>>>>>>>> In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard.
>>>>>>>>
>>>>>>>> Barry Smith <bsm...@petsc.dev> writes:
>>>>>>>>
>>>>>>>> > Nicola,
>>>>>>>> >
>>>>>>>> > This is not really viable or practical at this time with PETSc. It is not impossible, but it requires careful coding with threads; another possibility is to use one half of the virtual GPUs for each solve, which is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revisiting this idea in the future.
>>>>>>>> >
>>>>>>>> > Barry
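To make the communicator-splitting idea concrete, a minimal sketch (the half-and-half split and the option prefixes are illustrative assumptions, not the poster's setup; error checking is omitted for brevity):

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      KSP         ksp;
      MPI_Comm    subcomm;
      PetscMPIInt rank, size;
      int         color;

      PetscInitialize(&argc, &argv, NULL, NULL);
      MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
      MPI_Comm_size(PETSC_COMM_WORLD, &size);

      /* illustrative split: lower half of the ranks solve Poisson,
         upper half solve Ampere; each half gets its own communicator */
      color = (rank < size / 2) ? 0 : 1;
      MPI_Comm_split(PETSC_COMM_WORLD, color, rank, &subcomm);

      /* each group builds an independent KSP on its subcommunicator;
         distinct option prefixes keep the two solvers' options separate */
      KSPCreate(subcomm, &ksp);
      KSPSetOptionsPrefix(ksp, color == 0 ? "poisson_" : "ampere_");
      KSPSetFromOptions(ksp);

      /* ... assemble this group's operator on subcomm, then
         KSPSetOperators() and KSPSolve() ... */

      KSPDestroy(&ksp);
      MPI_Comm_free(&subcomm);
      PetscFinalize();
      return 0;
    }

As Jed notes above, coupling the two solutions afterwards still requires communication on the full communicator, and ranks with Vec arrays on the GPU cannot currently be mixed with ranks on the host.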
>>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> Hello, I would like to know if it is possible to overlap the CPU and the GPU with DMDA.
>>>>>>>> >> I have a machine where each node has one P100 and one Haswell CPU.
>>>>>>>> >> I have to solve the Poisson and Ampere equations at each time step, and I am using a 2D DMDA for each of them. Would it be possible to compute the Poisson and Ampere equations at the same time, one on the CPU and the other on the GPU?
>>>>>>>> >>
>>>>>>>> >> Thanks

[Attachments referenced in the thread: out_gpu, out_nogpu, out_miniapp_f_poisson]