I thought these were the Laplacian (Poisson and Ampere's law). Anyway, the coarse grids are very messed up, or at least the eigen estimates are. A bad QR solver, used in GAMG's coarse grid construction, could do that. I've never seen that happen before, but it would explain this.
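A quick way to sanity-check those estimates outside of GAMG is to compute extreme singular values on the operator directly. A minimal sketch, assuming a level matrix A and work vectors b, x are already assembled (these names are illustrative, not from the thread):

    /* Cross-check GAMG's cached spectra with a standalone GMRES estimate. */
    KSP            ksp;
    PetscReal      emax, emin;
    PetscErrorCode ierr;

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPGMRES);CHKERRQ(ierr);               /* robust for nonsymmetric A */
    ierr = KSPSetComputeSingularValues(ksp, PETSC_TRUE);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPComputeExtremeSingularValues(ksp, &emax, &emin);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "emax %g  emin %g\n", (double)emax, (double)emin);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);

If these disagree badly with the "cache spectra" values in the -info log below, the estimator (or the coarse operator it runs on) is the problem.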
On Tue, Aug 18, 2020 at 3:18 AM nicola varini <nicola.var...@gmail.com> wrote:

> Dear Mark, the matrices are not symmetric and not positive definite. I did try to add:
>   -ampere_mg_levels_esteig_ksp_monitor_singular_value
>   -ampere_mg_levels_esteig_ksp_max_it 50
>   -ampere_mg_levels_esteig_ksp_type gmres
> but it still fails to converge.
> For the time being it seems that hypre on CPU is the safest choice, although it is surely worth experimenting with Stefano's branch.
>
> Thanks,
>
> Nicola
>
> Il giorno lun 17 ago 2020 alle ore 18:10 Mark Adams <mfad...@lbl.gov> ha scritto:
>
>> The eigen estimates are either very bad or the coarse grids have a problem. Everything looks fine other than these bad estimates that are > 2.
>>
>> * Are these matrices not symmetric? Maybe from BCs. That is not usually a problem, just checking.
>>
>> * Are these stretched grids? If not you might try:
>>     -ampere_pc_gamg_square_graph 10
>>
>> * GMRES is not a good estimator when you have SPD matrices, but it is robust. You might try:
>>     -ampere_mg_levels_esteig_ksp_monitor_singular_value
>>     -ampere_mg_levels_esteig_ksp_max_it 50
>>     -ampere_mg_levels_esteig_ksp_type gmres
>>
>> * And why are you using -ampere_ksp_type dgmres? If your problems are SPD then CG is great.
>>
>> Mark
>>
>> On Mon, Aug 17, 2020 at 10:33 AM nicola varini <nicola.var...@gmail.com> wrote:
>>
>>> Hi Mark, this is the output of grep GAMG after I ran with -info:
>>> =======
>>> [0] PCSetUp_GAMG(): level 0) N=582736, n data rows=1, n data cols=1, nnz/row (ave)=9, np=12
>>> [0] PCGAMGFilterGraph(): 97.9676% nnz after filtering, with threshold 0., 8.95768 nnz ave. (N=582736)
>>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>>> [0] PCGAMGProlongator_AGG(): New grid 38934 nodes
>>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.101683e+00 min=4.341777e-03 PC=jacobi
>>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 0, cache spectra 0.00434178 2.10168
>>> [0] PCSetUp_GAMG(): 1) N=38934, n data cols=1, nnz/row (ave)=18, 12 active pes
>>> [0] PCGAMGFilterGraph(): 97.024% nnz after filtering, with threshold 0., 17.9774 nnz ave. (N=38934)
>>> [0] PCGAMGProlongator_AGG(): New grid 4459 nodes
>>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=4.521607e+01 min=5.854294e-01 PC=jacobi
>>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 1, cache spectra 0.585429 45.2161
>>> [0] PCSetUp_GAMG(): 2) N=4459, n data cols=1, nnz/row (ave)=29, 12 active pes
>>> [0] PCGAMGFilterGraph(): 99.6422% nnz after filtering, with threshold 0., 27.5481 nnz ave. (N=4459)
>>> [0] PCGAMGProlongator_AGG(): New grid 345 nodes
>>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.394069e+01 min=1.086973e-01 PC=jacobi
>>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 2, cache spectra 0.108697 13.9407
>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 29 with simple aggregation
>>> [0] PCSetUp_GAMG(): 3) N=345, n data cols=1, nnz/row (ave)=31, 6 active pes
>>> [0] PCGAMGFilterGraph(): 99.6292% nnz after filtering, with threshold 0., 26.9667 nnz ave. (N=345)
>>> [0] PCGAMGProlongator_AGG(): New grid 26 nodes
>>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.463593e+02 min=1.469384e-01 PC=jacobi
>>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 3, cache spectra 0.146938 146.359
>>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 5 with simple aggregation
>>> [0] PCSetUp_GAMG(): 4) N=26, n data cols=1, nnz/row (ave)=16, 1 active pes
>>> [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.16304
>>> PCGAMGGraph_AGG 4 1.0 8.4114e-02 1.0 1.02e+06 1.0 3.8e+02 1.3e+03 4.0e+01 0 0 0 0 0 0 0 0 0 0 145
>>> PCGAMGCoarse_AGG 4 1.0 3.2107e-01 1.0 9.43e+06 1.0 7.3e+02 1.1e+04 3.5e+01 0 0 0 0 0 0 0 0 0 0 351
>>> PCGAMGProl_AGG 4 1.0 2.8825e-02 1.0 0.00e+00 0.0 3.5e+02 2.8e+03 6.4e+01 0 0 0 0 0 0 0 0 0 0 0
>>> PCGAMGPOpt_AGG 4 1.0 1.1570e-01 1.0 2.61e+07 1.0 1.2e+03 2.6e+03 1.6e+02 0 0 0 0 1 0 0 0 0 1 2692
>>> GAMG: createProl 4 1.0 5.5680e-01 1.0 3.64e+07 1.0 2.7e+03 4.6e+03 3.0e+02 0 0 0 0 1 0 0 0 0 1 784
>>> GAMG: partLevel 4 1.0 1.1628e-01 1.0 5.90e+06 1.0 1.1e+03 3.0e+03 1.6e+02 0 0 0 0 1 0 0 0 0 1 604
>>> ======
>>> Nicola
>>>
>>> Il giorno lun 17 ago 2020 alle ore 15:40 Mark Adams <mfad...@lbl.gov> ha scritto:
>>>
>>>> On Mon, Aug 17, 2020 at 9:24 AM nicola varini <nicola.var...@gmail.com> wrote:
>>>>
>>>>> Hi Mark, I confirm that hypre with boomeramg is working fine and is pretty fast.
>>>>
>>>> Good. You can send me the -info (grep GAMG) output and I will try to see what is going on.
>>>>
>>>>> However, none of the GAMG options works. Has anyone ever succeeded in using hypre with PETSc on GPU?
>>>>
>>>> We have gotten hypre to run on GPUs, but it has been fragile. The performance has been marginal (due to the use of USM, apparently), but it is being worked on by the hypre team.
>>>>
>>>> The CUDA tools are changing fast, and I am guessing this is a different version than what we have tested, perhaps. Maybe someone else can help with this, but I know we use CUDA 10.2 and you are using CUDA tools 10.1.
>>>>
>>>> And you do want to use the most up-to-date PETSc.
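For reference, capturing that output is just a matter of filtering stdout/stderr; a sketch of the job line, with the launcher and the executable name assumed from the logs later in the thread:

    srun -n 12 ./gbs_daint_gpu_gnu -info 2>&1 | grep GAMG > gamg-info.log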
>>>>> I did manage to compile hypre on GPU, but I get the following error:
>>>>> =======
>>>>> CC gpuhypre/obj/vec/vec/impls/hypre/vhyp.o
>>>>> In file included from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:22,
>>>>>                  from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23,
>>>>>                  from /users/nvarini/hypre/include/_hypre_utilities.h:1129,
>>>>>                  from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14,
>>>>>                  from /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6,
>>>>>                  from /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7:
>>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:83:1: error: unknown type name 'namespace'
>>>>>  namespace thrust
>>>>>  ^~~~~~~~~
>>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:84:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
>>>>>  {
>>>>>  ^
>>>>> In file included from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/config.h:28,
>>>>>                  from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:23,
>>>>>                  from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23,
>>>>>                  from /users/nvarini/hypre/include/_hypre_utilities.h:1129,
>>>>>                  from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14,
>>>>>                  from /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6,
>>>>>                  from /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7:
>>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/cpp_compatibility.h:21:10: fatal error: cstddef: No such file or directory
>>>>>  #include <cstddef>
>>>>>           ^~~~~~~~~
>>>>> compilation terminated.
>>>>> =======
>>>>> Nicola
>>>>>
>>>>> Il giorno ven 14 ago 2020 alle ore 20:13 Mark Adams <mfad...@lbl.gov> ha scritto:
>>>>>
>>>>>> You can try hypre. If that fails, then there is a problem with your system.
>>>>>>
>>>>>> And you can run with -info and grep on GAMG and send the output, and I can see if I see anything funny.
>>>>>>
>>>>>> If this is just a Laplacian with a stable discretization and not crazy material parameters, then stretched grids are about the only thing that can hurt the solver.
>>>>>>
>>>>>> Do both of your solves fail in a similar way?
>>>>>>
>>>>>> On the CPU you can try this with large subdomains, preferably (in serial ideally):
>>>>>>   -ampere_mg_levels_ksp_type richardson
>>>>>>   -ampere_mg_levels_pc_type sor
>>>>>>
>>>>>> And check that there are no unused options with -options_left. GAMG can fail with bad eigen estimates, but these parameters look fine.
>>>>>>
>>>>>> On Fri, Aug 14, 2020 at 5:01 AM nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>
>>>>>>> Dear Barry, yes it gives the same problems.
>>>>>>>
>>>>>>> Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith <bsm...@petsc.dev> ha scritto:
>>>>>>>
>>>>>>>> Does the same thing work (with GAMG) if you run the same problem on the same machine, with the same number of MPI ranks, but make a new PETSC_ARCH that does NOT use the GPUs?
>>>>>>>>
>>>>>>>> Ideally one gets almost identical convergence with CPUs or GPUs (same problem, same machine), but a bug or numerical change "might" affect this.
>>>>>>>>
>>>>>>>> Barry
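A lighter-weight variant of Barry's test is to override the GPU types from the options database without rebuilding; a sketch, using the prefixes from the petscrc later in the thread:

    -ampere_dm_mat_type aij
    -ampere_dm_vec_type standard
    -poisson_dm_mat_type aij
    -poisson_dm_vec_type standard

A separate non-CUDA PETSC_ARCH, as Barry suggests, is still the cleaner experiment, since it also rules out build-level issues.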
>>>>>>>> On Aug 13, 2020, at 10:28 AM, nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Dear Barry, you are right. The Cray argument checking is incorrect. It does work with --download-fblaslapack.
>>>>>>>> However, it does fail to converge. Is there anything obviously wrong with my petscrc? Anything else am I missing?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith <bsm...@petsc.dev> ha scritto:
>>>>>>>>
>>>>>>>>> The QR is always done on the CPU; we don't have generic calls to blas/lapack go to the GPU currently.
>>>>>>>>>
>>>>>>>>> The error message is:
>>>>>>>>>
>>>>>>>>>   On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7)
>>>>>>>>>
>>>>>>>>> Argument 7 is &LWORK, which is defined by
>>>>>>>>>
>>>>>>>>>   PetscBLASInt LWORK=N*bs;
>>>>>>>>>
>>>>>>>>> and N=nSAvec is the column block size of new P.
>>>>>>>>>
>>>>>>>>> Presumably this is a huge run with many processes, so using the debugger is not practical?
>>>>>>>>>
>>>>>>>>> We need to see what these variables are: N, bs, nSAvec. Perhaps nSAvec is zero, which could easily upset LAPACK.
>>>>>>>>>
>>>>>>>>> The crudest thing would be to just put a print statement in the code before the LAPACK call, or, if it is called many times, add an error check that generates an error if any of these three values is 0 (or negative).
>>>>>>>>>
>>>>>>>>> Barry
>>>>>>>>>
>>>>>>>>> It is not impossible that the Cray argument checking is incorrect and the value passed in is fine. You can check this by using --download-fblaslapack and seeing if the same or some other error comes up.
>>>>>>>>>
>>>>>>>>> On Aug 12, 2020, at 7:19 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>>>
>>>>>>>>> Can you reproduce this on the CPU? The QR factorization seems to be failing. That could be from bad data or a bad GPU QR.
>>>>>>>>>
>>>>>>>>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Dear all, following the suggestions I resubmitted the simulation with the petscrc below. However, I get the following error:
>>>>>>>>>> ========
>>>>>>>>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>>>>>>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data
>>>>>>>>>> 7364 [339]PETSC ERROR: xGEQRF error
>>>>>>>>>> 7365 [339]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>>>>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020
>>>>>>>>>> 7367 [339]PETSC ERROR: /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 by nvarini Wed Aug 12 10:06:15 2020
>>>>>>>>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 --with-scalapack=1 --with-cxx=CC --with-debugging=0 --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 --with-cuda-c=nvcc --with-cxxlib-autodetect=0 --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include --with-cxx=CC --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include
>>>>>>>>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>>>>>>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>>>>>>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
>>>>>>>>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>>>>>>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>>>>>>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>>>>>>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>>>>>>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
>>>>>>>>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7)
>>>>>>>>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>>>>>>>> ========
>>>>>>>>>>
>>>>>>>>>> I did try other pc_gamg_type values, but they fail as well.
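A minimal sketch of the guard Barry describes, to be placed just before the LAPACK QR call in formProl0() in agg.c; the variable names N, bs, and nSAvec are the ones he quotes, and the error macro is the PETSc 3.13 style:

    /* Per Barry's suggestion: fail loudly if any size feeding LWORK = N*bs is bad. */
    if (N <= 0 || bs <= 0 || nSAvec <= 0) {
      SETERRQ3(PETSC_COMM_SELF, PETSC_ERR_PLIB,
               "Bad sizes before xGEQRF: N=%D bs=%D nSAvec=%D",
               (PetscInt)N, (PetscInt)bs, (PetscInt)nSAvec);
    }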
>>>>>>>>>> #PETSc Option Table entries:
>>>>>>>>>> -ampere_dm_mat_type aijcusparse
>>>>>>>>>> -ampere_dm_vec_type cuda
>>>>>>>>>> -ampere_ksp_atol 1e-15
>>>>>>>>>> -ampere_ksp_initial_guess_nonzero yes
>>>>>>>>>> -ampere_ksp_reuse_preconditioner yes
>>>>>>>>>> -ampere_ksp_rtol 1e-7
>>>>>>>>>> -ampere_ksp_type dgmres
>>>>>>>>>> -ampere_mg_levels_esteig_ksp_max_it 10
>>>>>>>>>> -ampere_mg_levels_esteig_ksp_type cg
>>>>>>>>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
>>>>>>>>>> -ampere_mg_levels_ksp_type chebyshev
>>>>>>>>>> -ampere_mg_levels_pc_type jacobi
>>>>>>>>>> -ampere_pc_gamg_agg_nsmooths 1
>>>>>>>>>> -ampere_pc_gamg_coarse_eq_limit 10
>>>>>>>>>> -ampere_pc_gamg_reuse_interpolation true
>>>>>>>>>> -ampere_pc_gamg_square_graph 1
>>>>>>>>>> -ampere_pc_gamg_threshold 0.05
>>>>>>>>>> -ampere_pc_gamg_threshold_scale .0
>>>>>>>>>> -ampere_pc_gamg_type agg
>>>>>>>>>> -ampere_pc_type gamg
>>>>>>>>>> -dm_mat_type aijcusparse
>>>>>>>>>> -dm_vec_type cuda
>>>>>>>>>> -log_view
>>>>>>>>>> -poisson_dm_mat_type aijcusparse
>>>>>>>>>> -poisson_dm_vec_type cuda
>>>>>>>>>> -poisson_ksp_atol 1e-15
>>>>>>>>>> -poisson_ksp_initial_guess_nonzero yes
>>>>>>>>>> -poisson_ksp_reuse_preconditioner yes
>>>>>>>>>> -poisson_ksp_rtol 1e-7
>>>>>>>>>> -poisson_ksp_type dgmres
>>>>>>>>>> -poisson_log_view
>>>>>>>>>> -poisson_mg_levels_esteig_ksp_max_it 10
>>>>>>>>>> -poisson_mg_levels_esteig_ksp_type cg
>>>>>>>>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
>>>>>>>>>> -poisson_mg_levels_ksp_max_it 1
>>>>>>>>>> -poisson_mg_levels_ksp_type chebyshev
>>>>>>>>>> -poisson_mg_levels_pc_type jacobi
>>>>>>>>>> -poisson_pc_gamg_agg_nsmooths 1
>>>>>>>>>> -poisson_pc_gamg_coarse_eq_limit 10
>>>>>>>>>> -poisson_pc_gamg_reuse_interpolation true
>>>>>>>>>> -poisson_pc_gamg_square_graph 1
>>>>>>>>>> -poisson_pc_gamg_threshold 0.05
>>>>>>>>>> -poisson_pc_gamg_threshold_scale .0
>>>>>>>>>> -poisson_pc_gamg_type agg
>>>>>>>>>> -poisson_pc_type gamg
>>>>>>>>>> -use_mat_nearnullspace true
>>>>>>>>>> #End of PETSc Option Table entries
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Nicola
>>>>>>>>>>
>>>>>>>>>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams <mfad...@lbl.gov> ha scritto:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Nicola,
>>>>>>>>>>>>
>>>>>>>>>>>> You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on "GPU".
>>>>>>>>>>>> For a full AMG GPU solve, you can use PCGAMG, with Chebyshev smoothers and with Jacobi preconditioning. Mark can help you out with the specific command line options. When it works properly, everything related to PC application is offloaded to the GPU, and you should expect to get the well-known and branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve.
>>>>>>>>>>>
>>>>>>>>>>> The speedup depends on the machine, but on SUMMIT, using enough CPUs to saturate the memory bus vs all 6 GPUs, the speedup is a function of the problem subdomain size. I saw 10x at about 100K equations/process.
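As an aside on how the option table above is consumed: the -poisson_ and -ampere_ prefixes bind to two independent solver objects, roughly as in this sketch (error checking omitted):

    /* Two prefixed KSPs pick up the -poisson_* and -ampere_* options respectively. */
    KSP kspPoisson, kspAmpere;
    KSPCreate(PETSC_COMM_WORLD, &kspPoisson);
    KSPSetOptionsPrefix(kspPoisson, "poisson_");
    KSPSetFromOptions(kspPoisson);  /* reads -poisson_pc_type gamg, -poisson_ksp_type, ... */
    KSPCreate(PETSC_COMM_WORLD, &kspAmpere);
    KSPSetOptionsPrefix(kspAmpere, "ampere_");
    KSPSetFromOptions(kspAmpere);   /* reads -ampere_pc_type gamg, -ampere_ksp_type, ... */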
>>>>>>>>>>>> Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. Yours is not even optimized for proper GPU usage yet.
>>>>>>>>>>>> Also, any specific reason why you are using dgmres and fgmres?
>>>>>>>>>>>>
>>>>>>>>>>>> PETSc has not been designed with multi-threading in mind. You can achieve "overlap" of the two solves by splitting the communicator. But then you need communications to let the two solutions talk to each other.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Stefano
>>>>>>>>>>>>
>>>>>>>>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Dear all, thanks for your replies. The reason I asked whether it is possible to overlap Poisson and Ampere is that they take roughly the same amount of time. Please find attached the profiling logs for CPU only and GPU only.
>>>>>>>>>>>> Of course it is possible to split the MPI communicator and run each solver on a different subcommunicator; however, this would involve more communication.
>>>>>>>>>>>> Has anyone ever tried to run 2 solvers with hyperthreading?
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams <mfad...@lbl.gov> ha scritto:
>>>>>>>>>>>>
>>>>>>>>>>>>> I suspect that the Poisson and Ampere's law solves are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafety and threads, and I think it could/should work, but this mode is never used by anyone.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. In the best case scenario you get a speedup of 2x. That is a strict upper bound, and you will never come close to it. Your hardware has some balance of CPU to GPU processing rate, and your application has a balance of volume of work between your two solves; those ratios have to match, 1:1, to get close to the 2x speedup. To be concrete, from what little I can guess about your application, let's assume that the cost of each of these two solves is about the same (e.g., Laplacians on your domain; the best case scenario). But GPU machines these days are configured to have roughly 1-10% of their capacity in the CPUs, which gives you an upper bound of about 10% speedup. That is noise. Upshot: unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see anything close to 2x speedup. Your time is better spent elsewhere.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mark
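To put a number on Mark's bound: if the CPUs supply a fraction f of the node's total throughput, overlapping one solve there while the GPU runs the other gives at best

    speedup = 1 / (1 - f)

(assuming perfect overlap and equal-cost solves); with f = 0.1 that is about 1.11x, i.e. the roughly 10% he quotes.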
>>>>>>>>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown <j...@jedbrown.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Barry Smith <bsm...@petsc.dev> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Nicola,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > This is not really viable or practical at this time with PETSc. It is not impossible, but it requires careful coding with threads; another possibility is to use one half of the virtual GPUs for each solve, which is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revisiting this idea in the future.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Barry
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA.
>>>>>>>>>>>>>> >> I have a machine where each node has 1 P100 + 1 Haswell.
>>>>>>>>>>>>>> >> I have to solve the Poisson and Ampere equations at each time step.
>>>>>>>>>>>>>> >> I'm using a 2D DMDA for each of them. Would it be possible to compute the Poisson and Ampere equations at the same time, one on the CPU and the other on the GPU?
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> <out_gpu> <out_nogpu>
>>>>>>>>> <out_miniapp_f_poisson>
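A minimal sketch of the communicator split Jed and Stefano describe; the choice of giving the last rank to the GPU solve, and the color values, are illustrative assumptions:

    /* Split PETSC_COMM_WORLD so n-1 ranks handle one solve (CPU) and the
       remaining rank drives the GPU for the other; error checking omitted. */
    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      MPI_Comm    subcomm;
      PetscMPIInt rank, size;
      int         color;

      PetscInitialize(&argc, &argv, NULL, NULL);
      MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
      MPI_Comm_size(PETSC_COMM_WORLD, &size);
      color = (rank == size - 1) ? 1 : 0;  /* 1 = GPU (Ampere) rank, 0 = CPU (Poisson) ranks */
      MPI_Comm_split(PETSC_COMM_WORLD, color, rank, &subcomm);
      /* Create each DMDA/KSP on subcomm instead of PETSC_COMM_WORLD, then
         exchange coupling data across PETSC_COMM_WORLD between solves. */
      MPI_Comm_free(&subcomm);
      PetscFinalize();
      return 0;
    }

As Jed notes, the hard part is not the split itself but load-balancing the two parts and paying for the coupling communication.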