The eigen estimates are either very bad or the coarse grids have a problem. Everything looks fine other than these bad estimates that are >> 2.
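For context, a rough bound that explains the ">> 2" remark, assuming an SPD operator that is at least weakly diagonally dominant (e.g. a standard Laplacian discretization): Chebyshev smoothing needs an upper bound on the spectrum of the Jacobi-preconditioned operator, and Gershgorin gives

    lambda_max(D^{-1} A) <= max_i ( sum_j |a_ij| / a_ii ) <= 2

so per-level estimates like the 45 and 146 that show up in the log below are far outside the expected range for such an operator.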
* Are these matrices not symmetric? Maybe from BCs. That is not usually a problem, just checking.
* Are these stretched grids? If not you might try: -ampere_pc_gamg_square_graph 10
* GMRES is not a good estimator when you have SPD matrices, but it is robust. You might try: -ampere_mg_levels_esteig_ksp_monitor_singular_value -ampere_mg_levels_esteig_ksp_max_it 50 -ampere_mg_levels_esteig_ksp_type gmres
* And why are you using -ampere_ksp_type dgmres? If your problems are SPD then CG is great.

Mark

On Mon, Aug 17, 2020 at 10:33 AM nicola varini <nicola.var...@gmail.com> wrote:

> Hi Mark, this is the output of grep GAMG after I used -info:
> =======
> [0] PCSetUp_GAMG(): level 0) N=582736, n data rows=1, n data cols=1, nnz/row (ave)=9, np=12
> [0] PCGAMGFilterGraph(): 97.9676% nnz after filtering, with threshold 0., 8.95768 nnz ave. (N=582736)
> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
> [0] PCGAMGProlongator_AGG(): New grid 38934 nodes
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.101683e+00 min=4.341777e-03 PC=jacobi
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 0, cache spectra 0.00434178 2.10168
> [0] PCSetUp_GAMG(): 1) N=38934, n data cols=1, nnz/row (ave)=18, 12 active pes
> [0] PCGAMGFilterGraph(): 97.024% nnz after filtering, with threshold 0., 17.9774 nnz ave. (N=38934)
> [0] PCGAMGProlongator_AGG(): New grid 4459 nodes
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=4.521607e+01 min=5.854294e-01 PC=jacobi
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 1, cache spectra 0.585429 45.2161
> [0] PCSetUp_GAMG(): 2) N=4459, n data cols=1, nnz/row (ave)=29, 12 active pes
> [0] PCGAMGFilterGraph(): 99.6422% nnz after filtering, with threshold 0., 27.5481 nnz ave. (N=4459)
> [0] PCGAMGProlongator_AGG(): New grid 345 nodes
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.394069e+01 min=1.086973e-01 PC=jacobi
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 2, cache spectra 0.108697 13.9407
> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 29 with simple aggregation
> [0] PCSetUp_GAMG(): 3) N=345, n data cols=1, nnz/row (ave)=31, 6 active pes
> [0] PCGAMGFilterGraph(): 99.6292% nnz after filtering, with threshold 0., 26.9667 nnz ave. (N=345)
> [0] PCGAMGProlongator_AGG(): New grid 26 nodes
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.463593e+02 min=1.469384e-01 PC=jacobi
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 3, cache spectra 0.146938 146.359
> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 5 with simple aggregation
> [0] PCSetUp_GAMG(): 4) N=26, n data cols=1, nnz/row (ave)=16, 1 active pes
> [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.16304
> PCGAMGGraph_AGG 4 1.0 8.4114e-02 1.0 1.02e+06 1.0 3.8e+02 1.3e+03 4.0e+01 0 0 0 0 0 0 0 0 0 0 145
> PCGAMGCoarse_AGG 4 1.0 3.2107e-01 1.0 9.43e+06 1.0 7.3e+02 1.1e+04 3.5e+01 0 0 0 0 0 0 0 0 0 0 351
> PCGAMGProl_AGG 4 1.0 2.8825e-02 1.0 0.00e+00 0.0 3.5e+02 2.8e+03 6.4e+01 0 0 0 0 0 0 0 0 0 0 0
> PCGAMGPOpt_AGG 4 1.0 1.1570e-01 1.0 2.61e+07 1.0 1.2e+03 2.6e+03 1.6e+02 0 0 0 0 1 0 0 0 0 1 2692
> GAMG: createProl 4 1.0 5.5680e-01 1.0 3.64e+07 1.0 2.7e+03 4.6e+03 3.0e+02 0 0 0 0 1 0 0 0 0 1 784
> GAMG: partLevel 4 1.0 1.1628e-01 1.0 5.90e+06 1.0 1.1e+03 3.0e+03 1.6e+02 0 0 0 0 1 0 0 0 0 1 604
> ======
> Nicola
>
> On Mon, Aug 17, 2020 at 15:40, Mark Adams <mfad...@lbl.gov> wrote:
>
>> On Mon, Aug 17, 2020 at 9:24 AM nicola varini <nicola.var...@gmail.com> wrote:
>>
>>> Hi Mark, I do confirm that hypre with boomeramg is working fine and is pretty fast.
>>
>> Good, you can send me the -info (grep GAMG) output and I will try to see what is going on.
>>
>>> However, none of the GAMG options works.
>>> Did anyone ever succeed in using hypre with petsc on gpu?
>>
>> We have gotten Hypre to run on GPUs but it has been fragile. The performance has been marginal (due to use of USM apparently), but it is being worked on by the hypre team.
>>
>> The CUDA tools are changing fast and I am guessing this is a different version than what we have tested, perhaps. Maybe someone else can help with this, but I know we use cuda 10.2 and you are using cuda tools 10.1.
>>
>> And you do want to use the most up-to-date PETSc.
>>> I did manage to compile hypre on gpu but I do get the following error:
>>> =======
>>> CC gpuhypre/obj/vec/vec/impls/hypre/vhyp.o
>>> In file included from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:22,
>>>                  from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23,
>>>                  from /users/nvarini/hypre/include/_hypre_utilities.h:1129,
>>>                  from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14,
>>>                  from /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6,
>>>                  from /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7:
>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:83:1: error: unknown type name 'namespace'
>>>  namespace thrust
>>>  ^~~~~~~~~
>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:84:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
>>>  {
>>>  ^
>>> In file included from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/config.h:28,
>>>                  from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:23,
>>>                  from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23,
>>>                  from /users/nvarini/hypre/include/_hypre_utilities.h:1129,
>>>                  from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14,
>>>                  from /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6,
>>>                  from /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7:
>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/cpp_compatibility.h:21:10: fatal error: cstddef: No such file or directory
>>>  #include <cstddef>
>>>           ^~~~~~~~~
>>> compilation terminated.
>>> =======
>>> Nicola
>>>
>>> On Fri, Aug 14, 2020 at 20:13, Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>>> You can try Hypre. If that fails then there is a problem with your system.
>>>>
>>>> And you can run with -info and grep on GAMG and send the output and I can see if I see anything funny.
>>>>
>>>> If this is just a Laplacian with a stable discretization and not crazy material parameters then stretched grids are about the only thing that can hurt the solver.
>>>>
>>>> Do both of your solves fail in a similar way?
>>>>
>>>> On the CPU you can try this with large subdomains, preferably (in serial ideally):
>>>> -ampere_mg_levels_ksp_type richardson
>>>> -ampere_mg_levels_pc_type sor
>>>>
>>>> And check that there are no unused options with -options_left. GAMG can fail with bad eigen estimates, but these parameters look fine.
>>>>
>>>> On Fri, Aug 14, 2020 at 5:01 AM nicola varini <nicola.var...@gmail.com> wrote:
>>>>
>>>>> Dear Barry, yes it gives the same problems.
>>>>>
>>>>> On Thu, Aug 13, 2020 at 23:22, Barry Smith <bsm...@petsc.dev> wrote:
>>>>>
>>>>>> Does the same thing work (with GAMG) if you run the same problem on the same machine with the same number of MPI ranks but make a new PETSC_ARCH that does NOT use the GPUs?
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> Ideally one gets almost identical convergence with CPUs or GPUs (same problem, same machine) but a bug or numerical change "might" affect this.
>>>>>> On Aug 13, 2020, at 10:28 AM, nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>
>>>>>> Dear Barry, you are right. The Cray argument checking is incorrect. It does work with --download-fblaslapack.
>>>>>> However it does fail to converge. Is there anything obviously wrong with my petscrc?
>>>>>> Anything else am I missing?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Thu, Aug 13, 2020 at 03:17, Barry Smith <bsm...@petsc.dev> wrote:
>>>>>>
>>>>>>> The QR is always done on the CPU, we don't have generic calls to blas/lapack go to the GPU currently.
>>>>>>>
>>>>>>> The error message is:
>>>>>>>
>>>>>>>   On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7)
>>>>>>>
>>>>>>> Argument 7 is &LWORK which is defined by
>>>>>>>
>>>>>>>   PetscBLASInt LWORK=N*bs;
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>>   N=nSAvec is the column block size of new P.
>>>>>>>
>>>>>>> Presumably this is a huge run with many processes so using the debugger is not practical?
>>>>>>>
>>>>>>> We need to see what these variables are:
>>>>>>>
>>>>>>>   N, bs, nSAvec
>>>>>>>
>>>>>>> Perhaps nSAvec is zero, which could easily upset LAPACK.
>>>>>>>
>>>>>>> The crudest thing would be to just put a print statement in the code before the LAPACK call or, if it is called many times, add an error check that generates an error if any of these three values are 0 (or negative).
>>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>> It is not impossible that the Cray argument checking is incorrect and the value passed in is fine. You can check this by using --download-fblaslapack and see if the same or some other error comes up.
>>>>>>>
>>>>>>> On Aug 12, 2020, at 7:19 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>
>>>>>>> Can you reproduce this on the CPU?
>>>>>>> The QR factorization seems to be failing. That could be from bad data or a bad GPU QR.
>>>>>>>
>>>>>>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Dear all, following the suggestions I did resubmit the simulation with the petscrc below.
>>>>>>>> However I do get the following error:
>>>>>>>> ========
>>>>>>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>>>>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data
>>>>>>>> 7364 [339]PETSC ERROR: xGEQRF error
>>>>>>>> 7365 [339]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020
>>>>>>>> 7367 [339]PETSC ERROR: /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 by nvarini Wed Aug 12 10:06:15 2020
>>>>>>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 --with-scalapack=1 --with-cxx=CC --with-debugging=0 --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 --with-cuda-c=nvcc --with-cxxlib-autodetect=0 --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include --with-cxx=CC --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include
>>>>>>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>>>>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>>>>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
>>>>>>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>>>>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>>>>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c
>>>>>>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>>>>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c
>>>>>>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c
>>>>>>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7)
>>>>>>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c
>>>>>>>> ========
>>>>>>>>
>>>>>>>> I did try other pc_gamg_type values but they fail as well.
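A minimal sketch of the kind of guard Barry suggests above for the xGEQRF call in formProl0(); the names N, bs and nSAvec are taken from his message, while CheckQRSizes() itself is a hypothetical helper, not code that exists in PETSc:

    #include <petscsys.h>

    /* Fail loudly if any of the sizes that determine LWORK for the QR are
       zero or negative, instead of letting LAPACK report an illegal argument. */
    static PetscErrorCode CheckQRSizes(PetscBLASInt N, PetscBLASInt bs, PetscInt nSAvec)
    {
      PetscFunctionBegin;
      if (N <= 0 || bs <= 0 || nSAvec <= 0) {
        SETERRQ3(PETSC_COMM_SELF, PETSC_ERR_PLIB,
                 "Bad sizes before xGEQRF: N=%d bs=%d nSAvec=%D", (int)N, (int)bs, nSAvec);
      }
      PetscFunctionReturn(0);
    }

Calling something like this (or simply printing the three values) right before the LAPACK call would at least identify which rank hits a degenerate aggregate.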
>>>>>>>>
>>>>>>>> #PETSc Option Table entries:
>>>>>>>> -ampere_dm_mat_type aijcusparse
>>>>>>>> -ampere_dm_vec_type cuda
>>>>>>>> -ampere_ksp_atol 1e-15
>>>>>>>> -ampere_ksp_initial_guess_nonzero yes
>>>>>>>> -ampere_ksp_reuse_preconditioner yes
>>>>>>>> -ampere_ksp_rtol 1e-7
>>>>>>>> -ampere_ksp_type dgmres
>>>>>>>> -ampere_mg_levels_esteig_ksp_max_it 10
>>>>>>>> -ampere_mg_levels_esteig_ksp_type cg
>>>>>>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
>>>>>>>> -ampere_mg_levels_ksp_type chebyshev
>>>>>>>> -ampere_mg_levels_pc_type jacobi
>>>>>>>> -ampere_pc_gamg_agg_nsmooths 1
>>>>>>>> -ampere_pc_gamg_coarse_eq_limit 10
>>>>>>>> -ampere_pc_gamg_reuse_interpolation true
>>>>>>>> -ampere_pc_gamg_square_graph 1
>>>>>>>> -ampere_pc_gamg_threshold 0.05
>>>>>>>> -ampere_pc_gamg_threshold_scale .0
>>>>>>>> -ampere_pc_gamg_type agg
>>>>>>>> -ampere_pc_type gamg
>>>>>>>> -dm_mat_type aijcusparse
>>>>>>>> -dm_vec_type cuda
>>>>>>>> -log_view
>>>>>>>> -poisson_dm_mat_type aijcusparse
>>>>>>>> -poisson_dm_vec_type cuda
>>>>>>>> -poisson_ksp_atol 1e-15
>>>>>>>> -poisson_ksp_initial_guess_nonzero yes
>>>>>>>> -poisson_ksp_reuse_preconditioner yes
>>>>>>>> -poisson_ksp_rtol 1e-7
>>>>>>>> -poisson_ksp_type dgmres
>>>>>>>> -poisson_log_view
>>>>>>>> -poisson_mg_levels_esteig_ksp_max_it 10
>>>>>>>> -poisson_mg_levels_esteig_ksp_type cg
>>>>>>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
>>>>>>>> -poisson_mg_levels_ksp_max_it 1
>>>>>>>> -poisson_mg_levels_ksp_type chebyshev
>>>>>>>> -poisson_mg_levels_pc_type jacobi
>>>>>>>> -poisson_pc_gamg_agg_nsmooths 1
>>>>>>>> -poisson_pc_gamg_coarse_eq_limit 10
>>>>>>>> -poisson_pc_gamg_reuse_interpolation true
>>>>>>>> -poisson_pc_gamg_square_graph 1
>>>>>>>> -poisson_pc_gamg_threshold 0.05
>>>>>>>> -poisson_pc_gamg_threshold_scale .0
>>>>>>>> -poisson_pc_gamg_type agg
>>>>>>>> -poisson_pc_type gamg
>>>>>>>> -use_mat_nearnullspace true
>>>>>>>> #End of PETSc Option Table entries
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Nicola
>>>>>>>>
>>>>>>>> On Tue, Aug 4, 2020 at 17:57, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Nicola,
>>>>>>>>>>
>>>>>>>>>> You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on “GPU”.
>>>>>>>>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with Jacobi preconditioning. Mark can help you out with the specific command line options. When it works properly, everything related to PC application is offloaded to the GPU, and you should expect to get the well-known and branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve.
>>>>>>>>>>
>>>>>>>>> The speedup depends on the machine, but on SUMMIT, using enough CPUs to saturate the memory bus vs all 6 GPUs, the speedup is a function of problem subdomain size. I saw 10x at about 100K equations/process.
>>>>>>>>>
>>>>>>>>>> Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. Yours is not even optimized for proper GPU usage yet.
>>>>>>>>>> Also, any specific reason why you are using dgmres and fgmres?
>>>>>>>>>>
>>>>>>>>>> PETSc has not been designed with multi-threading in mind. You can achieve "overlap" of the two solves by splitting the communicator. But then you need communications to let the two solutions talk to each other.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Stefano
>>>>>>>>>>
>>>>>>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Dear all, thanks for your replies. The reason why I've asked if it is possible to overlap poisson and ampere is because they roughly take the same amount of time. Please find in attachment the profiling logs for only CPU and only GPU.
>>>>>>>>>> Of course it is possible to split the MPI communicator and run each solver on a different subcommunicator, however this would involve more communication.
>>>>>>>>>> Did anyone ever try to run 2 solvers with hyperthreading?
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 2, 2020 at 14:09, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>>>>
>>>>>>>>>>> I suspect that the Poisson and Ampere's law solves are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafety and threads and I think it could/should work, but this mode is never used by anyone.
>>>>>>>>>>>
>>>>>>>>>>> That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. The best case scenario is that you get a speedup of 2x. That is a strict upper bound, but you will never come close to it. Your hardware has some balance of CPU to GPU processing rate. Your application has a balance of volume of work for your two solves. They have to be the same to get close to 2x speedup and that ratio(s) has to be 1:1. To be concrete, from what little I can guess about your applications, let's assume that the cost of each of these two solves is about the same (eg, Laplacians on your domain and the best case scenario). But GPU machines are configured to have roughly 1-10% of capacity in the GPUs these days, so that gives you an upper bound of about 10% speedup. That is noise. Upshot: unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see close to 2x speedup. Your time is better spent elsewhere.
>>>>>>>>>>>
>>>>>>>>>>> Mark
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown <j...@jedbrown.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part.
>>>>>>>>>>>> They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten).
>>>>>>>>>>>>
>>>>>>>>>>>> In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard.
>>>>>>>>>>>>
>>>>>>>>>>>> Barry Smith <bsm...@petsc.dev> writes:
>>>>>>>>>>>>
>>>>>>>>>>>> > Nicola,
>>>>>>>>>>>> >
>>>>>>>>>>>> >    This is really not viable or practical at this time with PETSc. It is not impossible but requires careful coding with threads; another possibility is to use one half of the virtual GPUs for each solve, but this is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revisit this idea in the future.
>>>>>>>>>>>> >
>>>>>>>>>>>> >    Barry
>>>>>>>>>>>> >
>>>>>>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini <nicola.var...@gmail.com> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA.
>>>>>>>>>>>> >> I've a machine where each node has 1P100+1Haswell.
>>>>>>>>>>>> >> I have to solve the Poisson and Ampere equations for each time step.
>>>>>>>>>>>> >> I'm using a 2D DMDA for each of them. Would it be possible to compute the poisson and ampere equations at the same time? One on the CPU and the other on the GPU?
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Thanks

<out_gpu> <out_nogpu>
<out_miniapp_f_poisson>
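For completeness, a minimal sketch (not from this thread) of the communicator split that Stefano, Mark, Jed and Barry discuss: each group of ranks builds its own KSP on a sub-communicator, so one group could run the Poisson solve on the CPU while the other drives the GPU for Ampere. The rank split and the option prefixes are illustrative assumptions only, and the coupling exchange between the two groups is left out.

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      MPI_Comm       subcomm;
      PetscMPIInt    rank, size, color;
      KSP            ksp;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
      MPI_Comm_size(PETSC_COMM_WORLD, &size);

      /* Illustrative split: the last rank drives the GPU solve, the others share the CPU solve. */
      color = (rank == size - 1) ? 0 : 1;
      MPI_Comm_split(PETSC_COMM_WORLD, color, rank, &subcomm);

      /* Each group creates a solver on its own sub-communicator and picks up its own
         options (e.g. the -ampere_* vs -poisson_* prefixes from the petscrc above). */
      ierr = KSPCreate(subcomm, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(ksp, color ? "poisson_" : "ampere_");CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

      /* ... set operators, solve, and exchange whatever coupling data the two fields
         need between the groups on PETSC_COMM_WORLD ... */

      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      MPI_Comm_free(&subcomm);
      ierr = PetscFinalize();
      return ierr;
    }

As Jed and Barry note above, whether this pays off depends entirely on the two solves having comparable cost and on how much host-device traffic is needed to couple them.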