Thanks, it looks correct. I am getting memory leaks (appended), and something horrible is going on with performance:
MatLUFactorNum 130 1.0 9.2220e+00 1.0 6.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 30 0 0 0 0 30 0 0 0 0 71 0 390 3.33e+02 0 0.00e+00 0
MatLUFactorNum 130 1.0 6.5177e-01 1.0 1.28e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 1966 0 0 0.00e+00 0 0.00e+00 0

This is not urgent, but I'd like to get a serial LU GPU solver at some point.

Thanks again,
Mark

Lots of these:

[ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
[ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
[ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu

On Wed, Apr 15, 2020 at 12:47 PM Stefano Zampini <stefano.zamp...@gmail.com> wrote:

> Mark,
>
> Attached is the patch. I will open an MR in the next few days if you confirm it is working for you.
> The issue is that CUSPARSE does not provide a way to compute the triangular factors, so we delegate the computation of the factors to PETSc (on the CPU). These factors are then copied to the GPU.
> What was happening in the second step of SNES was that the factors were never updated, because the offloadmask was never updated.
>
> The real issue is that the CUSPARSE support in PETSc is in really bad shape and mostly untested, with coding solutions that are probably outdated by now.
> I'll see what I can do to fix the class if I have time in the next few weeks.
>
> Stefano
>
> On Wed, Apr 15, 2020 at 5:21 PM Mark Adams <mfad...@lbl.gov> wrote:
>
>> On Wed, Apr 15, 2020 at 8:24 AM Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>
>>> Mark,
>>>
>>> I have fixed a few things in the solver and it is tested with the current master.
>>
>> I rebased with master over the weekend ...
>>
>>> Can you write a MWE to reproduce the issue? Which versions of CUDA and CUSPARSE are you using?
>>
>> You can use the mark/feature-xgc-interface-rebase branch and add '-mat_type seqaijcusparse -fp_pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format ell -vec_type cuda' to dm/impls/plex/tutorials/ex10.c.
>>
>> The first stage, the SNES solve, actually looks OK here. Maybe.
>>
>> Thanks,
>>
>> 10:01 mark/feature-xgc-interface-rebase *= ~/petsc$ make -f gmakefile test search='dm_impls_plex_tutorials-ex10_0'
>> /usr/bin/python /ccs/home/adams/petsc/config/gmakegentest.py --petsc-dir=/ccs/home/adams/petsc --petsc-arch=arch-summit-opt64-gnu-cuda --testdir=./arch-summit-opt64-gnu-cuda/tests
>> Using MAKEFLAGS: search=dm_impls_plex_tutorials-ex10_0
>>           CC arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10.o
>>      CLINKER arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10
>>         TEST arch-summit-opt64-gnu-cuda/tests/counts/dm_impls_plex_tutorials-ex10_0.counts
>>  ok dm_impls_plex_tutorials-ex10_0
>> not ok diff-dm_impls_plex_tutorials-ex10_0 # Error code: 1
>> # 14,16c14,16
>> # < 0 SNES Function norm 6.184233768573e-04
>> # < 1 SNES Function norm 1.467479466750e-08
>> # < 2 SNES Function norm 7.863111141350e-12
>> # ---
>> # > 0 SNES Function norm 6.184233768572e-04
>> # > 1 SNES Function norm 1.467479466739e-08
>> # > 2 SNES Function norm 7.863102870090e-12
>> # 18,31c18,256
>> # < 0 SNES Function norm 6.182952107532e-04
>> # < 1 SNES Function norm 7.336382211149e-09
>> # < 2 SNES Function norm 1.566979901443e-11
>> # < Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
>> # < 0 SNES Function norm 6.183592738545e-04
>> # < 1 SNES Function norm 7.337681407420e-09
>> # < 2 SNES Function norm 1.408823933908e-11
>> # < Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
>> # < [0] TSAdaptChoose_Basic(): Estimated scaled local truncation error 0.0396569, accepting step of size 1e-06
>> # < 1 TS dt 1.25e-06 time 1e-06
>> # < 1) species-0: charge density= -1.6024814608984e+01 z-momentum= 2.0080682964364e-19 energy= 1.2018000284846e+05
>> # < 1) species-1: charge density= 1.6021676653316e+01 z-momentum= 1.4964483981137e-17 energy= 1.2017223215083e+05
>> # < 1) species-2: charge density= 2.8838441139703e-03 z-momentum= -1.1062018110807e-23 energy= 1.2019641370376e-03
>> # < 1) Total: charge density= -2.5411155383649e-04, momentum= 1.5165279748763e-17, energy= 2.4035223620125e+05 (m_i[0]/m_e = 3670.94, 140 cells), 1 sub threads
>> # ---
>> # > 0 SNES Function norm 6.182952107531e-04
>> # > 1 SNES Function norm 6.181600164904e-04
>> # > 2 SNES Function norm 6.180249471739e-04
>> # > 3 SNES Function norm 6.178899987549e-04
>>
>>> I was planning to reorganize the factor code in AIJCUSPARSE in the next few days.
>>>
>>> kl-18967:petsc zampins$ git grep "solver_type cusparse"
>>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format ell -vec_type cuda -pc_type ilu
>>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/shallow_water1 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format hyb -vec_type cuda -ksp_type cg -pc_type icc
>>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu
>>> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu -pc_factor_mat_ordering_type nd
>>> src/ksp/ksp/examples/tutorials/ex46.c:      args: -dm_mat_type aijcusparse -dm_vec_type cuda -random_exact_sol -pc_type ilu -pc_factor_mat_solver_type cusparse
>>> src/ksp/ksp/examples/tutorials/ex59.c:      args: -subdomain_mat_type aijcusparse -physical_pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse
>>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
>>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
>>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
>>> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
>>> src/ksp/ksp/examples/tutorials/ex71.c:      args: -pde_type Poisson -cells 7,9,8 -dim 3 -ksp_view -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -pc_bddc_dirichlet_pc_type cholesky -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_dirichlet_pc_factor_mat_ordering_type nd -pc_bddc_neumann_pc_type cholesky -pc_bddc_neumann_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_ordering_type nd -matis_localmat_type aijcusparse
>>> src/ksp/ksp/examples/tutorials/ex72.c:      args: -f0 ${DATAFILESPATH}/matrices/medium -ksp_monitor_short -ksp_view -mat_view ascii::ascii_info -mat_type aijcusparse -pc_factor_mat_solver_type cusparse -pc_type ilu -vec_type cuda
>>> src/snes/examples/tutorials/ex12.c:      args: -matis_localmat_type aijcusparse -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_solver_type cusparse
>>>
>>> On Apr 15, 2020, at 2:20 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>> I tried using a serial direct solver in cusparse and got bad numerics:
>>>
>>> -vector_type cuda -mat_type aijcusparse -pc_factor_mat_solver_type cusparse
>>>
>>> Before I start debugging this I wanted to see if there are any known issues that I should be aware of.
>>>
>>> Thanks,
>>>
>
> --
> Stefano
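A minimal, self-contained sketch of the offload pattern Stefano describes above (all type and function names here are hypothetical and do not correspond to the actual PETSc/AIJCUSPARSE internals): the numeric LU factorization runs on the CPU because cuSPARSE cannot compute the triangular factors, the factors are mirrored to the GPU for the triangular solves, and an offload mask records which copy is current. The reported failure in the second SNES step corresponds to never resetting that mask after refactorization, so the GPU kept solving with stale factors.

/* offload_sketch.cu - illustrative only; names are hypothetical, not PETSc API. Build: nvcc offload_sketch.cu */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { OFFLOAD_UNALLOCATED, OFFLOAD_CPU, OFFLOAD_BOTH } OffloadMask;

typedef struct {
  double     *host_lu;  /* packed LU factor values, computed on the CPU        */
  double     *dev_lu;   /* device mirror used by the GPU triangular solves     */
  size_t      nnz;      /* number of stored factor entries                     */
  OffloadMask mask;     /* records which copy (host/device) is currently valid */
} FactorMirror;

/* Numeric factorization: done on the CPU (cuSPARSE does not compute LU/ILU
   factors), so afterwards only the host copy is up to date. */
static void factor_numeric_cpu(FactorMirror *f, const double *new_values)
{
  memcpy(f->host_lu, new_values, f->nnz * sizeof(double)); /* stand-in for the real factorization */
  f->mask = OFFLOAD_CPU;  /* the step that was missing: mark the device copy as stale */
}

/* Triangular solve on the GPU: refresh the device mirror first if it is stale. */
static int solve_gpu(FactorMirror *f)
{
  if (!f->dev_lu) {       /* lazy allocation of the device mirror */
    if (cudaMalloc((void **)&f->dev_lu, f->nnz * sizeof(double)) != cudaSuccess) return 1;
  }
  if (f->mask != OFFLOAD_BOTH) {  /* host is newer: upload the factors */
    if (cudaMemcpy(f->dev_lu, f->host_lu, f->nnz * sizeof(double),
                   cudaMemcpyHostToDevice) != cudaSuccess) return 1;
    f->mask = OFFLOAD_BOTH;
  }
  /* ... the cuSPARSE triangular solves would run against f->dev_lu here ... */
  return 0;
}

int main(void)
{
  double vals1[4] = {4, 3, 2, 1}, vals2[4] = {8, 6, 4, 2};
  FactorMirror f;
  f.nnz     = 4;
  f.host_lu = (double *)malloc(f.nnz * sizeof(double));
  f.dev_lu  = NULL;
  f.mask    = OFFLOAD_UNALLOCATED;

  factor_numeric_cpu(&f, vals1);  /* first SNES step: factor, then solve */
  if (solve_gpu(&f)) return 1;
  factor_numeric_cpu(&f, vals2);  /* refactorization in the next step */
  if (solve_gpu(&f)) return 1;    /* uploads the new factors only because the
                                     mask was reset to OFFLOAD_CPU above */
  printf("both solves used up-to-date factors\n");
  cudaFree(f.dev_lu);
  free(f.host_lu);
  return 0;
}

If the mask update in factor_numeric_cpu() is removed, the second solve_gpu() call skips the upload and reuses the old device factors, which is consistent with the stagnating SNES residuals in the diff above.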