> On Apr 15, 2020, at 10:14 PM, Mark Adams <mfad...@lbl.gov> wrote:
> 
> Thanks, it looks correct. I am getting memory leaks (appended).
> 
> And something horrible is going on with performance:
> 
> MatLUFactorNum       130 1.0 9.2220e+00 1.0 6.51e+08 1.0 0.0e+00 0.0e+00 0.0e+00 30  0  0  0  0  30  0  0  0  0    71       0    390 3.33e+02    0 0.00e+00  0
> 
> MatLUFactorNum       130 1.0 6.5177e-01 1.0 1.28e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  1  0  0  0   4  1  0  0  0  1966       0      0 0.00e+00    0 0.00e+00  0
> 

Can you describe these numbers? It seems that in the second case the 
factorization is run on the CPU (as I explained in my previous message).

> This is not urgent, but I'd like to get a serial LU GPU solver at some point.
> 
> Thanks again,
> Mark
> 
> Lots of these:
> [ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
> [ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
> [ 0]32 bytes VecCUDAAllocateCheck() line 34 in /autofs/nccs-svm1_home1/adams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
> 

Yes, as I said, the code is in bad shape. I’ll see what I can do.

> On Wed, Apr 15, 2020 at 12:47 PM Stefano Zampini <stefano.zamp...@gmail.com> wrote:
> Mark
> 
> Attached is the patch. I will open an MR in the next few days if you confirm 
> it is working for you.
> The issue is that CUSPARSE does not provide a way to compute the triangular 
> factors, so we delegate the computation of the factors to PETSc (on the CPU). 
> These factors are then copied to the GPU.
> What was happening in the second SNES step was that the factors on the GPU 
> were never refreshed, because the offloadmask was never updated.
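> 
> A minimal sketch of the idea (illustrative only, not the actual patch; the 
> function and field names here are assumptions about the internals):
> 
>     #include <petsc/private/matimpl.h>         /* Mat internals (offloadmask) */
>     #include <../src/mat/impls/aij/seq/aij.h>  /* CPU AIJ factorization kernel */
> 
>     /* Numeric LU for SEQAIJCUSPARSE: factor on the CPU, then mark the device
>        copy of the factors as stale so it is re-uploaded before the next
>        triangular solve. */
>     static PetscErrorCode MatLUFactorNumeric_SeqAIJCUSPARSE_sketch(Mat B, Mat A, const MatFactorInfo *info)
>     {
>       PetscErrorCode ierr;
> 
>       PetscFunctionBegin;
>       ierr = MatLUFactorNumeric_SeqAIJ(B, A, info);CHKERRQ(ierr); /* factors computed by PETSc on the CPU */
>       B->offloadmask = PETSC_OFFLOAD_CPU;                         /* GPU copy of the factors is now stale  */
>       /* the fresh factors are copied to the GPU lazily, before the next MatSolve */
>       PetscFunctionReturn(0);
>     }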
> 
> The real issue is that the CUSPARSE support in PETSc is really in bad shape 
> and mostly untested, with coding choices that are probably outdated by now.
> I'll see what I can do to fix the class if I have time in the next few weeks.
> 
> Stefano
> 
> On Wed, Apr 15, 2020 at 17:21 Mark Adams <mfad...@lbl.gov> wrote:
> 
> 
> On Wed, Apr 15, 2020 at 8:24 AM Stefano Zampini <stefano.zamp...@gmail.com> wrote:
> Mark
> 
> I have fixed a few things in the solver, and it is tested against the current 
> master.
> 
> I rebased with master over the weekend ....
>  
> Can you write an MWE to reproduce the issue? Which versions of CUDA and 
> CUSPARSE are you using?
> 
> You can use the mark/feature-xgc-interface-rebase branch and add '-mat_type 
> seqaijcusparse -fp_pc_factor_mat_solver_type cusparse 
> -mat_cusparse_storage_format ell -vec_type cuda' to 
> dm/impls/plex/tutorials/ex10.c.
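> 
> For reference, a rough programmatic equivalent of those options through the 
> public API (a sketch; the test itself sets everything via the options 
> database, and the "fp_" prefix targets one of the solvers inside ex10):
> 
>     #include <petscksp.h>
> 
>     static PetscErrorCode UseCusparseLU(KSP ksp, Mat A, Vec x)
>     {
>       PC             pc;
>       PetscErrorCode ierr;
> 
>       PetscFunctionBegin;
>       ierr = MatSetType(A, MATSEQAIJCUSPARSE);CHKERRQ(ierr);                            /* -mat_type seqaijcusparse */
>       ierr = MatCUSPARSESetFormat(A, MAT_CUSPARSE_ALL, MAT_CUSPARSE_ELL);CHKERRQ(ierr); /* -mat_cusparse_storage_format ell */
>       ierr = VecSetType(x, VECCUDA);CHKERRQ(ierr);                                      /* -vec_type cuda */
>       ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
>       ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
>       ierr = PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE);CHKERRQ(ierr);             /* -fp_pc_factor_mat_solver_type cusparse */
>       PetscFunctionReturn(0);
>     }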
> 
> The first stage, SNES solve, actually looks OK here. Maybe.
> 
> Thanks,
> 
> 10:01 mark/feature-xgc-interface-rebase *= ~/petsc$ make -f gmakefile test search='dm_impls_plex_tutorials-ex10_0'
> /usr/bin/python /ccs/home/adams/petsc/config/gmakegentest.py --petsc-dir=/ccs/home/adams/petsc --petsc-arch=arch-summit-opt64-gnu-cuda --testdir=./arch-summit-opt64-gnu-cuda/tests
> Using MAKEFLAGS: search=dm_impls_plex_tutorials-ex10_0
>           CC arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10.o
>      CLINKER arch-summit-opt64-gnu-cuda/tests/dm/impls/plex/tutorials/ex10
>         TEST arch-summit-opt64-gnu-cuda/tests/counts/dm_impls_plex_tutorials-ex10_0.counts
>  ok dm_impls_plex_tutorials-ex10_0
> not ok diff-dm_impls_plex_tutorials-ex10_0 # Error code: 1
> #       14,16c14,16
> #       <     0 SNES Function norm 6.184233768573e-04 
> #       <     1 SNES Function norm 1.467479466750e-08 
> #       <     2 SNES Function norm 7.863111141350e-12 
> #       ---
> #       >     0 SNES Function norm 6.184233768572e-04 
> #       >     1 SNES Function norm 1.467479466739e-08 
> #       >     2 SNES Function norm 7.863102870090e-12 
> #       18,31c18,256
> #       <     0 SNES Function norm 6.182952107532e-04 
> #       <     1 SNES Function norm 7.336382211149e-09 
> #       <     2 SNES Function norm 1.566979901443e-11 
> #       <   Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
> #       <     0 SNES Function norm 6.183592738545e-04 
> #       <     1 SNES Function norm 7.337681407420e-09 
> #       <     2 SNES Function norm 1.408823933908e-11 
> #       <   Nonlinear fp_ solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
> #       < [0] TSAdaptChoose_Basic(): Estimated scaled local truncation error 0.0396569, accepting step of size 1e-06
> #       < 1 TS dt 1.25e-06 time 1e-06
> #       <   1) species-0: charge density= -1.6024814608984e+01 z-momentum=  2.0080682964364e-19 energy=  1.2018000284846e+05
> #       <   1) species-1: charge density=  1.6021676653316e+01 z-momentum=  1.4964483981137e-17 energy=  1.2017223215083e+05
> #       <   1) species-2: charge density=  2.8838441139703e-03 z-momentum= -1.1062018110807e-23 energy=  1.2019641370376e-03
> #       <         1) Total: charge density= -2.5411155383649e-04, momentum=  1.5165279748763e-17, energy=  2.4035223620125e+05 (m_i[0]/m_e = 3670.94, 140 cells), 1 sub threads
> #       ---
> #       >     0 SNES Function norm 6.182952107531e-04 
> #       >     1 SNES Function norm 6.181600164904e-04 
> #       >     2 SNES Function norm 6.180249471739e-04 
> #       >     3 SNES Function norm 6.178899987549e-04 
>  
> I was planning to reorganize the factor code in AIJCUSPARSE in the next few days.
> 
> kl-18967:petsc zampins$ git grep "solver_type cusparse"
> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format ell -vec_type cuda -pc_type ilu
> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/shallow_water1 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format hyb -vec_type cuda -ksp_type cg -pc_type icc
> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu
> src/ksp/ksp/examples/tests/ex43.c:      args: -f ${DATAFILESPATH}/matrices/cfd.2.10 -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse -mat_cusparse_storage_format csr -vec_type cuda -ksp_type bicg -pc_type ilu -pc_factor_mat_ordering_type nd
> src/ksp/ksp/examples/tutorials/ex46.c:      args: -dm_mat_type aijcusparse -dm_vec_type cuda -random_exact_sol -pc_type ilu -pc_factor_mat_solver_type cusparse
> src/ksp/ksp/examples/tutorials/ex59.c:     args: -subdomain_mat_type aijcusparse -physical_pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse
> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda -sub_ksp_type preonly -sub_pc_type ilu
> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
> src/ksp/ksp/examples/tutorials/ex7.c:      args: -ksp_monitor_short -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -vec_type cuda
> src/ksp/ksp/examples/tutorials/ex71.c:   args: -pde_type Poisson -cells 7,9,8 -dim 3 -ksp_view -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged -pc_bddc_dirichlet_pc_type cholesky -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_dirichlet_pc_factor_mat_ordering_type nd -pc_bddc_neumann_pc_type cholesky -pc_bddc_neumann_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_ordering_type nd -matis_localmat_type aijcusparse
> src/ksp/ksp/examples/tutorials/ex72.c:      args: -f0 ${DATAFILESPATH}/matrices/medium -ksp_monitor_short -ksp_view -mat_view ascii::ascii_info -mat_type aijcusparse -pc_factor_mat_solver_type cusparse -pc_type ilu -vec_type cuda
> src/snes/examples/tutorials/ex12.c:      args: -matis_localmat_type aijcusparse -pc_bddc_dirichlet_pc_factor_mat_solver_type cusparse -pc_bddc_neumann_pc_factor_mat_solver_type cusparse
> 
>> On Apr 15, 2020, at 2:20 PM, Mark Adams <mfad...@lbl.gov> wrote:
>> 
>> I tried using a serial direct solver in cusparse and got bad numerics:
>> 
>> -vector_type cuda -mat_type aijcusparse -pc_factor_mat_solver_type cusparse 
>> 
>> Before I start debugging this, I wanted to see if there are any known issues 
>> that I should be aware of.
>> 
>> Thanks,
> 
> 
> 
> -- 
> Stefano
