> On Jul 24, 2024, at 5:33 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>
> Thanks for the suggestions; I will try them out.
>
> Dense factorization is used as the benchmark for Top500 right? That's why I
> thought there would be some state-of-the-art multi GPU dense linear solvers
> out there.
>
> I saw this library called cuSOLVERMp
> https://docs.nvidia.com/cuda/cusolvermp/
> from NVIDIA. It looks somewhat difficult to integrate with other code,
> though.
The PETSc ScaLAPACK interface could possibly be jiggered to get something to work with cuSOLVERMp, since their APIs are similar.

>
> I also found this
> https://github.com/nv-legate/cunumeric
> from NVIDIA which shows some good results for multi GPU Cholesky, but I'm
> having some trouble getting it set up correctly.
>
> On Wed, Jul 24, 2024, 12:08 PM Barry Smith <bsm...@petsc.dev
> <mailto:bsm...@petsc.dev>> wrote:
>>
>> For one MPI rank, it looks like you can use -pc_type cholesky
>> -pc_factor_mat_solver_type cupm though it is not documented in
>> https://petsc.org/release/overview/linear_solve_table/#direct-solvers
>>
>> Or, if you also ./configure --download-kokkos --download-kokkos-kernels,
>> you can use -pc_factor_mat_solver_type kokkos; this may also work
>> for multiple GPUs, but that is not documented in the table either (Junchao).
>> Nor are the sparse Kokkos or CUDA solvers documented (if they exist) in the table.
>>
>> Barry
>>
>>> On Jul 24, 2024, at 2:44 PM, Sreeram R Venkat <srven...@utexas.edu
>>> <mailto:srven...@utexas.edu>> wrote:
>>>
>>> I have an SPD dense matrix of size NxN, where N can range from 10^4-10^5.
>>> Are there any Cholesky factorization/solve routines for it in PETSc (or in
>>> any of the external libraries)? If possible, I want to use GPU acceleration
>>> with 1 or more GPUs. The matrix type can be MATSEQDENSE/MATMPIDENSE or
>>> MATSEQDENSECUDA/MATMPIDENSECUDA accordingly. If it is possible to do the
>>> factorization beforehand and store it to do the triangular solves later,
>>> that would be great.
>>>
>>> Thanks,
>>> Sreeram
>>
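
For readers following the thread, here is a minimal sketch of the "factor once, store it, do the triangular solves later" pattern asked about in the original question. It is plain CPU Python using only the standard library, not PETSc and not GPU code; the helper names `cholesky_factor` and `cholesky_solve` and the 3x3 test matrix are made up for illustration. It only shows why storing the factor pays off: the O(N^3) factorization is done once, and each later right-hand side costs two O(N^2) triangular solves.

```python
import math


def cholesky_factor(a):
    """Return the lower-triangular L with a = L * L^T (a: SPD, list of lists)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what remains of a[j][j] after earlier columns.
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L * L^T) x = b using a previously stored factor L."""
    n = len(L)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


# Hypothetical small SPD matrix, for illustration only.
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]

L = cholesky_factor(A)                      # expensive step, done once
x1 = cholesky_solve(L, [8.0, 10.0, 11.0])   # cheap: reuses the stored factor
x2 = cholesky_solve(L, [4.0, 2.0, 2.0])     # another right-hand side, same factor
```

In PETSc the same reuse should come for free with the options mentioned above: a KSP configured with -ksp_type preonly -pc_type cholesky factors the matrix during setup and reuses that factor on later KSPSolve calls, as long as the operator is not changed.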