> On Jul 24, 2024, at 5:33 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
> 
> Thanks for the suggestions; I will try them out.
> 
> Dense factorization is used as the benchmark for Top500, right? That's why I 
> thought there would be some state-of-the-art multi GPU dense linear solvers 
> out there.
> 
> I saw this library called cuSOLVERMp 
> https://docs.nvidia.com/cuda/cusolvermp/
>   from NVIDIA. It looks somewhat difficult to integrate with other code, 
> though.

   The PETSc ScaLAPACK interface could possibly be jiggered to work with 
cuSOLVERMp, since their APIs are similar.
> 
> I also found this 
> https://github.com/nv-legate/cunumeric
>   from NVIDIA which shows some good results for multi GPU Cholesky, but I'm 
> having some trouble getting it set up correctly.
> 
> On Wed, Jul 24, 2024, 12:08 PM Barry Smith <bsm...@petsc.dev 
> <mailto:bsm...@petsc.dev>> wrote:
>> 
>>    For one MPI rank, it looks like you can use -pc_type cholesky 
>> -pc_factor_mat_solver_type cupm though it is not documented in 
>> https://petsc.org/release/overview/linear_solve_table/#direct-solvers
>>  
>> 
>>    Or, if you also ./configure --download-kokkos --download-kokkos-kernels, 
>> you can use -pc_factor_mat_solver_type kokkos; this may also work for 
>> multiple GPUs, but that is not documented in the table either (Junchao). 
>> Nor are sparse Kokkos or CUDA stuff documented (if they exist) in the table.
>> 
>> 
>>    Barry
>> 
>> 
>> 
>>> On Jul 24, 2024, at 2:44 PM, Sreeram R Venkat <srven...@utexas.edu 
>>> <mailto:srven...@utexas.edu>> wrote:
>>> 
>>> I have an SPD dense matrix of size NxN, where N can range from 10^4-10^5. 
>>> Are there any Cholesky factorization/solve routines for it in PETSc (or in 
>>> any of the external libraries)? If possible, I want to use GPU acceleration 
>>> with 1 or more GPUs. The matrix type can be MATSEQDENSE/MATMPIDENSE or 
>>> MATSEQDENSECUDA/MATMPIDENSECUDA accordingly. If it is possible to do the 
>>> factorization beforehand and store it to do the triangular solves later, 
>>> that would be great.
>>> 
>>> Thanks,
>>> Sreeram
>> 
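
For reference, the "factor once, then reuse the factor for later triangular solves" pattern asked about in the original question can be sketched in plain Python. This is only an illustration of the idea on a tiny matrix, not PETSc code and not what any of the libraries mentioned in this thread do internally; the helper names cholesky_factor and cholesky_solve are hypothetical and exist only for this sketch.

```python
import math


def cholesky_factor(a):
    """Return the lower-triangular L with A = L * L^T.

    `a` is an SPD matrix stored as a list of rows. Hypothetical helper
    for illustration only; a real code would call a library routine.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what is left of a[j][j] after removing earlier columns.
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve A x = b using a stored factor L (A = L * L^T)."""
    n = len(L)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Backward substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


# Factor once (the O(N^3) step) ...
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L = cholesky_factor(A)

# ... then reuse the stored factor for as many right-hand sides as needed
# (each solve is only O(N^2)).
x1 = cholesky_solve(L, [8.0, 10.0, 11.0])
x2 = cholesky_solve(L, [4.0, 2.0, 2.0])
```

The point of the split is that the expensive factorization happens once, and each later right-hand side only costs two triangular solves. In PETSc terms this roughly corresponds to setting up a solver with -pc_type cholesky once and then calling the solve repeatedly while the matrix stays unchanged.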
