Ok, so just to double check the things I should do:

1. Create a new header for MatCreateSeqAIJKokkosWithCSRMatrix (and declare it PETSC_EXTERN) so users can call the existing method and build a seqaijkokkos matrix with no host involvement.
2. Modify MatCreateMPIAIJWithSeqAIJ (or equivalent) so it does the same thing as MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices in the case that A and B are seqaijkokkos matrices.
3. Potentially remove MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, given it would be redundant?
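For concreteness, here is a rough sketch of the declarations I have in mind (the names follow the proposals below; none of this exists in petsc yet, and the exact signatures are up for discussion):

    /* proposed petscmat_kokkos.hpp -- needs C++/Kokkos types, so it gets its own header;
       assumes the KokkosCsrMatrix alias currently in the private aijkok.hpp is made visible here */
    #include <petscmat.h>
    #include <KokkosSparse_CrsMatrix.hpp>
    PETSC_EXTERN PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm, KokkosCsrMatrix, Mat *);

    /* proposed addition to petscmat.h -- only C types, so it can live in the existing C header */
    PETSC_EXTERN PetscErrorCode MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm, PetscInt, PetscInt, Mat, Mat, const PetscInt[], Mat *);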
On Wed, 26 Feb 2025 at 17:15, Junchao Zhang <junchao.zh...@gmail.com> wrote:

That is a good idea. Perhaps a new MatCreateMPIXAIJWithSeqXAIJ(MPI_Comm comm, PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat), since garray[] is only meaningful for MATAIJ and its subclasses.

--Junchao Zhang

On Wed, Feb 26, 2025 at 11:02 AM Barry Smith <bsm...@petsc.dev> wrote:

The new function doesn't seem to have anything to do with Kokkos, so why have any new functions? Just have MatCreateMPIAIJWithSeqAIJ() work properly when the two matrices are Kokkos (or CUDA or HIP). Or, if you want to eliminate the global reduction, maybe make your new function MatCreateMPIWithSeq() and have it work for any type of submatrix, and eventually we could deprecate MatCreateMPIAIJWithSeqAIJ().

Barry

On Feb 26, 2025, at 11:54 AM, Steven Dargaville <dargaville.ste...@gmail.com> wrote:

I think that sounds great, I'm happy to put together an MR (likely next week) for review.

On Wed, 26 Feb 2025 at 16:11, Junchao Zhang <junchao.zh...@gmail.com> wrote:

This function MatCreateMPIAIJWithSeqAIJ(MPI_Comm comm, Mat A, Mat B, const PetscInt garray[], Mat *mat) is rarely used. To compute the global matrix's row/col sizes M, N, it has to do an MPI_Allreduce(). I think that is a waste, as the caller usually knows M and N already. So I think we can move away from it and have a new one:

MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
* M, N are the global row/col sizes of mat
* A, B are MATSEQAIJKOKKOS
* M, N could be PETSC_DECIDE; if so, petsc will compute mat's M, N from A, i.e., M = sum of A's M, N = sum of A's N
* if garray is NULL, B uses global column indices (and B's N should be equal to the output mat's N)
* if garray is not NULL, B uses local column indices; garray[] was allocated by PetscMalloc() and after the call garray will be owned by mat (the user should not free garray afterwards)

What do you think? If you agree, could you contribute an MR?

BTW, I think we need to create a new header, petscmat_kokkos.hpp, to declare
PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm, KokkosCsrMatrix csr, Mat *A)
but
PetscErrorCode MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
can be in petscmat.h as it uses only C types.

Barry, what do you think of the two new APIs?

--Junchao Zhang

On Wed, Feb 26, 2025 at 6:26 AM Steven Dargaville <dargaville.ste...@gmail.com> wrote:

Those two constructors would definitely meet my needs, thanks!

Also, I should note that the comment about garray and B in MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices is correct if garray is passed in as NULL; it's just that if you pass in a completed garray it doesn't bother creating one or changing the column indices of B.
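To make the two conventions concrete, here is a minimal sketch of how the proposed MatCreateMPIAIJKokkosWithSeqAIJKokkos above might be called (the function does not exist yet, and the variable names are purely illustrative):

    Mat       A, B, mat;   /* A: diag block, B: offdiag block, both already-assembled MATSEQAIJKOKKOS */
    PetscInt  M, N;        /* global row/col sizes of mat, already known by the caller */
    PetscInt *garray;      /* local-to-global column map for B, allocated with PetscMalloc1() */

    /* Case 1: garray == NULL, B must carry global column indices (B's col size equals mat's N) */
    PetscCall(MatCreateMPIAIJKokkosWithSeqAIJKokkos(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, A, B, NULL, &mat));

    /* Case 2: garray != NULL, B carries local (compacted) column indices; mat takes ownership of garray */
    PetscCall(MatCreateMPIAIJKokkosWithSeqAIJKokkos(PETSC_COMM_WORLD, M, N, A, B, garray, &mat));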
So I would suggest the comment be: "if garray is NULL, the offdiag matrix B should have global col ids; if garray is not NULL, the offdiag matrix B should have local col ids".

On Wed, 26 Feb 2025 at 03:35, Junchao Zhang <junchao.zh...@gmail.com> wrote:

Mat_SeqAIJKokkos is private because it is in a private header, src/mat/impls/aij/seq/kokkos/aijkok.hpp.

Your observation about the garray in MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() might be right. The comment

- B - the offdiag matrix using global col ids

is out of date. Perhaps it should be "the offdiag matrix uses local column indices and garray contains the local-to-global mapping". But I need to double check it.

Since you use Kokkos, I think we could provide these two constructors for MATSEQAIJKOKKOS and MATMPIAIJKOKKOS respectively:

- MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm, KokkosCsrMatrix csr, Mat *A)

- MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, Mat A, Mat B, PetscInt *garray, Mat *mat)
  // To mimic the existing MatCreateMPIAIJWithSeqAIJ(MPI_Comm comm, Mat A, Mat B, const PetscInt garray[], Mat *mat);
  // A and B are MATSEQAIJKOKKOS matrices and use local column indices

Do they meet your needs?

--Junchao Zhang

On Tue, Feb 25, 2025 at 5:35 PM Steven Dargaville <dargaville.ste...@gmail.com> wrote:

Thanks for the response!

Although MatSetValuesCOO happens on the device if the input coo_v pointer is device memory, I believe MatSetPreallocationCOO requires host pointers for coo_i and coo_j, and the preallocation (and construction of the COO structures) happens on the host and is then copied onto the device? I need to be able to create a matrix object with minimal work on the host (like many of the routines in aijkok.kokkos.cxx do internally). I originally used the COO interface to build the matrices I need, but that was around 5x slower than constructing the aij structures myself on the device and then just directly using the MatSetSeqAIJKokkosWithCSRMatrix-type methods.

The reason I thought MatSetSeqAIJKokkosWithCSRMatrix could easily be made public is that the Mat_SeqAIJKokkos constructors are already publicly accessible? In particular, one of those constructors takes in pointers to the Kokkos dual views which store a, i, j, and hence one can build a sequential matrix with nothing (or very little) occurring on the host. The only change I can see that would be necessary for MatSetSeqAIJKokkosWithCSRMatrix (or MatCreateSeqAIJKokkosWithCSRMatrix) to be public is to change the PETSC_INTERN to PETSC_EXTERN?

For MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, I believe all that is required is declaring the method in the .hpp, as it's already defined as static in mpiaijkok.kokkos.cxx. In particular, the comments above MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices suggest that the off-diagonal block B needs to be built with global column ids, with mpiaij->garray constructed on the host along with the rewriting of the global column indices in B.
This happens in MatSetUpMultiply_MPIAIJ, but checking the code there shows that if you pass in a non-null garray to MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, the construction and compactification is skipped, meaning B can be built with local column ids as long as garray is provided on the host (which I also build on the device and then just copy to the host). Again, this is what some of the internal Kokkos routines rely on, like the matrix product.

I am happy to try doing this and submitting a merge request to the petsc gitlab if this seems sensible, I just wanted to double check that I wasn't missing something important?

Thanks
Steven

On Tue, 25 Feb 2025 at 22:16, Junchao Zhang <junchao.zh...@gmail.com> wrote:

Hi, Steven,

MatSetSeqAIJKokkosWithCSRMatrix(Mat A, Mat_SeqAIJKokkos *akok) uses a private data type, Mat_SeqAIJKokkos, so it cannot be directly made public.

If you already use COO, then why not directly make the matrix of type MATAIJKOKKOS and call MatSetPreallocationCOO() and MatSetValuesCOO()? So I am confused by your needs.

Thanks!
--Junchao Zhang

On Tue, Feb 25, 2025 at 3:39 PM Steven Dargaville <dargaville.ste...@gmail.com> wrote:

Hi

I'm just wondering if there is any possibility of making:

MatSetSeqAIJKokkosWithCSRMatrix or MatCreateSeqAIJKokkosWithCSRMatrix in src/mat/impls/aij/seq/kokkos/aijkok.kokkos.cxx
MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices in src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx

publicly accessible outside of petsc, or if there is an interface I have missed for creating Kokkos matrices entirely on the device? MatCreateSeqAIJKokkosWithCSRMatrix, for example, is marked as PETSC_INTERN so I can't link to it.

I've currently just copied the code inside of those methods so that I can build without any preallocation on the host (e.g., through the COO interface) and it works really well.

Thanks for your help
Steven
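For reference, the COO-based construction discussed in the thread above looks roughly like this (a sketch; m, n, ncoo, coo_i, coo_j and coo_v are illustrative names, and, as noted above, the index arrays are set up on the host while the values pointer may be device memory):

    Mat A;
    /* ncoo, coo_i, coo_j: COO row/col indices on the host; coo_v: values, may be a device pointer */
    PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
    PetscCall(MatSetSizes(A, m, n, PETSC_DECIDE, PETSC_DECIDE));
    PetscCall(MatSetType(A, MATAIJKOKKOS));
    PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j)); /* preallocation/COO setup happens on the host */
    PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));      /* values can be supplied from the device */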