Please keep on list,

On Thu, Feb 17, 2022 at 12:36 PM Bojan Niceno <bojan.niceno.scient...@gmail.com> wrote:
> Dear Mark,
>
> Sorry for mistakenly calling you Adam before.
>
> I was thinking about the o_nnz as you suggested, but then something else
> occurred to me. I determine the d_nnz and o_nnz based on a METIS domain
> decomposition which I perform outside of PETSc, before I even call PETSc
> initialize. Hence, if PETSc works out its own domain decomposition

PETSc does not work out its own decomposition. You specify the decomposition
completely with
https://petsc.org/main/docs/manualpages/Mat/MatCreateAIJ.html#MatCreateAIJ

> and communication patterns, they might be different from mine, and
> therefore MatSetValue misses some entries. Will PETSc follow the same
> domain decomposition that it works out from the calls to MatSetValue on
> different processors, or will it re-shuffle the matrix entries?
>
> Cheers,
>
> Bojan
>
> On Thu, Feb 17, 2022 at 6:14 PM Bojan Niceno
> <bojan.niceno.scient...@gmail.com> wrote:
>
>> Thanks a lot for the hints, Adam :-)
>>
>> Cheers,
>>
>> Bojan
>>
>> On Thu, Feb 17, 2022 at 6:05 PM Mark Adams <mfad...@lbl.gov> wrote:
>>
>>> On Thu, Feb 17, 2022 at 11:46 AM Bojan Niceno
>>> <bojan.niceno.scient...@gmail.com> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I am experiencing difficulties when using PETSc in parallel in an
>>>> unstructured CFD code. It uses the CRS format to store its matrices.
>>>> I use the following sequence of PETSc calls in the hope of getting
>>>> PETSc to solve my linear systems in parallel. Before I continue, I
>>>> would just like to say that the code has been MPI parallel for a long
>>>> time; it performs its own domain decomposition through METIS and works
>>>> out its own communication patterns, which work with its home-grown
>>>> (non-PETSc) linear solvers. Anyhow, I issue the following calls:
>>>>
>>>> err = PetscInitialize(0, NULL, (char*)0, help);
>>>>
>>>> err = MatCreate(MPI_COMM_WORLD, A);
>>>>
>>>> In the above, I use MPI_COMM_WORLD instead of PETSC_COMM_SELF because
>>>> the call to MPI_Init is invoked outside of PETSc, from the main
>>>> program.
>>>>
>>>> err = MatSetSizes(A, m, m, M, M);
>>>>
>>>> Since my matrices are exclusively square, M is set to the total number
>>>> of computational cells, while m is equal to the number of
>>>> computational cells within each subdomain/processor. (Not all
>>>> processors necessarily have the same m; it depends on the domain
>>>> decomposition.) I do not distinguish between m (M) and n (N) since the
>>>> matrices are all square. Am I wrong to assume that?
>>>>
>>>> err = MatSetType(A, MATAIJ);
>>>>
>>>> I set the matrix to be of type MATAIJ to cover runs on one and on more
>>>> processors. By the way, on one processor everything works fine.
>>>>
>>>> err = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);
>>>> err = MatSeqAIJSetPreallocation(A, 0, d_nnz);
>>>>
>>>> The two lines above specify matrix preallocation. Since d_nz and o_nz
>>>> vary from cell to cell (row to row), I set them to zero and provide
>>>> arrays with the number of diagonal and off-diagonal nonzeros instead.
>>>> To my understanding, that is legit since d_nz and o_nz are neglected
>>>> if d_nnz and o_nnz are provided. Am I wrong?
>>>>
>>>> Finally, inside a loop through rows and columns I call:
>>>>
>>>> err = MatSetValue(A, row, col, value, INSERT_VALUES);
>>>>
>>>> Here I make sure that row and col point to global cell (unknown)
>>>> numbers.
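For reference, the call sequence described above, gathered into a minimal
self-contained sketch. The local size m = 4, the constant nnz counts, and the
diagonal-only assembly loop are placeholders for illustration, not the actual
CFD code's data:

/* Minimal sketch, not the poster's code: a square MATAIJ matrix with
 * per-row preallocation, assembled with global indices. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       m = 4;        /* cells owned by this rank (varies per rank) */
  PetscInt       rstart, rend, row;
  PetscInt      *d_nnz, *o_nnz;
  PetscErrorCode err;

  err = PetscInitialize(&argc, &argv, NULL, NULL); if (err) return err;

  err = MatCreate(PETSC_COMM_WORLD, &A);                         CHKERRQ(err);
  /* Passing the global size explicitly, as in the original code, is equally
   * valid; PETSC_DETERMINE just lets PETSc sum the local sizes. */
  err = MatSetSizes(A, m, m, PETSC_DETERMINE, PETSC_DETERMINE);  CHKERRQ(err);
  err = MatSetType(A, MATAIJ);                                   CHKERRQ(err);

  /* Placeholder preallocation: pretend every row has 3 nonzeros in the
   * diagonal block and 1 in the off-diagonal block. */
  err = PetscMalloc2(m, &d_nnz, m, &o_nnz);                      CHKERRQ(err);
  for (row = 0; row < m; row++) { d_nnz[row] = 3; o_nnz[row] = 1; }
  err = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);        CHKERRQ(err);
  err = MatSeqAIJSetPreallocation(A, 0, d_nnz);                  CHKERRQ(err);

  /* Global rows owned by this rank; MatSetValue takes global indices. */
  err = MatGetOwnershipRange(A, &rstart, &rend);                 CHKERRQ(err);
  for (row = rstart; row < rend; row++) {
    err = MatSetValue(A, row, row, 1.0, INSERT_VALUES);          CHKERRQ(err);
  }
  err = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);                 CHKERRQ(err);
  err = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);                   CHKERRQ(err);

  err = PetscFree2(d_nnz, o_nnz);                                CHKERRQ(err);
  err = MatDestroy(&A);                                          CHKERRQ(err);
  err = PetscFinalize();
  return err;
}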
>>>>
>>>> Yet, when I run the code on more than one processor, I get the error:
>>>>
>>>> [3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> [3]PETSC ERROR: Argument out of range
>>>> [3]PETSC ERROR: New nonzero at (21,356) caused a malloc
>>>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to
>>>> turn off this check
>>>>
>>>> [3]PETSC ERROR: #1 MatSetValues_MPIAIJ() at
>>>> /home/niceno/Development/petsc-debug/src/mat/impls/aij/mpi/mpiaij.c:517
>>>> [3]PETSC ERROR: #2 MatSetValues() at
>>>> /home/niceno/Development/petsc-debug/src/mat/interface/matrix.c:1398
>>>> [3]PETSC ERROR: #3 MatSetValues_MPIAIJ() at
>>>> /home/niceno/Development/petsc-debug/src/mat/impls/aij/mpi/mpiaij.c:517
>>>> [3]PETSC ERROR: #4 MatSetValues() at
>>>> /home/niceno/Development/petsc-debug/src/mat/interface/matrix.c:1398
>>>>
>>>> and so forth, for roughly 10% of all matrix entries. I checked whether
>>>> these errors occur only for the off-diagonal parts of the matrix, but
>>>> that is not the case.
>>>>
>>>> The error code is 63, PETSC_ERR_ARG_OUTOFRANGE.
>>>>
>>>> Does anyone have an idea what I am doing wrong? Are any of my
>>>> assumptions above wrong (like thinking n (N) is always m (M) for
>>>> square matrices, or that I can pass zeros for d_nz and o_nz if I
>>>> provide the arrays d_nnz[] and o_nnz[])?
>>>
>>> That is correct.
>>>
>>>> Any idea how to debug it, where to look for an error?
>>>
>>> I would guess that you are counting your o_nnz incorrectly. It looks
>>> like a small number of equations per process, because the 4th process
>>> has row 21, apparently. Does that sound right?
>>>
>>> And column 356 is going to be in the off-diagonal block (i.e., "o"). I
>>> would start with a serial matrix and run with -info. This will be
>>> noisy, but you will see things like "number of unneeded ..." so that
>>> you can verify that you have set d_nnz perfectly (there should be 0
>>> unneeded). Then try two processors. If it fails, you could add a print
>>> statement every time the row (e.g., 21) is added to, and check what
>>> your code for computing o_nnz is doing.
>>>
>>>> I am carefully checking all the data I send to the PETSc functions and
>>>> it looks correct to me, but maybe I lack some fundamental
>>>> understanding of what should be provided to PETSc, as I write above?
>>>
>>> It is a bit confusing at first. The man page gives a concrete example:
>>> https://petsc.org/main/docs/manualpages/Mat/MatCreateAIJ.html#MatCreateAIJ
>>>
>>>> Best regards,
>>>>
>>>> Bojan Niceno
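To illustrate the point about counting o_nnz: a column belongs to the diagonal
block only if it is owned by the same rank as the row. With MatSetSizes(A, m,
m, M, M), the rows owned by a rank are the contiguous global range obtained by
stacking the local sizes m in rank order. A sketch of such a counting pass is
below; the row_ptr/col_ind arrays (CRS connectivity with global column
indices) and the ownership bounds rstart/rend are assumed names, not taken
from the actual code:

#include <petscsys.h>

/* Sketch: count, for each locally owned row, how many nonzeros fall in the
 * diagonal block (column owned by this rank) and how many in the
 * off-diagonal block (column owned by another rank).  row_ptr/col_ind are
 * assumed CRS arrays with GLOBAL column indices; [rstart, rend) is the
 * global row range owned by this rank. */
static void CountNonzeros(PetscInt rstart, PetscInt rend,
                          const PetscInt *row_ptr, const PetscInt *col_ind,
                          PetscInt *d_nnz, PetscInt *o_nnz)
{
  for (PetscInt row = rstart; row < rend; row++) {
    const PetscInt loc = row - rstart;              /* local row index */
    d_nnz[loc] = 0;
    o_nnz[loc] = 0;
    for (PetscInt k = row_ptr[loc]; k < row_ptr[loc + 1]; k++) {
      const PetscInt col = col_ind[k];              /* global column   */
      if (col >= rstart && col < rend) d_nnz[loc]++;  /* "d" block     */
      else                             o_nnz[loc]++;  /* "o" block     */
    }
  }
}

In a serial run with -info, as suggested above, the assembly report should
show zero unneeded storage if d_nnz was counted exactly; and while debugging,
the check that triggered the error can be relaxed with
MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE), as the error
message itself notes.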