> On May 9, 2019, at 9:24 AM, Abhyankar, Shrirang G <shrirang.abhyan...@pnnl.gov> wrote:
>
>> What code? The special sparsity pattern for blocks?
>
> Yes. See here
> https://bitbucket.org/petsc/petsc/src/master/src/dm/impls/network/network.c#lines-1712
>
>> What if the user has dense blocks or doesn't know or care about the sparsity pattern? By default it should allocate something that every user's code will fit in the matrix, and then provide user-callable options for making parts of the matrices sparse.
>
> That's what it is doing, I believe. If the user does not set a sparse matrix block for the point, then it uses a dense block.
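
For readers following along, here is a minimal sketch of what "setting the sparse matrix block for a point" could look like, assuming the DMNetworkHasJacobian()/DMNetworkVertexSetMatrix() interface that the DMCreateMatrix_Network code linked above consumes. The variables networkdm and ierr are taken from Justin's snippet later in the thread, v is assumed to be a vertex point from DMNetworkGetVertexRange(), and the exact layout of the J[] array (diagonal block first, then blocks for supporting edges and connected vertices) should be checked against the DMNetwork man pages; only the nonzero pattern of the supplied block is assumed to matter.

  /* Sketch only: register a sparse 6x6 diagonal block for vertex v so that
     DMCreateMatrix() can preallocate that pattern instead of a dense block.
     Error checking of the J[] layout and cleanup are omitted. */
  Mat            *J;
  PetscInt       nvar = 6, nedges;
  const PetscInt *edges;

  ierr = DMNetworkHasJacobian(networkdm,PETSC_TRUE,PETSC_TRUE);CHKERRQ(ierr); /* enable user-provided blocks */
  ierr = DMNetworkGetSupportingEdges(networkdm,v,&nedges,&edges);CHKERRQ(ierr);
  ierr = PetscCalloc1(1+2*nedges,&J);CHKERRQ(ierr);   /* NULL entries fall back to dense blocks */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,nvar,nvar,2,NULL,&J[0]);CHKERRQ(ierr);
  /* ... insert zeros with MatSetValues() into J[0] to mark the desired nonzero pattern ... */
  ierr = MatAssemblyBegin(J[0],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(J[0],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = DMNetworkVertexSetMatrix(networkdm,v,J);CHKERRQ(ierr);
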
And this is perfectly fine. It may waste a lot of memory, but it absolutely would not cause a massive slowdown in MatSetValues().
>
> Shri
>
> On 5/9/19, 12:24 AM, "Smith, Barry F." <bsm...@mcs.anl.gov> wrote:
>
>> On May 8, 2019, at 9:53 PM, Abhyankar, Shrirang G via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>
>> From: Matthew Knepley <knep...@gmail.com>
>> Date: Wednesday, May 8, 2019 at 9:29 PM
>> To: "Abhyankar, Shrirang G" <shrirang.abhyan...@pnnl.gov>
>> Cc: "Zhang, Hong" <hzh...@mcs.anl.gov>, Justin Chang <jychan...@gmail.com>, petsc-users <petsc-users@mcs.anl.gov>
>> Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly
>>
>> On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>
>> From: petsc-users <petsc-users-boun...@mcs.anl.gov> on behalf of "Zhang, Hong via petsc-users" <petsc-users@mcs.anl.gov>
>> Reply-To: "Zhang, Hong" <hzh...@mcs.anl.gov>
>> Date: Wednesday, May 8, 2019 at 8:01 PM
>> To: Justin Chang <jychan...@gmail.com>
>> Cc: petsc-users <petsc-users@mcs.anl.gov>
>> Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly
>>
>> Justin:
>> Great, the issue is resolved.
>> Why does MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) not raise an error?
>>
>> I copy-pasted the above line from the power.c example. MatSetOption should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option.
>>
>> Matt,
>>
>> We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil(), which cannot write outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one for DMNetwork. The allocation is just like Plex (I believe), where you allocate closure(star(p)), which would mean that setting values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. Is that right for DMNetwork?
>> Yes, DMNetwork behaves in this fashion.
>> I cannot find MatSetValuesClosure() in petsc-master. Can you provide detailed instructions on how to implement MatSetValuesClosure() for DMNetwork? Note, DMNetwork is a subclass of DMPlex.
>>
>> DMNetwork does not do any matrix creation by itself. It calls Plex to create the matrix.
>>
>> Right. However, the only operation I put in was MatSetClosure(), since that is appropriate for FEM. I think you would need a MatSetStar() for DMNetwork as well.
>>
>> I see that Hong has implemented some additional code in DMCreateMatrix_Network that does not use Plex for creating the matrix. I think it covers what you describe above.
>>
>> Hong: The DMNetwork matrix creation code is not used unless the user wants to set a special sparsity pattern for the blocks. Shouldn't this code
>
> What code? The special sparsity pattern for blocks?
>
>> be used by default instead of having Plex create the matrix?
>
> What if the user has dense blocks or doesn't know or care about the sparsity pattern? By default it should allocate something that every user's code will fit in the matrix, and then provide user-callable options for making parts of the matrices sparse.
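
To make Hong's correction above concrete, a minimal sketch of the fixed call (variable names taken from the snippet Justin posts later in the thread):

  ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
  /* Error out instead of silently mallocing when an entry falls outside the preallocated pattern */
  ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);CHKERRQ(ierr);

With PETSC_TRUE, the first MatSetValues() call that would require a new allocation stops with an error naming the offending (row, column), which is usually the quickest way to find where the preallocation is being exceeded.
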
>
> Barry
>
>> Shri
>>
>> Matt
>>
>> Hong
>>
>> On Wed, May 8, 2019 at 4:00 PM Dave May <dave.mayhe...@gmail.com> wrote:
>>
>> On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users <petsc-users@mcs.anl.gov> wrote:
>> So here's the branch/repo to the working example I have:
>>
>> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex
>>
>> Type 'make' to compile the dss; it should work with the latest petsc-dev.
>>
>> To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes, where N = 100, 500, and 1000. Those test files are listed as:
>>
>> input/test_100.m
>> input/test_500.m
>> input/test_1000.m
>>
>> I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash at the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as:
>>
>> input/test2_100.m
>> input/test2_500.m
>> input/test2_1000.m
>>
>> The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m counterparts, but they have bs=6.
>>
>> To run these tests, type the following:
>>
>> ./dpflow -input input/test_100.m
>>
>> I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases.
>>
>> Turns out that only the first call to SNESComputeJacobian() is slow; all the subsequent calls are as fast as I expect. This makes me think it still has something to do with matrix allocation.
>>
>> I think it is a preallocation issue.
>> Looking at some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView output:
>>
>> total number of mallocs used during MatSetValues calls =10000
>>
>> Thanks for the help everyone,
>>
>> Justin
>>
>> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley <knep...@gmail.com> wrote:
>> On Wed, May 8, 2019 at 2:30 PM Justin Chang <jychan...@gmail.com> wrote:
>> Hi everyone,
>>
>> Yes, I have these lines in my code:
>>
>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
>> ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
>>
>> Okay, it's not allocation. So maybe Hong is right that it's setting great big element matrices. We will see with the example.
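
Dave's "total number of mallocs" line above can also be checked from inside the code. A sketch, querying the same counters that the KSPView output quotes, right after the first Jacobian assembly (MAT_LOCAL is fine on the sequential runs used here):

  MatInfo info;
  ierr = MatGetInfo(J,MAT_LOCAL,&info);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF,"mallocs during MatSetValues(): %g, unneeded nonzeros: %g\n",
                     info.mallocs,info.nz_unneeded);CHKERRQ(ierr);

Anything other than 0 mallocs means the preallocated pattern is being grown incrementally, which matches the symptom described above: the first SNESComputeJacobian() pays for all the reallocations and copies, and subsequent calls reuse the final pattern and are fast.
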
>>
>> Thanks,
>>
>> Matt
>>
>> I tried -info and here's my output:
>>
>> [0] PetscInitialize(): PETSc successfully started: number of processors = 1
>> [0] PetscInitialize(): Running on machine: jchang31606s.domain
>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000
>>
>> **** Power flow dist case ****
>>
>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999
>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used
>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines.
>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
>> [0] DMGetDMSNES(): Creating new DMSNES
>> [0] DMGetDMKSP(): Creating new DMKSP
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> 0 SNES Function norm 1155.45
>>
>> nothing else -info related shows up as I'm iterating through the vertex loop.
>>
>> I'll have a MWE for you guys to play with shortly.
>>
>> Thanks,
>> Justin
>>
>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>
>> Justin,
>>
>> Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges.
>>
>> Everyone,
>>
>> I second Matt's reply.
>>
>> How is the DMNetwork preallocating for the Jacobian?
>> Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling, or full coupling? If it assumes no coupling and the user has a good amount of coupling, it will be very slow.
>>
>> There would need to be a way for the user to provide the coupling information between neighboring vertices/edges if it assumes no coupling.
>>
>> Barry
>>
>>> On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>
>>> On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>> Hi guys,
>>>
>>> I have a fully working distribution system solver written using DMNetwork. The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single-phase or two-phase.
>>>
>>> The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. The DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the Mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the number of unknowns per DMNetwork vertex is not the same across vertices, then my SNESFormJacobian function becomes extremely slow. Specifically, the slow part is the MatSetValues() calls where the col/row global indices contain an offset value that points to a neighboring bus vertex.
>>>
>>> I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info?
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds:
>>>
>>> 1) Treat every vertex as a three-phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice as many dofs as necessary if I wanted to preserve the block size = 6 for the AIJ mat.
>>>
>>> 2) Treat every phase node as a vertex, i.e., effectively solve a single-phase power flow problem. That way I am guaranteed a block size = 2; this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to set up and parallelize.
>>>
>>> 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, and one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but can anyone give pointers on how to go about achieving this?
>>>
>>> Thanks,
>>> Justin
>>>
>>> --
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
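
For concreteness, the kind of insertion Justin describes as slow, and that Barry asks about, is a block whose column offsets belong to the vertex on the other side of an edge. Below is a stripped-down sketch of that pattern; the API names are assumed from the 2019-era DMNetwork interface used by the power.c example referenced above, networkdm/J/ierr are taken from Justin's snippet, and the block is hardwired to 2 dof per vertex purely for illustration.

  PetscInt       e,eStart,eEnd,offfrom,offto,rows[2],cols[2];
  const PetscInt *cone;
  PetscScalar    vals[4] = {0,0,0,0};   /* 2x2 vertex-to-vertex coupling block; values omitted */

  ierr = DMNetworkGetEdgeRange(networkdm,&eStart,&eEnd);CHKERRQ(ierr);
  for (e = eStart; e < eEnd; e++) {
    ierr = DMNetworkGetConnectedVertices(networkdm,e,&cone);CHKERRQ(ierr);
    ierr = DMNetworkGetVariableGlobalOffset(networkdm,cone[0],&offfrom);CHKERRQ(ierr);
    ierr = DMNetworkGetVariableGlobalOffset(networkdm,cone[1],&offto);CHKERRQ(ierr);
    rows[0] = offfrom; rows[1] = offfrom+1;   /* rows of the "from" vertex */
    cols[0] = offto;   cols[1] = offto+1;     /* columns of the neighboring "to" vertex */
    /* These entries couple a vertex to its neighbor across an edge; they are only cheap
       if the preallocation (dense blocks, or user-supplied sparse blocks) covers them. */
    ierr = MatSetValues(J,2,rows,2,cols,vals,ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

Whether these cross-edge vertex-to-vertex entries are covered by the default preallocation is exactly the question the thread converges on; if they are not, each such MatSetValues() call triggers the malloc-and-copy behavior seen in the logs.
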