> On May 9, 2019, at 9:24 AM, Abhyankar, Shrirang G <shrirang.abhyan...@pnnl.gov> wrote:
>
>> What code? The special sparsity pattern for blocks?
>
> Yes. See here
> https://bitbucket.org/petsc/petsc/src/master/src/dm/impls/network/network.c#lines-1712
>
>> What if the user has dense blocks or doesn't know or care about the sparsity pattern? By default it should allocate something that every user's code will fit in the matrix, and then provide user-callable options for making parts of the matrices sparse.
>
> That's what it is doing, I believe. If the user does not set a sparse matrix block for the point, then it uses a dense block.
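
For readers following along, here is a minimal sketch of what "setting the sparse matrix block for a point" could look like, assuming the DMNetworkHasJacobian()/DMNetworkVertexSetMatrix() interface that the DMCreateMatrix_Network code linked above consumes. The variables networkdm and ierr are taken from Justin's snippet later in the thread, v is assumed to be a vertex point from DMNetworkGetVertexRange(), and the exact layout of the J[] array (diagonal block first, then blocks for supporting edges and connected vertices) should be checked against the DMNetwork man pages; only the nonzero pattern of the supplied block is assumed to matter.

  /* Sketch only: register a sparse 6x6 diagonal block for vertex v so that
     DMCreateMatrix() can preallocate that pattern instead of a dense block.
     Error checking of the J[] layout and cleanup are omitted. */
  Mat            *J;
  PetscInt       nvar = 6, nedges;
  const PetscInt *edges;

  ierr = DMNetworkHasJacobian(networkdm,PETSC_TRUE,PETSC_TRUE);CHKERRQ(ierr); /* enable user-provided blocks */
  ierr = DMNetworkGetSupportingEdges(networkdm,v,&nedges,&edges);CHKERRQ(ierr);
  ierr = PetscCalloc1(1+2*nedges,&J);CHKERRQ(ierr);   /* NULL entries fall back to dense blocks */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,nvar,nvar,2,NULL,&J[0]);CHKERRQ(ierr);
  /* ... insert zeros with MatSetValues() into J[0] to mark the desired nonzero pattern ... */
  ierr = MatAssemblyBegin(J[0],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(J[0],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = DMNetworkVertexSetMatrix(networkdm,v,J);CHKERRQ(ierr);
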
And this is perfectly fine. It may waste a lot of memory, but it absolutely would not cause a massive slowdown in MatSetValues().
>
> Shri
>
> On 5/9/19, 12:24 AM, "Smith, Barry F." <bsm...@mcs.anl.gov> wrote:
>
>> On May 8, 2019, at 9:53 PM, Abhyankar, Shrirang G via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>
>> From: Matthew Knepley <knep...@gmail.com>
>> Date: Wednesday, May 8, 2019 at 9:29 PM
>> To: "Abhyankar, Shrirang G" <shrirang.abhyan...@pnnl.gov>
>> Cc: "Zhang, Hong" <hzh...@mcs.anl.gov>, Justin Chang <jychan...@gmail.com>, petsc-users <petsc-users@mcs.anl.gov>
>> Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly
>>
>> On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>
>> From: petsc-users <petsc-users-boun...@mcs.anl.gov> on behalf of "Zhang, Hong via petsc-users" <petsc-users@mcs.anl.gov>
>> Reply-To: "Zhang, Hong" <hzh...@mcs.anl.gov>
>> Date: Wednesday, May 8, 2019 at 8:01 PM
>> To: Justin Chang <jychan...@gmail.com>
>> Cc: petsc-users <petsc-users@mcs.anl.gov>
>> Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly
>>
>> Justin:
>> Great, the issue is resolved.
>> Why does MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) not raise an error?
>>
>> I copy-pasted the above line from the power.c example. MatSetOption should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option.
>>
>> Matt,
>>
>> We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil(), which cannot write outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one for DMNetwork. The allocation is just like Plex (I believe), where you allocate closure(star(p)), which would mean that setting values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. Is that right for DMNetwork?
>> Yes, DMNetwork behaves in this fashion.
>> I cannot find MatSetValuesClosure() in petsc-master. Can you provide detailed instructions on how to implement MatSetValuesClosure() for DMNetwork? Note, DMNetwork is a subclass of DMPlex.
>>
>> DMNetwork does not do any matrix creation by itself. It calls Plex to create the matrix.
>>
>> Right. However, the only operation I put in was MatSetClosure(), since that is appropriate for FEM. I think you would need a MatSetStar() for DMNetwork as well.
>>
>> I see that Hong has implemented some additional code in DMCreateMatrix_Network that does not use Plex for creating the matrix. I think it covers what you describe above.
>>
>> Hong: The DMNetwork matrix creation code is not used unless the user wants to set a special sparsity pattern for the blocks. Shouldn't this code
>
> What code? The special sparsity pattern for blocks?
>
>> be used by default instead of having Plex create the matrix?
>
> What if the user has dense blocks or doesn't know or care about the sparsity pattern? By default it should allocate something that every user's code will fit in the matrix, and then provide user-callable options for making parts of the matrices sparse.
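
To make Hong's correction above concrete, a minimal sketch of the fixed call (variable names taken from the snippet Justin posts later in the thread):

  ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
  /* Error out instead of silently mallocing when an entry falls outside the preallocated pattern */
  ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);CHKERRQ(ierr);

With PETSC_TRUE, the first MatSetValues() call that would require a new allocation stops with an error naming the offending (row, column), which is usually the quickest way to find where the preallocation is being exceeded.
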
>
> Barry
>
>> Shri
>>
>> Matt
>>
>> Hong
>>
>> On Wed, May 8, 2019 at 4:00 PM Dave May <dave.mayhe...@gmail.com> wrote:
>>
>> On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users <petsc-users@mcs.anl.gov> wrote:
>> So here's the branch/repo to the working example I have:
>>
>> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex
>>
>> Type 'make' to compile the dss; it should work with the latest petsc-dev.
>>
>> To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes, where N = 100, 500, and 1000. Those test files are listed as:
>>
>> input/test_100.m
>> input/test_500.m
>> input/test_1000.m
>>
>> I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash at the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as:
>>
>> input/test2_100.m
>> input/test2_500.m
>> input/test2_1000.m
>>
>> The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m counterparts, but they have bs=6.
>>
>> To run these tests, type the following:
>>
>> ./dpflow -input input/test_100.m
>>
>> I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases.
>>
>> Turns out that only the first call to SNESComputeJacobian() is slow; all the subsequent calls are as fast as I expect. This makes me think it still has something to do with matrix allocation.
>>
>> I think it is a preallocation issue.
>> Looking at some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView output:
>>
>> total number of mallocs used during MatSetValues calls =10000
>>
>> Thanks for the help everyone,
>>
>> Justin
>>
>> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley <knep...@gmail.com> wrote:
>> On Wed, May 8, 2019 at 2:30 PM Justin Chang <jychan...@gmail.com> wrote:
>> Hi everyone,
>>
>> Yes, I have these lines in my code:
>>
>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
>> ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
>>
>> Okay, it's not allocation. So maybe Hong is right that it's setting great big element matrices. We will see with the example.
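
Dave's "total number of mallocs" line above can also be checked from inside the code. A sketch, querying the same counters that the KSPView output quotes, right after the first Jacobian assembly (MAT_LOCAL is fine on the sequential runs used here):

  MatInfo info;
  ierr = MatGetInfo(J,MAT_LOCAL,&info);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF,"mallocs during MatSetValues(): %g, unneeded nonzeros: %g\n",
                     info.mallocs,info.nz_unneeded);CHKERRQ(ierr);

Anything other than 0 mallocs means the preallocated pattern is being grown incrementally, which matches the symptom described above: the first SNESComputeJacobian() pays for all the reallocations and copies, and subsequent calls reuse the final pattern and are fast.
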
>>
>> Thanks,
>>
>> Matt
>>
>> I tried -info and here's my output:
>>
>> [0] PetscInitialize(): PETSc successfully started: number of processors = 1
>> [0] PetscInitialize(): Running on machine: jchang31606s.domain
>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000
>>
>> **** Power flow dist case ****
>>
>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999
>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used
>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines.
>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944
>> [0] DMGetDMSNES(): Creating new DMSNES
>> [0] DMGetDMKSP(): Creating new DMKSP
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104
>> 0 SNES Function norm 1155.45
>>
>> nothing else -info related shows up as I'm iterating through the vertex loop.
>>
>> I'll have a MWE for you guys to play with shortly.
>>
>> Thanks,
>> Justin
>>
>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>
>> Justin,
>>
>> Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges.
>>
>> Everyone,
>>
>> I second Matt's reply.
>>
>> How is the DMNetwork preallocating for the Jacobian?
>> Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling, or full coupling? If it assumes no coupling and the user has a good amount of coupling, it will be very slow.
>>
>> There would need to be a way for the user to provide the coupling information between neighboring vertices/edges if it assumes no coupling.
>>
>> Barry
>>
>>> On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>
>>> On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>> Hi guys,
>>>
>>> I have a fully working distribution system solver written using DMNetwork. The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single-phase or two-phase.
>>>
>>> The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. The DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the Mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the number of unknowns per DMNetwork vertex is not the same across vertices, then my SNESFormJacobian function becomes extremely slow. Specifically, the slow part is the MatSetValues() calls where the col/row global indices contain an offset value that points to a neighboring bus vertex.
>>>
>>> I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info?
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds:
>>>
>>> 1) Treat every vertex as a three-phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice as many dofs as necessary if I wanted to preserve the block size = 6 for the AIJ mat.
>>>
>>> 2) Treat every phase node as a vertex, i.e., effectively solve a single-phase power flow problem. That way I am guaranteed a block size = 2; this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to set up and parallelize.
>>>
>>> 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, and one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but can anyone give pointers on how to go about achieving this?
>>>
>>> Thanks,
>>> Justin
>>>
>>> --
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
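
For concreteness, the kind of insertion Justin describes as slow, and that Barry asks about, is a block whose column offsets belong to the vertex on the other side of an edge. Below is a stripped-down sketch of that pattern; the API names are assumed from the 2019-era DMNetwork interface used by the power.c example referenced above, networkdm/J/ierr are taken from Justin's snippet, and the block is hardwired to 2 dof per vertex purely for illustration.

  PetscInt       e,eStart,eEnd,offfrom,offto,rows[2],cols[2];
  const PetscInt *cone;
  PetscScalar    vals[4] = {0,0,0,0};   /* 2x2 vertex-to-vertex coupling block; values omitted */

  ierr = DMNetworkGetEdgeRange(networkdm,&eStart,&eEnd);CHKERRQ(ierr);
  for (e = eStart; e < eEnd; e++) {
    ierr = DMNetworkGetConnectedVertices(networkdm,e,&cone);CHKERRQ(ierr);
    ierr = DMNetworkGetVariableGlobalOffset(networkdm,cone[0],&offfrom);CHKERRQ(ierr);
    ierr = DMNetworkGetVariableGlobalOffset(networkdm,cone[1],&offto);CHKERRQ(ierr);
    rows[0] = offfrom; rows[1] = offfrom+1;   /* rows of the "from" vertex */
    cols[0] = offto;   cols[1] = offto+1;     /* columns of the neighboring "to" vertex */
    /* These entries couple a vertex to its neighbor across an edge; they are only cheap
       if the preallocation (dense blocks, or user-supplied sparse blocks) covers them. */
    ierr = MatSetValues(J,2,rows,2,cols,vals,ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

Whether these cross-edge vertex-to-vertex entries are covered by the default preallocation is exactly the question the thread converges on; if they are not, each such MatSetValues() call triggers the malloc-and-copy behavior seen in the logs.
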