I managed to get this to work. I defined a larger matrix with the dense blocks appended to the end of the matrix on the last processor. Currently, I am only running with one extra unknown, so this should not be a significant penalty for load balancing.
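A minimal sketch of one way to set up such an enlarged parallel AIJ matrix, assuming the extra unknowns are appended to the rows owned by the last rank; n_local, m_extra, d_nnz, and o_nnz are placeholder names (not from this thread) for the local FE row count, the number of extra unknowns, and preallocation counts that already include the dense border columns:

#include <petscmat.h>

/* Sketch: create the bordered AIJ matrix.  The FE block keeps its usual
   row distribution; the m_extra bordering rows/columns are appended to
   the last rank.  d_nnz/o_nnz must already account for the extra dense
   columns in every FE row and for the dense rows of the border itself. */
Mat CreateBorderedMatrix(MPI_Comm comm, PetscInt n_local, PetscInt m_extra,
                         const PetscInt *d_nnz, const PetscInt *o_nnz)
{
  Mat         J;
  PetscMPIInt rank, size;

  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  /* Only the last rank owns the extra rows/columns. */
  PetscInt my_rows = n_local + (rank == size - 1 ? m_extra : 0);

  MatCreate(comm, &J);
  MatSetSizes(J, my_rows, my_rows, PETSC_DETERMINE, PETSC_DETERMINE);
  MatSetType(J, MATAIJ);
  MatSeqAIJSetPreallocation(J, 0, d_nnz);            /* used on one rank   */
  MatMPIAIJSetPreallocation(J, 0, d_nnz, 0, o_nnz);  /* used on many ranks */
  return J;
}

With only one extra unknown, the last rank picks up a single dense row and column, so the load-balancing penalty stays small, as noted above.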
Since the larger matrix has the same I-J locations for the FE non-zeros, I use it directly in the FE assembly. I have tested with parallel MUMPS solves and it is working smoothly. Also, the monolithic system removes the issue with the singularity of J_fe at/near the bifurcation point.

Next, I would like to figure out if there are ways to bring in iterative solvers to solve this more efficiently. My J_fe comes from a nonlinear shell deformation problem with snap-through response. I am not sure if it would make sense to use an AMG solver on this monolithic matrix, as opposed to using it as a preconditioner for J_fe in the Schur-factorization approach. The LOCA solver in Trilinos found some success with the latter approach:
https://www.worldscientific.com/doi/abs/10.1142/S0218127405012508

I would appreciate any general thoughts concerning this.

Regards,
Manav

> On May 29, 2019, at 9:11 PM, Manav Bhatia <bhatiama...@gmail.com> wrote:
>
> Barry,
>
> Thanks for the detailed message.
>
> I checked libMesh's continuation solver and it appears to use the same system solver without creating a larger matrix:
> https://github.com/libMesh/libmesh/blob/master/src/systems/continuation_system.C
>
> I need to implement this in my code, MAST, for various reasons (mainly, it fits inside a bigger workflow). The current implementation follows the Schur-factorization approach:
> https://mastmultiphysics.github.io/class_m_a_s_t_1_1_continuation_solver_base.html#details
>
> I will look into some solutions pertaining to the use of MatGetLocalSubMatrix, or leverage some existing functionality in libMesh.
>
> Thanks,
> Manav
>
>
>> On May 29, 2019, at 7:04 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>
>> Understood. Where are you putting the "few extra unknowns" in the vector and matrix? On the first process, on the last process, someplace in the middle of the matrix?
>>
>> We don't have any trivial code for copying a big matrix into an even larger matrix directly, because we frown on doing that. It is very wasteful in time and memory.
>>
>> The simplest way to do it is to call MatGetRow() twice for each row: once to get the nonzero locations of each row, to determine the counts needed for preallocation, and then a second time, after the big matrix has been preallocated, to get the nonzero locations and numerical values of the row to pass to MatSetValues() to set that row in the bigger matrix. Note, of course, that when you call MatSetValues() you will need to shift the row and column locations to take into account the new rows and columns in the bigger matrix. If you put the "extra unknowns" at the very end of the rows/columns on the last process, you won't have to shift.
>>
>> Note that B being dense really messes up the chances for load balancing, since its rows are dense and take a great deal of space, so whatever process gets those rows needs to have much less of the mesh.
>>
>> The correct long-term approach is to have libMesh provide the needed functionality (for continuation) for the slightly larger matrix directly, so that huge matrices do not need to be copied.
>>
>> I noticed that libMesh has some functionality related to continuation.
>> I do not know if they handle it by creating the larger matrix and vector and filling them up directly for the finite elements. If they do, then you should definitely take a look at that and see if it can be extended for your case (ignore the continuation algorithm they may be using; that is not relevant. The question is whether they generate the larger matrices and whether you can leverage this).
>>
>>
>> The ultimate hack would be to (for example) assign the extra variables to the end of the last process and hack libMesh a little bit, so that the matrix it creates (before it puts in the numerical values) has the extra rows and columns, which libMesh will not put values into but you will. Thus you get libMesh to fill up the true final matrix for its finite element problem directly (not realizing the matrix is a little bigger than it needs to be), with no copies of the data needed. But this is a bit tricky: you'll need to combine libMesh's preallocation information with yours for the final columns and rows before you have libMesh put the numerical values in. Double check whether they have any support for this first.
>>
>> Barry
>>
>>
>>> On May 29, 2019, at 6:29 PM, Manav Bhatia <bhatiama...@gmail.com> wrote:
>>>
>>> Thanks, Barry.
>>>
>>> I am working on an FE application (involving bifurcation behavior) with libMesh, where I need to solve the system of equations along with a few extra unknowns that are not directly related to the FE mesh. I am able to assemble the n x 1 residual (R_fe) and the n x n Jacobian (J_fe) from my code, and libMesh provides me with the sparsity pattern for this.
>>>
>>> Next, the system of equations that I need to solve is:
>>>
>>> [ J_fe  A ] { dX }   { R_fe  }
>>> [  B    C ] { dV } = { R_ext }
>>>
>>> where C is a dense matrix of size m x m (m << n), A is n x m, B is m x n, and R_ext is m x 1. A, B and C are dense matrices. This comes from the bordered system for my path continuation solver.
>>>
>>> I have implemented a solver using Schur factorization (this is outside of PETSc and does not use the FieldSplit construct). This works well for most cases, except when J_fe is close to singular.
>>>
>>> I am now attempting to create a monolithic matrix that solves the complete system.
>>>
>>> Currently, the approach I am considering is to compute J_fe using my libMesh application, so that I don't have to change that, and to define a new matrix with the extra non-zero locations for A, B, C.
>>>
>>> With J_fe computed, I am looking to copy its non-zero entries into this new matrix. This is where I am stumbling, since I don't know how best to get the non-zero locations in J_fe. Maybe there is a better approach to copying from J_fe to the new matrix?
>>>
>>> I have looked through the nested matrix construct, but have not given it serious consideration. Maybe I should? Note that I don't want to solve J_fe and C separately (not as separate systems), so the field-split approach will not be suitable here.
>>>
>>> Also, I am currently using MUMPS for all my parallel solves.
>>>
>>> I would appreciate any advice.
>>>
>>> Regards,
>>> Manav
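A minimal sketch of the second pass of the two-pass MatGetRow()/MatSetValues() copy suggested above, under the assumption that the extra unknowns sit at the very end of the last rank so the FE row/column indices carry over without a shift; J_big and CopyFEJacobian are placeholder names, not from the thread:

#include <petscmat.h>

/* Sketch: copy the assembled J_fe into the preallocated bordered matrix.
   A first pass over the same loop, keeping only the nonzero counts and
   column positions from MatGetRow(), would supply the preallocation for
   J_big (plus the dense border columns).  With the extra unknowns at the
   very end of the last rank, the indices carry over unshifted. */
PetscErrorCode CopyFEJacobian(Mat J_fe, Mat J_big)
{
  PetscInt           rstart, rend, ncols;
  const PetscInt    *cols;
  const PetscScalar *vals;

  MatGetOwnershipRange(J_fe, &rstart, &rend);
  for (PetscInt row = rstart; row < rend; ++row) {
    MatGetRow(J_fe, row, &ncols, &cols, &vals);
    MatSetValues(J_big, 1, &row, ncols, cols, vals, INSERT_VALUES);
    MatRestoreRow(J_fe, row, &ncols, &cols, &vals);
  }
  /* The dense A, B, C and R_ext entries are set with MatSetValues()/
     VecSetValues() as well, before the final assembly. */
  MatAssemblyBegin(J_big, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(J_big, MAT_FINAL_ASSEMBLY);
  return 0;
}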
>>>> On May 29, 2019, at 6:07 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>>>
>>>>
>>>> Manav,
>>>>
>>>> For parallel sparse matrices using the standard PETSc formats, the matrix is stored in two parts on each process (see the details in MatCreateAIJ()), so there is no inexpensive way to directly access the I-J locations as a single local matrix. What are you hoping to use the information for? Perhaps we have other suggestions on how to achieve the goal.
>>>>
>>>> Barry
>>>>
>>>>
>>>>> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Once an MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here:
>>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html
>>>>> but not for parallel matrices.
>>>>>
>>>>> Regards,
>>>>> Manav
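For the original question above, a minimal sketch of visiting the nonzero locations of a parallel AIJ matrix row by row over the local ownership range, which is the per-row access available in place of a single MatGetRowIJ()-style array; PrintNonzeroPattern is a placeholder name:

#include <petscmat.h>

/* Sketch: print the nonzero column indices of each locally owned row.
   There is no single local I-J array for an MPIAIJ matrix, so the rows
   are visited one at a time. */
PetscErrorCode PrintNonzeroPattern(Mat A)
{
  MPI_Comm        comm;
  PetscInt        rstart, rend, ncols;
  const PetscInt *cols;

  PetscObjectGetComm((PetscObject)A, &comm);
  MatGetOwnershipRange(A, &rstart, &rend);
  for (PetscInt row = rstart; row < rend; ++row) {
    MatGetRow(A, row, &ncols, &cols, NULL);   /* values are not needed */
    PetscSynchronizedPrintf(comm, "row %d:", (int)row);
    for (PetscInt j = 0; j < ncols; ++j)
      PetscSynchronizedPrintf(comm, " %d", (int)cols[j]);
    PetscSynchronizedPrintf(comm, "\n");
    MatRestoreRow(A, row, &ncols, &cols, NULL);
  }
  PetscSynchronizedFlush(comm, PETSC_STDOUT);
  return 0;
}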