PETSc itself only takes 47% of the runtime; I am not sure what is happening in the other half. For the PETSc half, it is all in the solve:
KSPSolve              20 1.0 5.3323e+03 1.0 1.01e+14 1.0 0.0e+00 0.0e+00 0.0e+00 47 100  0  0  0  47 100  0  0  0  18943

About 2/3 of that is matrix operations (I don't know where you are using LU)

MatMult            19960 1.0 2.1336e+03 1.0 8.78e+13 1.0 0.0e+00 0.0e+00 0.0e+00 19  87  0  0  0  19  87  0  0  0  41163
MatMultAdd        152320 1.0 8.4854e+02 1.0 3.60e+13 1.0 0.0e+00 0.0e+00 0.0e+00  7  35  0  0  0   7  35  0  0  0  42442
MatSolve            6600 1.0 9.0724e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  8   0  0  0  0   8   0  0  0  0      0

and 1/3 is vector operations for orthogonalization in GMRES:

KSPGMRESOrthog      3290 1.0 1.2390e+03 1.0 8.77e+12 1.0 0.0e+00 0.0e+00 0.0e+00 11   9  0  0  0  11   9  0  0  0   7082
VecMAXPY           13220 1.0 1.7894e+03 1.0 9.02e+12 1.0 0.0e+00 0.0e+00 0.0e+00 16   9  0  0  0  16   9  0  0  0   5040

The flop rates do not look crazy, but I do not know what kind of hardware you are running on.

  Thanks,

     Matt

On Fri, Jun 14, 2024 at 1:20 AM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:

> Thanks, I have attached the results without using any KSPGuess. At low
> frequency, the iteration steps are quite close to those with KSPGuess,
> specifically:
>
> KSPGuess Object: 1 MPI process
>   type: fischer
>   Model 1, size 200
>
> However, I found that at higher frequency the number of iteration steps is
> significantly higher than with KSPGuess; I have attached both results for
> your reference.
>
> Moreover, could I ask why the run without the KSPGuess options can be used
> as a baseline comparison? What are we comparing here? How does it relate
> to the performance issue/bottleneck I found? "*I have noticed that the
> time taken by KSPSolve is almost two times greater than the CPU time for
> the matrix-vector product multiplied by the number of iterations*"
>
> Thank you!
> Yongzhong
>
> From: Barry Smith <bsm...@petsc.dev>
> Date: Thursday, June 13, 2024 at 2:14 PM
> To: Yongzhong Li <yongzhong...@mail.utoronto.ca>
> Cc: petsc-users@mcs.anl.gov, petsc-ma...@mcs.anl.gov, Piero Triverio <piero.trive...@utoronto.ca>
> Subject: Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>
> Can you please run the same thing without the KSPGuess option(s) for a
> baseline comparison?
>
>   Thanks
>
>   Barry
>
> On Jun 13, 2024, at 1:27 PM, Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>
> Hi Matt,
>
> I have rerun the program with the keys you provided. The system output
> from the KSP solve and the final PETSc log output are stored in the
> attached .txt file for your reference.
>
> Thanks!
> Yongzhong
> From: Matthew Knepley <knep...@gmail.com>
> Date: Wednesday, June 12, 2024 at 6:46 PM
> To: Yongzhong Li <yongzhong...@mail.utoronto.ca>
> Cc: petsc-users@mcs.anl.gov, petsc-ma...@mcs.anl.gov, Piero Triverio <piero.trive...@utoronto.ca>
> Subject: Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>
> On Wed, Jun 12, 2024 at 6:36 PM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>
> Dear PETSc developers,
>
> I hope this email finds you well.
>
> I am currently working on a project using PETSc and have encountered a
> performance issue with the KSPSolve function. Specifically, *I have
> noticed that the time taken by KSPSolve is almost two times greater than
> the CPU time for the matrix-vector product multiplied by the number of
> iteration steps*. I use C++ chrono to record CPU time.
>
> For context, I am using a shell system matrix A. Despite my efforts to
> parallelize the matrix-vector product (Ax), the overall solve time remains
> higher than the per-iteration matrix-vector product time suggests when
> multiple threads are used. Here are a few details of my setup:
>
> - *Matrix type*: shell system matrix
> - *Preconditioner*: shell PC
> - *Parallel environment*: Intel MKL as PETSc's BLAS/LAPACK library, with
>   multithreading enabled
>
> I have considered several potential reasons, such as preconditioner setup,
> additional solver operations, and the inherent overhead of using a shell
> system matrix. *However, since KSPSolve is a high-level API, I have been
> unable to pinpoint the exact cause of the increased solve time.*
>
> Have you observed the same issue? Could you please share some experience
> on how to diagnose and address this performance discrepancy? Any insights
> or recommendations you could offer would be greatly appreciated.
>
> For any performance question like this, we need to see the output of your
> code run with
>
>   -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view
>
>   Thanks,
>
>      Matt
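To make that request concrete, here is a minimal sketch of the kind of setup described in the thread (shell system matrix plus shell PC), assuming a recent PETSc. UserMatMult, UserPCApply, and AppCtx are placeholder names, and the operator/preconditioner bodies are stubs, not the actual code; running such a program with -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view makes the KSPSolve/MatMult/PCApply breakdown appear in the log, which is what the comparison with the per-iteration mat-vec time needs.

    #include <petscksp.h>

    typedef struct {
      PetscInt n;   /* problem size; stands in for whatever the real operator needs */
    } AppCtx;

    /* Placeholder shell mat-vec y = A*x (here just y = x). */
    static PetscErrorCode UserMatMult(Mat A, Vec x, Vec y)
    {
      AppCtx *ctx;

      PetscFunctionBeginUser;
      PetscCall(MatShellGetContext(A, &ctx));  /* user data attached in MatCreateShell */
      PetscCall(VecCopy(x, y));                /* the real code applies the operator here */
      PetscFunctionReturn(PETSC_SUCCESS);
    }

    /* Placeholder shell preconditioner y = M^{-1} x (here the identity). */
    static PetscErrorCode UserPCApply(PC pc, Vec x, Vec y)
    {
      PetscFunctionBeginUser;
      PetscCall(VecCopy(x, y));
      PetscFunctionReturn(PETSC_SUCCESS);
    }

    int main(int argc, char **argv)
    {
      AppCtx ctx;
      Mat    A;
      KSP    ksp;
      PC     pc;
      Vec    x, b;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      ctx.n = 1000;

      /* Shell matrix whose action is defined by the user callback. */
      PetscCall(MatCreateShell(PETSC_COMM_WORLD, ctx.n, ctx.n, PETSC_DETERMINE, PETSC_DETERMINE, &ctx, &A));
      PetscCall(MatShellSetOperation(A, MATOP_MULT, (void (*)(void))UserMatMult));

      /* KSP with a shell preconditioner. */
      PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
      PetscCall(KSPSetOperators(ksp, A, A));
      PetscCall(KSPGetPC(ksp, &pc));
      PetscCall(PCSetType(pc, PCSHELL));
      PetscCall(PCShellSetApply(pc, UserPCApply));
      PetscCall(KSPSetFromOptions(ksp));   /* picks up -ksp_view, -ksp_monitor_true_residual, ... */

      PetscCall(MatCreateVecs(A, &x, &b));
      PetscCall(VecSet(b, 1.0));
      PetscCall(KSPSolve(ksp, b, x));      /* -log_view then reports KSPSolve, MatMult, PCApply, ... */

      PetscCall(VecDestroy(&x));
      PetscCall(VecDestroy(&b));
      PetscCall(MatDestroy(&A));
      PetscCall(KSPDestroy(&ksp));
      PetscCall(PetscFinalize());
      return 0;
    }

Note that PETSc logs MatMult and PCApply for shell objects as well, so the time spent inside the user callbacks already shows up in -log_view and can be compared directly against the externally measured mat-vec time.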
> Thank you for your time and assistance.
>
> Best regards,
> Yongzhong
>
> -----------------------------------------------------------
> Yongzhong Li
> PhD student | Electromagnetics Group
> Department of Electrical & Computer Engineering
> University of Toronto
> http://www.modelics.org
>
> <ksp_petsc_log.txt>

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/