Let me run some examples on our end to see whether the code calls expected functions.
--Junchao Zhang

On Mon, Jun 24, 2024 at 10:46 AM Matthew Knepley <knep...@gmail.com> wrote:

> On Mon, Jun 24, 2024 at 11:21 AM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>
>> Thank you Pierre for your information. Do we have a conclusion for my
>> original question about the parallelization efficiency for different stages
>> of KSP Solve? Do we need to do more testing to figure out the issues?
>
> We have an extended discussion of this here:
> https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>
> The kinds of operations you are talking about (SpMV, VecDot, VecAXPY, etc.)
> are memory bandwidth limited.
> If there is no more bandwidth to be marshalled on your board, then adding
> more processes does nothing at all. This is why people were asking about how
> many "nodes" you are running on, because that is the unit of memory
> bandwidth, not "cores", which make little difference.
>
> Thanks,
>
> Matt
>
>> Thank you,
>>
>> Yongzhong
>>
>> *From: *Pierre Jolivet <pie...@joliv.et>
>> *Date: *Sunday, June 23, 2024 at 12:41 AM
>> *To: *Yongzhong Li <yongzhong...@mail.utoronto.ca>
>> *Cc: *petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
>> *Subject: *Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>>
>> On 23 Jun 2024, at 4:07 AM, Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>>
>> Yeah, I ran my program again using -mat_view ::ascii_info with
>> MKL_VERBOSE set to 1, and the output shows that the matrix is of type
>> seqaijmkl (I’ve attached a few excerpts below):
>>
>> --> Setting up matrix-vector products...
>>
>> Mat Object: 1 MPI process
>> type: seqaijmkl
>> rows=16490, cols=35937
>> total: nonzeros=128496, allocated nonzeros=128496
>> total number of mallocs used during MatSetValues calls=0
>> not using I-node routines
>> Mat Object: 1 MPI process
>> type: seqaijmkl
>> rows=16490, cols=35937
>> total: nonzeros=128496, allocated nonzeros=128496
>> total number of mallocs used during MatSetValues calls=0
>> not using I-node routines
>>
>> --> Solving the system...
>>
>> Excitation 1 of 1...
>>
>> ================================================
>> Iterative solve completed in 7435 ms.
>> CONVERGED: rtol.
>> Iterations: 72
>> Final relative residual norm: 9.22287e-07
>> ================================================
>> [CPU TIME] System solution: 2.27160000e+02 s.
>> [WALL TIME] System solution: 7.44387218e+00 s.
>>
>> However, it seems to me that there was still no MKL output even though I
>> set MKL_VERBOSE to 1, although I think there should be many SpMV
>> operations when doing KSPSolve(). Do you see any possible reasons?
>>
>> SpMV calls are not reported with MKL_VERBOSE (last I checked), only dense
>> BLAS is.
>>
>> Thanks,
>> Pierre
>>
>> Thanks,
>> Yongzhong
>>
>> *From: *Matthew Knepley <knep...@gmail.com>
>> *Date: *Saturday, June 22, 2024 at 5:56 PM
>> *To: *Yongzhong Li <yongzhong...@mail.utoronto.ca>
>> *Cc: *Junchao Zhang <junchao.zh...@gmail.com>, Pierre Jolivet <pie...@joliv.et>, petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
>> *Subject: *Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>>
>> On Sat, Jun 22, 2024 at 5:03 PM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>>
>> MKL_VERBOSE=1 ./ex1
>>
>> matrix nonzeros = 100, allocated nonzeros = 100
>>
>> MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Vector Neural Network Instructions enabled processors, Lnx 2.50GHz lp64 gnu_thread
>> MKL_VERBOSE ZGEMV(N,10,10,0x7ffd9d7078f0,0x187eb20,10,0x187f7c0,1,0x7ffd9d707900,0x187ff70,1) 167.34ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x7ffd9d7078c0,-1,0) 77.19ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x1894490,10,0) 83.97ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZSYTRS(L,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 44.94ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 20.72us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZSYTRS(L,10,2,0x1894b50,10,0x1893df0,0x187d2a0,10,0) 4.22us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGEMM(N,N,10,2,10,0x7ffd9d707790,0x187eb20,10,0x187d2a0,10,0x7ffd9d7077a0,0x1896a70,10) 1.41ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZAXPY(20,0x7ffd9d7078a0,0x1896a70,1,0x187b650,1) 381ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x7ffd9d707840,-1,0) 742ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZSYTRF(L,10,0x1894b50,10,0x1893df0,0x18951a0,10,0) 4.20us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZSYTRS(L,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 2.94us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 292ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGEMV(N,10,10,0x7ffd9d7078f0,0x187eb20,10,0x187f7c0,1,0x7ffd9d707900,0x187ff70,1) 1.17us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGETRF(10,10,0x1894b50,10,0x1893df0,0) 202.48ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGETRS(N,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 20.78ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 954ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGETRS(N,10,2,0x1894b50,10,0x1893df0,0x187d2a0,10,0) 30.74ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGEMM(N,N,10,2,10,0x7ffd9d707790,0x187eb20,10,0x187d2a0,10,0x7ffd9d7077a0,0x18969c0,10) 3.95us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZAXPY(20,0x7ffd9d7078a0,0x18969c0,1,0x187b650,1) 995ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGETRF(10,10,0x1894b50,10,0x1893df0,0) 4.09us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGETRS(N,10,1,0x1894b50,10,0x1893df0,0x1880720,10,0) 3.92us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187f7c0,1,0x1880720,1) 274ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGEMV(N,15,10,0x7ffd9d7078f0,0x187ec70,15,0x187fc30,1,0x7ffd9d707900,0x1880400,1) 1.59us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x1894550,0x7ffd9d707900,-1,0) 47.07us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x1894550,0x1895cb0,10,0) 26.62us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x1894550,0x1895b00,15,0x7ffd9d7078b0,-1,0) 35.32us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x1894550,0x1895b00,15,0x1895cb0,10,0) 42.33ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZTRTRS(U,N,N,10,1,0x1894b40,15,0x1895b00,15,0) 16.11us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187fc30,1,0x1880c70,1) 395ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGEMM(N,N,15,2,10,0x7ffd9d707790,0x187ec70,15,0x187d310,10,0x7ffd9d7077a0,0x187b5b0,15) 3.22us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZUNMQR(L,C,15,2,10,0x1894b40,15,0x1894550,0x1897760,15,0x7ffd9d7078c0,-1,0) 730ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZUNMQR(L,C,15,2,10,0x1894b40,15,0x1894550,0x1897760,15,0x1895cb0,10,0) 4.42us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZTRTRS(U,N,N,10,2,0x1894b40,15,0x1897760,15,0) 5.96us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZAXPY(20,0x7ffd9d7078a0,0x187d310,1,0x1897610,1) 222ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x18954b0,0x7ffd9d707820,-1,0) 685ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZGEQRF(15,10,0x1894b40,15,0x18954b0,0x1895d60,10,0) 6.11us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x18954b0,0x1895bb0,15,0x7ffd9d7078b0,-1,0) 390ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZUNMQR(L,C,15,1,10,0x1894b40,15,0x18954b0,0x1895bb0,15,0x1895d60,10,0) 3.09us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZTRTRS(U,N,N,10,1,0x1894b40,15,0x1895bb0,15,0) 1.05us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE ZAXPY(10,0x7ffd9d7078f0,0x187fc30,1,0x1880c70,1) 257ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>>
>> Yes, for the PETSc example there are MKL outputs, but not for my own
>> program. All I did was change the matrix type from MATAIJ to MATAIJMKL
>> to get optimized SpMV performance from MKL. Should I expect to see any
>> MKL outputs in this case?
>>
>> Are you sure that the type changed? You can MatView() the matrix with
>> format ascii_info to see.
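To compare a trace like the one from ex1 with the output of your own program, it can help to count the MKL_VERBOSE entries per routine. The following is only a minimal sketch: the helper name `mkl_tally` is made up for illustration, and it assumes each entry appears on a single line of the form `MKL_VERBOSE NAME(args) <time> ...`, as in the trace shown in this thread.

```shell
# Count MKL_VERBOSE entries per routine name read from standard input.
# The version banner ("MKL_VERBOSE Intel(R) MKL ...") is skipped because its
# second field is not an upper-case routine name followed by "(".
mkl_tally() {
  awk '$1 == "MKL_VERBOSE" && $2 ~ /^[A-Z0-9_]+\(/ {
         name = $2
         sub(/\(.*/, "", name)   # strip the argument list
         calls[name]++
       }
       END { for (n in calls) print n, calls[n] }' | sort
}
```

A possible use is `MKL_VERBOSE=1 ./ex1 2>&1 | mkl_tally`, which prints one line per routine with its call count. If only dense BLAS/LAPACK names (ZGEMV, ZAXPY, ...) show up, that is consistent with the remark elsewhere in this thread that SpMV calls are not reported by MKL_VERBOSE.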
>>
>> Thanks,
>>
>> Matt
>>
>> Thanks,
>>
>> Yongzhong
>>
>> *From: *Junchao Zhang <junchao.zh...@gmail.com>
>> *Date: *Saturday, June 22, 2024 at 9:40 AM
>> *To: *Yongzhong Li <yongzhong...@mail.utoronto.ca>
>> *Cc: *Pierre Jolivet <pie...@joliv.et>, petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
>> *Subject: *Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>>
>> No, you don't. It is strange. Perhaps you can run a PETSc example
>> first and see if MKL is really used:
>>
>> $ cd src/mat/tests
>> $ make ex1
>> $ MKL_VERBOSE=1 ./ex1
>>
>> --Junchao Zhang
>>
>> On Fri, Jun 21, 2024 at 4:03 PM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>>
>> I am using
>>
>> export MKL_VERBOSE=1
>> ./xx
>>
>> in the bash file. Do I have to use -ksp_converged_reason?
>>
>> Thanks,
>>
>> Yongzhong
>>
>> *From: *Pierre Jolivet <pie...@joliv.et>
>> *Date: *Friday, June 21, 2024 at 1:47 PM
>> *To: *Yongzhong Li <yongzhong...@mail.utoronto.ca>
>> *Cc: *Junchao Zhang <junchao.zh...@gmail.com>, petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
>> *Subject: *Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>>
>> How do you set the variable?
>>
>> $ MKL_VERBOSE=1 ./ex1 -ksp_converged_reason
>>
>> MKL_VERBOSE oneMKL 2024.0 Update 1 Product build 20240215 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.80GHz lp64 intel_thread
>> MKL_VERBOSE DDOT(10,0x22127c0,1,0x22127c0,1) 2.02ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE DSCAL(10,0x7ffc9fb4ff08,0x22127c0,1) 12.67us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE DDOT(10,0x22127c0,1,0x2212840,1) 1.52us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> MKL_VERBOSE DDOT(10,0x2212840,1,0x2212840,1) 167ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
>> [...]
>>
>> On 21 Jun 2024, at 7:37 PM, Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>>
>> Hello all,
>>
>> I set MKL_VERBOSE = 1, but observed no print output specific to the use
>> of MKL. Does PETSc enable this verbose output?
>>
>> Best,
>>
>> Yongzhong
>>
>> *From: *Pierre Jolivet <pie...@joliv.et>
>> *Date: *Friday, June 21, 2024 at 1:36 AM
>> *To: *Junchao Zhang <junchao.zh...@gmail.com>
>> *Cc: *Yongzhong Li <yongzhong...@mail.utoronto.ca>, petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
>> *Subject: *Re: [petsc-users] [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>>
>> On 21 Jun 2024, at 6:42 AM, Junchao Zhang <junchao.zh...@gmail.com> wrote:
>>
>> I remember there are some MKL env vars to print MKL routines called.
>>
>> The environment variable is MKL_VERBOSE
>>
>> Thanks,
>> Pierre
>>
>> Maybe we can try it to see what MKL routines are really used and then we
>> can understand why some PETSc functions did not speed up
>>
>> --Junchao Zhang
>>
>> On Thu, Jun 20, 2024 at 10:39 PM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>>
>> Hi Barry, sorry for my last results. I didn’t fully understand the stage
>> profiling and logging in PETSc; now I only record the KSPSolve() stage of
>> my program. Some sample code is as follows:
>>
>> // Static variable to keep track of the stage counter
>> static int stageCounter = 1;
>>
>> // Generate a unique stage name
>> std::ostringstream oss;
>> oss << "Stage " << stageCounter << " of Code";
>> std::string stageName = oss.str();
>>
>> // Register the stage
>> PetscLogStage stagenum;
>>
>> PetscLogStageRegister(stageName.c_str(), &stagenum);
>> PetscLogStagePush(stagenum);
>>
>> KSPSolve(*ksp_ptr, b, x);
>>
>> PetscLogStagePop();
>> stageCounter++;
>>
>> I have attached my new logging results; there is 1 main stage and 4
>> other stages, each of which is one KSPSolve() call.
>>
>> To provide some additional background: if you recall, I have been trying
>> to get an efficient iterative solution using multithreading. I found that
>> by compiling PETSc with the Intel MKL library instead of OpenBLAS, I am
>> able to perform sparse matrix-vector multiplication faster; I am using
>> MATSEQAIJMKL. This makes the shell matrix-vector product in each iteration
>> scale well with the number of threads. However, I found that the total
>> GMRES solve time (~KSPSolve() time) does not scale well with the number of
>> threads.
>>
>> From the logging results I learned that when performing KSPSolve(), there
>> are some CPU overheads in PCApply() and KSPGMRESOrthog().
>> I ran my program using different numbers of threads and plotted the time
>> consumption of PCApply() and KSPGMRESOrthog() against the number of
>> threads. I found that these two operations do not scale with the threads
>> at all! My results are attached as a PDF to give you a clear view.
>>
>> My question is: from my understanding, PCApply() involves MatSolve() and
>> KSPGMRESOrthog() involves many vector operations, so why can’t these two
>> parts scale well with the number of threads when the Intel MKL library is
>> linked?
>>
>> Thank you,
>> Yongzhong
>>
>> *From: *Barry Smith <bsm...@petsc.dev>
>> *Date: *Friday, June 14, 2024 at 11:36 AM
>> *To: *Yongzhong Li <yongzhong...@mail.utoronto.ca>
>> *Cc: *petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>, petsc-ma...@mcs.anl.gov <petsc-ma...@mcs.anl.gov>, Piero Triverio <piero.trive...@utoronto.ca>
>> *Subject: *Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>>
>> I am a bit confused. Without the initial guess computation, there are
>> still a bunch of events I don't understand
>>
>> MatTranspose         79 1.0 4.0598e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatMatMultSym       110 1.0 1.7419e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatMatMultNum        90 1.0 1.2640e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatMatMatMultSym     20 1.0 1.3049e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatRARtSym           25 1.0 1.2492e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatMatTrnMultSym     25 1.0 8.8265e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatMatTrnMultNum     25 1.0 2.4820e+02 1.0 6.83e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   275
>> MatTrnMatMultSym     10 1.0 7.2984e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatTrnMatMultNum     10 1.0 9.3128e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>
>> in addition there are many more VecMAXPY than VecMDot (in GMRES they are
>> each done the same number of times)
>>
>> VecMDot            5588 1.0 1.7183e+03 1.0 2.06e+13 1.0 0.0e+00 0.0e+00 0.0e+00  8 10  0  0  0   8 10  0  0  0 12016
>> VecMAXPY          22412 1.0 8.4898e+03 1.0 4.17e+13 1.0 0.0e+00 0.0e+00 0.0e+00 39 20  0  0  0  39 20  0  0  0  4913
>>
>> Finally there are a huge number of
>>
>> MatMultAdd       258048 1.0 1.4178e+03 1.0 6.10e+13 1.0 0.0e+00 0.0e+00 0.0e+00  7 29  0  0  0   7 29  0  0  0 43025
>>
>> Are you making calls to all these routines? Are you doing this inside
>> your MatMult() or before you call KSPSolve?
>>
>> The reason I wanted you to make a simpler run without the initial guess
>> code is that your events are far more complicated than would be produced by
>> GMRES alone so it is not possible to understand the behavior you are seeing
>> without fully understanding all the events happening in the code.
>>
>> Barry
>>
>> On Jun 14, 2024, at 1:19 AM, Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>>
>> Thanks, I have attached the results without using any KSPGuess. At low
>> frequency, the number of iteration steps is quite close to the one with
>> KSPGuess, specifically
>>
>> KSPGuess Object: 1 MPI process
>> type: fischer
>> Model 1, size 200
>>
>> However, I found that at higher frequency the number of iteration steps is
>> significantly higher than the one with KSPGuess. I have attached both sets
>> of results for your reference.
>>
>> Moreover, could I ask why the run without the KSPGuess options can be
>> used for a baseline comparison? What are we comparing here? How does it
>> relate to the performance issue/bottleneck I found? “I have noticed
>> that the time taken by KSPSolve is almost two times greater than
>> the CPU time for matrix-vector product multiplied by the number of
>> iteration steps”
>>
>> Thank you!
>> Yongzhong
>>
>> *From: *Barry Smith <bsm...@petsc.dev>
>> *Date: *Thursday, June 13, 2024 at 2:14 PM
>> *To: *Yongzhong Li <yongzhong...@mail.utoronto.ca>
>> *Cc: *petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>, petsc-ma...@mcs.anl.gov <petsc-ma...@mcs.anl.gov>, Piero Triverio <piero.trive...@utoronto.ca>
>> *Subject: *Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>>
>> Can you please run the same thing without the KSPGuess option(s) for
>> a baseline comparison?
>>
>> Thanks
>>
>> Barry
>>
>> On Jun 13, 2024, at 1:27 PM, Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>>
>> Hi Matt,
>>
>> I have rerun the program with the keys you provided. The system output
>> when performing the KSP solve and the final PETSc log output are stored in
>> the attached .txt file for your reference.
>>
>> Thanks!
>> Yongzhong
>>
>> *From: *Matthew Knepley <knep...@gmail.com>
>> *Date: *Wednesday, June 12, 2024 at 6:46 PM
>> *To: *Yongzhong Li <yongzhong...@mail.utoronto.ca>
>> *Cc: *petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>, petsc-ma...@mcs.anl.gov <petsc-ma...@mcs.anl.gov>, Piero Triverio <piero.trive...@utoronto.ca>
>> *Subject: *Re: [petsc-maint] Assistance Needed with PETSc KSPSolve Performance Issue
>>
>> On Wed, Jun 12, 2024 at 6:36 PM Yongzhong Li <yongzhong...@mail.utoronto.ca> wrote:
>>
>> Dear PETSc’s developers,
>>
>> I hope this email finds you well.
>>
>> I am currently working on a project using PETSc and have encountered a
>> performance issue with the KSPSolve function. Specifically, *I have
>> noticed that the time taken by KSPSolve is almost two times greater
>> than the CPU time for matrix-vector product multiplied by the number of
>> iteration steps*. I use C++ chrono to record CPU time.
>>
>> For context, I am using a shell system matrix A. Despite my efforts to
>> parallelize the matrix-vector product (Ax), the overall solve time
>> remains higher than the per-iteration matrix-vector product time would
>> indicate when multiple threads are used. Here are a few details of my
>> setup:
>>
>> - *Matrix Type*: Shell system matrix
>> - *Preconditioner*: Shell PC
>> - *Parallel Environment*: Using Intel MKL as PETSc’s BLAS/LAPACK
>>   library, multithreading is enabled
>>
>> I have considered several potential reasons, such as preconditioner
>> setup, additional solver operations, and the inherent overhead of using a
>> shell system matrix. *However, since KSPSolve is a high-level API, I
>> have been unable to pinpoint the exact cause of the increased solve time.*
>>
>> Have you observed the same issue? Could you please share some
>> experience on how to diagnose and address this performance discrepancy?
>> Any insights or recommendations you could offer would be greatly
>> appreciated.
>>
>> For any performance question like this, we need to see the output of your
>> code run with
>>
>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view
>>
>> Thanks,
>>
>> Matt
>>
>> Thank you for your time and assistance.
>>
>> Best regards,
>>
>> Yongzhong
>>
>> -----------------------------------------------------------
>>
>> *Yongzhong Li*
>> PhD student | Electromagnetics Group
>> Department of Electrical & Computer Engineering
>> University of Toronto
>> http://www.modelics.org
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>> <ksp_petsc_log.txt>
>>
>> <ksp_petsc_log.txt><ksp_petsc_log_noguess.txt>