Please include the output of -log_view, -ksp_view, and -ksp_monitor so we can understand what's happening.
Can you please share the equations you are solving, so we can suggest a solver configuration? As I said, solving Nedelec-type discretizations is challenging, and not a job for off-the-shelf, black-box solvers. Below are some comments:

- You use a redundant SVD approach for the coarse solve, which can be inefficient if your coarse space grows. You can use a parallel direct solver like MUMPS instead (reconfigure with --download-mumps and use -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps).

- Why use ILU for the Dirichlet problem and GAMG for the Neumann problem? With 8 processes and 300K total dofs, you will have around 40K dofs per process, which is fine for a direct solver like MUMPS (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, and the same for the Neumann problem). With Nedelec dofs and the sparsity pattern they induce, I believe you can push to 80K dofs per process with good performance.

- Why a GMRES restart of 5000? It is highly inefficient to orthogonalize against such a large set of vectors. (A command line combining these suggestions is sketched in the P.S. at the bottom of this message.)

On Fri, Aug 16, 2024 at 00:04, neil liu <liufi...@gmail.com> wrote:

> Dear PETSc developers,
>
> Thanks for your previous help. Now PCBDDC converges to 1e-8 with:
>
> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc
> -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged
> -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view
> -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu
> -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10
> -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view
>
> I then ran two cases for a strong scaling test. The first case involves
> only real numbers (tetra #: 49,152; dof #: 324,224) for the matrix and
> rhs. The second case involves complex numbers (tetra #: 95,336; dof #:
> 611,432) due to the PML.
>
> Case 1:
>   cpus   Time for 500 KSP steps (s)   Parallel efficiency   PCSetUp time (s)
>   2      234.7                        -                      3.12
>   4      126.6                        0.92                   1.62
>   8       84.97                       0.69                   1.26
>
> Case 2:
>   cpus   Time for 500 KSP steps (s)   Parallel efficiency   PCSetUp time (s)
>   2      584.5                        -                      8.61
>   4      376.8                        0.77                   6.56
>   8      459.6                        0.31                  66.47
>
> For these two cases, I checked the time for PCSetUp as an example. The
> 8-cpu run for case 2 seems to spend far too much time in PCSetUp.
> Do you have any ideas about what is going on here?
>
> Thanks,
> Xiaodong

--
Stefano
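P.S. For reference, here is an untested sketch of a command line combining the suggestions above. It assumes PETSc has been reconfigured with --download-mumps; the restart of 30 is just GMRES's default, not a tuned value:

  petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc -mat_type is \
    -ksp_rtol 1e-8 -ksp_gmres_restart 30 -ksp_monitor -ksp_converged_reason \
    -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps \
    -pc_bddc_dirichlet_pc_type lu -pc_bddc_dirichlet_pc_factor_mat_solver_type mumps \
    -pc_bddc_neumann_pc_type lu -pc_bddc_neumann_pc_factor_mat_solver_type mumps \
    -ksp_view -log_view

The -ksp_view and -log_view output from such a run is also what we need to see in order to diagnose the PCSetUp time in case 2.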