Barry's suggestion for testing got garbled in the GitLab issue posting. Here it is, I think:
07:53 main *= ~/Codes/petsc$ make test s=ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
/usr/local/bin/gmake --no-print-directory -f /Users/markadams/Codes/petsc/gmakefile.test PETSC_ARCH=arch-macosx-gnu-O PETSC_DIR=/Users/markadams/Codes/petsc test
Using MAKEFLAGS: --no-print-directory -- PETSC_DIR=/Users/markadams/Codes/petsc PETSC_ARCH=arch-macosx-gnu-O s=ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
Application at path ( /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec.hydra ) removed from firewall
Application at path ( /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec.hydra ) added to firewall
Incoming connection to the application is blocked
        TEST arch-macosx-gnu-O/tests/counts/ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1.counts
 ok ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1
 ok diff-ksp_ksp_tutorials-ex1_mpi_linear_solver_server_1

On Wed, Sep 4, 2024 at 3:59 PM Lin_Yuxiang <linyx199...@gmail.com> wrote:

> To whom it may concern:
>
> I recently tried to use the 64-bit-index build of PETSc to replace the legacy code's solver, using the MPI linear solver server. However, it gives an error when I use more than 8 cores:
>
> Get NNZ
> MatsetPreallocation
> MatsetValue
> MatSetValue Time per kernel: 43.1147 s
> Matassembly
> VecsetValue
> pestc_solve
> Read -1, expected 1951397280, errno = 14
>
> When I tried -start_in_debugger, the error seems to come from MPI_Scatterv:
>
> Rank 0:
>
> #3  0x00001555512e4de5 in mca_pml_ob1_recv () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so
> #4  0x0000155553e01e60 in PMPI_Scatterv () from /lib/x86_64-linux-gnu/libmpi.so.40
> #5  0x0000155554b13eab in PCMPISetMat (pc=pc@entry=0x0) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:230
> #6  0x0000155554b17403 in PCMPIServerBegin () at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:464
> #7  0x00001555540b9aa4 in PetscInitialize_Common (prog=0x7fffffffe27b "geosimtrs_mpiserver", file=file@entry=0x0, help=help@entry=0x55555555a1e0 <help> "Solves a linear system in parallel with KSP.\nInput parameters include:\n -view_exact_sol : write exact solution vector to stdout\n -m <mesh_x> : number of mesh points in x-direction\n -n <mesh"..., ftn=ftn@entry=PETSC_FALSE, readarguments=readarguments@entry=PETSC_FALSE, len=len@entry=0) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1109
> #8  0x00001555540bba82 in PetscInitialize (argc=argc@entry=0x7fffffffda8c, args=args@entry=0x7fffffffda80, file=file@entry=0x0, help=help@entry=0x55555555a1e0 <help> "Solves a linear system in parallel with KSP.\nInput parameters include:\n -view_exact_sol : write exact solution vector to stdout\n -m <mesh_x> : number of mesh points in x-direction\n -n <mesh"...) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1274
> #9  0x0000555555557673 in main (argc=<optimized out>, args=<optimized out>) at geosimtrs_mpiserver.c:29
>
> Ranks 1-10:
>
> 0x0000155553e1f030 in ompi_coll_base_allgather_intra_bruck () from /lib/x86_64-linux-gnu/libmpi.so.40
> #4  0x0000155550f62aaa in ompi_coll_tuned_allgather_intra_dec_fixed () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so
> #5  0x0000155553ddb431 in PMPI_Allgather () from /lib/x86_64-linux-gnu/libmpi.so.40
> #6  0x00001555541a2289 in PetscLayoutSetUp (map=0x555555721ed0) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/vec/is/utils/pmap.c:248
> #7  0x000015555442e06a in MatMPIAIJSetPreallocationCSR_MPIAIJ (B=0x55555572d850, Ii=0x15545a778010, J=0x15545beacb60, v=0x1554cff55e60) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/mat/impls/aij/mpi/mpiaij.c:3885
> #8  0x00001555544284e3 in MatMPIAIJSetPreallocationCSR (B=0x55555572d850, i=0x15545a778010, j=0x15545beacb60, v=0x1554cff55e60) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/mat/impls/aij/mpi/mpiaij.c:3998
> #9  0x0000155554b1412f in PCMPISetMat (pc=pc@entry=0x0) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:250
> #10 0x0000155554b17403 in PCMPIServerBegin () at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/ksp/pc/impls/mpi/pcmpi.c:464
> #11 0x00001555540b9aa4 in PetscInitialize_Common (prog=0x7fffffffe27b "geosimtrs_mpiserver", file=file@entry=0x0, help=help@entry=0x55555555a1e0 <help> "Solves a linear system in parallel with KSP.\nInput parameters include:\n -view_exact_sol : write exact solution vector to stdout\n -m <mesh_x> : number of mesh points in x-direction\n -n <mesh"..., ftn=ftn@entry=PETSC_FALSE, readarguments=readarguments@entry=PETSC_FALSE, len=len@entry=0) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1109
> #12 0x00001555540bba82 in PetscInitialize (argc=argc@entry=0x7fffffffda8c, args=args@entry=0x7fffffffda80, file=file@entry=0x0, help=help@entry=0x55555555a1e0 <help> "Solves a linear system in parallel with KSP.\nInput parameters include:\n -view_exact_sol : write exact solution vector to stdout\n -m <mesh_x> : number of mesh points in x-direction\n -n <mesh"...) at /auto/research/rdfs/home/lyuxiang/petsc-3.20.4/src/sys/objects/pinit.c:1274
> #13 0x0000555555557673 in main (argc=<optimized out>, args=<optimized out>) at geosimtrs_mpiserver.c:29
>
> This does not happen with the 32-bit-index PETSc build: running with more than 8 cores works smoothly with the MPI linear solver server. It also does not happen with the 64-bit-index build in a plain MPI run (without the MPI linear solver server). It only happens with 64-bit-index PETSc together with the MPI linear solver server, so I think it may be a bug.
>
> Any advice would be greatly appreciated. The matrix and the ia, ja arrays are too big to upload, so if you need anything to debug this, please let me know.
>
> - Machine type: HPC
> - OS version and type: Linux houamd009 6.1.55-cggdb11-1 #1 SMP Fri Sep 29 10:09:13 UTC 2023 x86_64 GNU/Linux
> - PETSc version:
>   #define PETSC_VERSION_RELEASE 1
>   #define PETSC_VERSION_MAJOR 3
>   #define PETSC_VERSION_MINOR 20
>   #define PETSC_VERSION_SUBMINOR 4
>   #define PETSC_RELEASE_DATE "Sep 28, 2023"
>   #define PETSC_VERSION_DATE "Jan 29, 2024"
> - MPI implementation: OpenMPI
> - Compiler and version: GNU
>
> Yuxiang Lin
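For context, the workflow described in the quoted message (serial assembly with MatSetValues followed by a KSP solve, run under -mpi_linear_solver_server) roughly corresponds to a driver like the minimal sketch below. This is an illustration only, not the poster's geosimtrs_mpiserver.c; the problem size, the tridiagonal stencil, and the right-hand side are assumptions made for the example.

/* Minimal sketch of a serial-looking driver used with the MPI linear solver
 * server; not the poster's code. A real application would fill the matrix
 * from its own ia/ja/a CSR arrays (e.g. with MatCreateSeqAIJWithArrays() or
 * per-row MatSetValues()) instead of the 1D Laplacian assembled here. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         b, x;
  KSP         ksp;
  PetscInt    n = 100, i, col[3];
  PetscScalar val[3];

  PetscFunctionBeginUser;
  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Assemble a small tridiagonal matrix (illustrative stand-in for the
     application's CSR data). */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));
  for (i = 0; i < n; i++) {
    PetscInt ncols = 0;
    if (i > 0)     { col[ncols] = i - 1; val[ncols++] = -1.0; }
    col[ncols] = i; val[ncols++] = 2.0;
    if (i < n - 1) { col[ncols] = i + 1; val[ncols++] = -1.0; }
    PetscCall(MatSetValues(A, 1, &i, ncols, col, val, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* The user code looks sequential; with -mpi_linear_solver_server the
     actual solve is handed off to the server ranks (PCMPI). */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

Launched as, e.g., "mpiexec -n 8 ./driver -mpi_linear_solver_server -ksp_monitor", rank 0 runs the driver while the remaining ranks sit in PCMPIServerBegin() inside PetscInitialize(), as the stack traces above show, and only participate in the distributed solve.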