*'--with-cuda-gencodearch=70',*

--Junchao Zhang
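The difference between the two spellings in this thread is a single character, so it can help to check the option list mechanically. Below is a minimal sketch, assuming the configure options are kept as a Python list of strings. The helper names (`split_option`, `check_gencode`) are hypothetical and not part of PETSc, and the device arch has to be supplied by hand. It only inspects option strings; it cannot detect a stale Kokkos build left over from an earlier configure.

```python
import difflib

# Option name as spelled in this thread; a value such as "70" stands for
# compute capability 7.0.
GENCODE_OPTION = "--with-cuda-gencodearch"


def split_option(opt):
    """Split '--name=value' into (name, value); value is None without '='."""
    name, sep, value = opt.partition("=")
    return name, (value if sep else None)


def check_gencode(options, device_arch):
    """Return a list of human-readable problems found in `options`.

    options     -- configure option strings, e.g. ['--with-cuda-gencodearch=70']
    device_arch -- arch of the GPU the code will run on, as a string like '70'
    """
    problems = []
    found = None
    for opt in options:
        name, value = split_option(opt)
        if name == GENCODE_OPTION:
            found = value
        elif difflib.get_close_matches(name, [GENCODE_OPTION], n=1, cutoff=0.9):
            # Near miss: almost, but not exactly, the expected option name.
            problems.append(
                f"{name!r} looks like a misspelling of {GENCODE_OPTION!r}"
            )
    if found is None:
        problems.append(f"{GENCODE_OPTION} is not set")
    elif found != device_arch:
        problems.append(
            f"configured for arch {found}, but the device is arch {device_arch}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_gencode(["--with-cudac-gencodearch=70"], "70"):
        print(problem)
```

With the misspelled option quoted below, the sketch reports both a likely misspelling and a missing gencode setting; with the corrected spelling and a matching arch it reports nothing.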
On Tue, May 18, 2021 at 6:29 AM Mark Adams <mfad...@lbl.gov> wrote:

> Damn, I am getting this problem on Summit and did a clean configure.
> I removed the Kokkos arch=70 line and added
> '--with-cudac-gencodearch=70',
>
> Any ideas?
>
> < Number of SNES iterations = 2
> ---
> > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture
> > [h50n11:35759] *** Process received signal ***
> > [h50n11:35759] Signal: Aborted (6)
> > [h50n11:35759] Signal code: (-6)
> > [h50n11:35759] [ 0] [0x2000000504d8]
> > [h50n11:35759] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200032322094]
> > [h50n11:35759] [ 2] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl10host_abortEPKc+0x58)[0x20000f944558]
> > [h50n11:35759] [ 3] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xe60)[0x20000f95c210]
> > [h50n11:35759] [ 4] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Cuda15impl_initializeENS0_12SelectDeviceEm+0x30)[0x20000f95c2b0]
> > [h50n11:35759] [ 5] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl20CudaSpaceInitializer10initializeERKNS_13InitArgumentsE+0x34)[0x20000f95c314]
> > [h50n11:35759] [ 6] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x60)[0x20000f926aa0]
> > [h50n11:35759] [ 7] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_backendsERKNS_13InitArgumentsE+0x2c)[0x20000f926dac]
> > [h50n11:35759] [ 8] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x2c)[0x20000f92b73c]
> > [h50n11:35759] [ 9] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos10initializeENS_13InitArgumentsE+0x2c)[0x20000f92d08c]
> > [h50n11:35759] [10] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f4)[0x200000343424]
> > [h50n11:35759] [11] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x68b3dc)[0x20000077b3dc]
> > [h50n11:35759] [12] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x3119b4)[0x2000004019b4]
> > [h50n11:35759] [13] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x306790)[0x2000003f6790]
> > [h50n11:35759] [14] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x274d24)[0x200000364d24]
> > [h50n11:35759] [15] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0xd4)[0x200000412504]
> > [h50n11:35759] [16] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x329f9c)[0x200000419f9c]
> > [h50n11:35759] [17] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(VecScatterBegin+0x9c)[0x20000041fa8c]
> > [h50n11:35759] [18] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x30)[0x2000010ef560]
> > [h50n11:35759] [19] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x290)[0x2000013314b0]
> > [h50n11:35759] [20] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x1513c10)[0x200001603c10]
> > [h50n11:35759] [21] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESComputeFunction+0x164)[0x200001625584]
> > [h50n11:35759] [22] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x15830f4)[0x2000016730f4]
> > [h50n11:35759] [23] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESSolve+0x814)[0x200001634c44]
> > [h50n11:35759] [24] ./ex19[0x10001a70]
> > [h50n11:35759] [25] /lib64/libc.so.6(+0x25200)[0x200032305200]
> > [h50n11:35759] [26] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000323053f4]
> > [h50n11:35759] *** End of error message ***
> > ERROR: One or more process (first noticed rank 0) terminated with signal 6 (core dumped)
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
>
> On Mon, May 17, 2021 at 8:24 AM Mark Adams <mfad...@lbl.gov> wrote:
>
>> I thought I did a clean make but I made a clean one now and it seems to be working now.
>>
>> Also, I am trying to fix this error message that I get on Cori with 'make check'.
>> I set mpiexec='srun -G 2 -c 20' and get an interactive shell with these parameters, but I get error messages on Kokkos:
>>
>> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
>> See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>
>> *srun: error: Unable to create step for job 1923618: More processors requested than permitted*
>> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
>> gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored)
>> 1,25c1
>> < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000
>> < Vec Object: Exact Solution 2 MPI processes
>> < type: mpikokkos
>> < Process [0]
>> < 0.
>> < 0.015625
>> < 0.125
>> < Process [1]
>> < 0.421875
>> < 1.
>> < Vec Object: Forcing function 2 MPI processes
>> < type: mpikokkos
>> < Process [0]
>> < 1e-72
>> < 1.50024
>> < 3.01563
>> < Process [1]
>> < 4.67798
>> < 7.
>> < 0 SNES Function norm 5.414682427127e+00
>> < 1 SNES Function norm 2.952582418265e-01
>> < 2 SNES Function norm 4.502293658739e-04
>> < 3 SNES Function norm 1.389665806646e-09
>> < Number of SNES iterations = 3
>> < Norm of error 1.49752e-10 Iterations 3
>> ---
>>
>> *> srun: error: Unable to create step for job 1923618: More processors requested than permitted*
>> /global/homes/m/madams/petsc/src/snes/tutorials
>> Possible problem with ex3k running with kokkos-kernels, diffs above
>> =========================================
>> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
>> Completed test examples
>>
>> On Sun, May 16, 2021 at 11:14 PM Barry Smith <bsm...@petsc.dev> wrote:
>>
>>>
>>> Could still be a gencode arch issue. Is it possible that Kokkos was built with the 80 arch and when you reran configure with 70 it did not rebuild Kokkos because it didn't know it needed to?
>>>
>>> Sorry, but this may require another rm -rf arch* and running ./configure again.
>>>
>>> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e
>>>
>>> cudaErrorInvalidDeviceFunction = 98
>>> The requested device function does not exist or is not compiled for the proper device architecture.
>>>
>>>
>>> On May 16, 2021, at 9:09 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>> I now get this error. A blas error from VecAXPBYPCZ ...
>>> Any ideas?
>>>
>>>
>>> terminate called after throwing an instance of 'std::runtime_error'
>>> what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) error( cudaErrorInvalidDeviceFunction): invalid device function /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654
>>> Traceback functionality not available
>>>
>>> [cgpu16:55192] *** Process received signal ***
>>> [cgpu16:55192] Signal: Aborted (6)
>>> [cgpu16:55192] Signal code: (-6)
>>> [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360]
>>> [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160]
>>> [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741]
>>> [cgpu16:55192] [ 3] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83]
>>> [cgpu16:55192] [ 4] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6]
>>> [cgpu16:55192] [ 5] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21]
>>> [cgpu16:55192] [ 6] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053]
>>> [cgpu16:55192] [ 7] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f]
>>> [cgpu16:55192] [ 8] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d]
>>> [cgpu16:55192] [ 9] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7]
>>> [cgpu16:55192] [10] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1]
>>> [cgpu16:55192] [11] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781]
>>> [cgpu16:55192] [12] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b]
>>> [cgpu16:55192] [13] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1]
>>> [cgpu16:55192] [14] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e]
>>> [cgpu16:55192] [15] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a]
>>> [cgpu16:55192] [16] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675]
>>> [cgpu16:55192] [17] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e]
>>> [cgpu16:55192] [18] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651]
>>> [cgpu16:55192] [19] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c]
>>> [cgpu16:55192] [20] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05]
>>> [cgpu16:55192] [21] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455]
>>> [cgpu16:55192] [22] ../ex2-kok[0x4033eb]
>>> [cgpu16:55192] [23] /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a]
>>> [cgpu16:55192] [24] ../ex2-kok[0x404aaa]
>>> [cgpu16:55192] *** End of error message ***
>>> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted "$@"
>>> 0 stopping nvidia-cuda-mps-control on cgpu16
>>>