configure prints the CUDA information at the end of its run; you can check 
that output to see which architecture was actually used. 

  I have a new MR where PETSc records the gencodearch it was built with and 
then when your program starts up CUDA it verifies that the hardware supports 
the gencodearch it was built with. Hopefully this will alleviate difficulties 
in the future. Of course this won't help when using libraries that use CUDA 
built externally from PETSc.
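
  To illustrate the kind of startup check described above, here is a minimal 
sketch in Python. It is not the code from the MR. The function names are 
hypothetical, and it assumes the library contains only device code for the 
recorded gencodearch (no embedded PTX), so the rule applied is CUDA's binary 
compatibility rule: the device must have the same major compute capability 
and a minor version at least as large.

```python
from typing import Tuple


def parse_gencodearch(arch: str) -> Tuple[int, int]:
    """Split a gencodearch string such as '70' into (major, minor)."""
    # The last digit is the minor version, so '70' -> (7, 0), '86' -> (8, 6).
    return divmod(int(arch), 10)


def device_supports(built_arch: str, device_major: int, device_minor: int) -> bool:
    """Return True if a device of the given compute capability can run
    device code built for built_arch.

    Assumption: only native code for built_arch is present (no PTX that
    could be JIT-compiled for a newer major architecture).
    """
    built_major, built_minor = parse_gencodearch(built_arch)
    return device_major == built_major and device_minor >= built_minor


def check_or_raise(built_arch: str, device_major: int, device_minor: int) -> None:
    """Hypothetical startup hook: fail early with a readable message."""
    if not device_supports(built_arch, device_major, device_minor):
        raise RuntimeError(
            "built with gencodearch=%s but the device has compute capability %d.%d"
            % (built_arch, device_major, device_minor)
        )


# A V100 reports compute capability 7.0.
print(device_supports("70", 7, 0))  # build for 70 on a V100
print(device_supports("80", 7, 0))  # build for 80 on a V100
```

  With such a check, a library built for 80 but run on a V100 would stop with 
a clear message at startup instead of aborting later inside a kernel launch.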

   Barry


> On May 18, 2021, at 10:30 AM, Junchao Zhang <junchao.zh...@gmail.com> wrote:
> 
>     '--with-cuda-gencodearch=70',
> 
> --Junchao Zhang
> 
> 
> On Tue, May 18, 2021 at 6:29 AM Mark Adams <mfad...@lbl.gov> wrote:
> Damn, I am getting this problem on Summit and did a clean configure. 
> I removed the Kokkos arch=70 line and added 
>     '--with-cudac-gencodearch=70',
> 
> Any ideas?
> 
> < Number of SNES iterations = 2
> ---
> > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture
> > [h50n11:35759] *** Process received signal ***
> > [h50n11:35759] Signal: Aborted (6)
> > [h50n11:35759] Signal code:  (-6)
> > [h50n11:35759] [ 0] [0x2000000504d8]
> > [h50n11:35759] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200032322094]
> > [h50n11:35759] [ 2] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl10host_abortEPKc+0x58)[0x20000f944558]
> > [h50n11:35759] [ 3] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xe60)[0x20000f95c210]
> > [h50n11:35759] [ 4] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Cuda15impl_initializeENS0_12SelectDeviceEm+0x30)[0x20000f95c2b0]
> > [h50n11:35759] [ 5] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl20CudaSpaceInitializer10initializeERKNS_13InitArgumentsE+0x34)[0x20000f95c314]
> > [h50n11:35759] [ 6] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x60)[0x20000f926aa0]
> > [h50n11:35759] [ 7] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_backendsERKNS_13InitArgumentsE+0x2c)[0x20000f926dac]
> > [h50n11:35759] [ 8] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x2c)[0x20000f92b73c]
> > [h50n11:35759] [ 9] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos10initializeENS_13InitArgumentsE+0x2c)[0x20000f92d08c]
> > [h50n11:35759] [10] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f4)[0x200000343424]
> > [h50n11:35759] [11] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x68b3dc)[0x20000077b3dc]
> > [h50n11:35759] [12] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x3119b4)[0x2000004019b4]
> > [h50n11:35759] [13] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x306790)[0x2000003f6790]
> > [h50n11:35759] [14] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x274d24)[0x200000364d24]
> > [h50n11:35759] [15] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0xd4)[0x200000412504]
> > [h50n11:35759] [16] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x329f9c)[0x200000419f9c]
> > [h50n11:35759] [17] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(VecScatterBegin+0x9c)[0x20000041fa8c]
> > [h50n11:35759] [18] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x30)[0x2000010ef560]
> > [h50n11:35759] [19] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x290)[0x2000013314b0]
> > [h50n11:35759] [20] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x1513c10)[0x200001603c10]
> > [h50n11:35759] [21] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESComputeFunction+0x164)[0x200001625584]
> > [h50n11:35759] [22] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x15830f4)[0x2000016730f4]
> > [h50n11:35759] [23] 
> > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESSolve+0x814)[0x200001634c44]
> > [h50n11:35759] [24] ./ex19[0x10001a70]
> > [h50n11:35759] [25] /lib64/libc.so.6(+0x25200)[0x200032305200]
> > [h50n11:35759] [26] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000323053f4]
> > [h50n11:35759] *** End of error message ***
> > ERROR:  One or more process (first noticed rank 0) terminated with signal 6 
> > (core dumped)
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
> 
> On Mon, May 17, 2021 at 8:24 AM Mark Adams <mfad...@lbl.gov> wrote:
> I thought I had done a clean make, but I have now done one and it seems to 
> be working.
> 
> Also, I am trying to fix this error message that I get on Cori with 'make 
> check'.
> I set mpiexec='srun -G 2 -c 20' and get an interactive shell with these 
> parameters, but I get error messages on Kokkos:
> 
> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
> srun: error: Unable to create step for job 1923618: More processors requested 
> than permitted
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored)
> 1,25c1
> < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000
> < Vec Object: Exact Solution 2 MPI processes
> <   type: mpikokkos
> < Process [0]
> < 0.
> < 0.015625
> < 0.125
> < Process [1]
> < 0.421875
> < 1.
> < Vec Object: Forcing function 2 MPI processes
> <   type: mpikokkos
> < Process [0]
> < 1e-72
> < 1.50024
> < 3.01563
> < Process [1]
> < 4.67798
> < 7.
> <   0 SNES Function norm 5.414682427127e+00 
> <   1 SNES Function norm 2.952582418265e-01 
> <   2 SNES Function norm 4.502293658739e-04 
> <   3 SNES Function norm 1.389665806646e-09 
> < Number of SNES iterations = 3
> < Norm of error 1.49752e-10 Iterations 3
> ---
> > srun: error: Unable to create step for job 1923618: More processors 
> > requested than permitted
> /global/homes/m/madams/petsc/src/snes/tutorials
> Possible problem with ex3k running with kokkos-kernels, diffs above
> =========================================
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> Completed test examples
> 
> On Sun, May 16, 2021 at 11:14 PM Barry Smith <bsm...@petsc.dev> wrote:
> 
> Could still be a gencode arch issue. Is it possible that Kokkos was built 
> with the 80 arch and when you reran configure with 70 it did not rebuild 
> Kokkos because it didn't know it needed to?
> 
> Sorry, but this may require another rm -rf arch* and running ./configure 
> again.
> 
> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e
> 
> 
> cudaErrorInvalidDeviceFunction = 98
> The requested device function does not exist or is not compiled for the 
> proper device architecture.
> 
> 
> 
>> On May 16, 2021, at 9:09 PM, Mark Adams <mfad...@lbl.gov> wrote:
>> 
>> I now get this error. A blas error from VecAXPBYPCZ ...
>> Any ideas?
>> 
>> 
>> terminate called after throwing an instance of 'std::runtime_error'
>>   what():  cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) 
>> error( cudaErrorInvalidDeviceFunction): invalid device function 
>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654
>> Traceback functionality not available
>> 
>> [cgpu16:55192] *** Process received signal ***
>> [cgpu16:55192] Signal: Aborted (6)
>> [cgpu16:55192] Signal code:  (-6)
>> [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360]
>> [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160]
>> [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741]
>> [cgpu16:55192] [ 3] 
>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83]
>> [cgpu16:55192] [ 4] 
>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6]
>> [cgpu16:55192] [ 5] 
>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21]
>> [cgpu16:55192] [ 6] 
>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053]
>> [cgpu16:55192] [ 7] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f]
>> [cgpu16:55192] [ 8] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d]
>> [cgpu16:55192] [ 9] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7]
>> [cgpu16:55192] [10] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1]
>> [cgpu16:55192] [11] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781]
>> [cgpu16:55192] [12] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b]
>> [cgpu16:55192] [13] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1]
>> [cgpu16:55192] [14] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e]
>> [cgpu16:55192] [15] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a]
>> [cgpu16:55192] [16] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675]
>> [cgpu16:55192] [17] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e]
>> [cgpu16:55192] [18] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651]
>> [cgpu16:55192] [19] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c]
>> [cgpu16:55192] [20] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05]
>> [cgpu16:55192] [21] 
>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455]
>> [cgpu16:55192] [22] ../ex2-kok[0x4033eb]
>> [cgpu16:55192] [23] /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a]
>> [cgpu16:55192] [24] ../ex2-kok[0x404aaa]
>> [cgpu16:55192] *** End of error message ***
>> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted                
>>  "$@"
>> 0 stopping nvidia-cuda-mps-control on cgpu16
> 
