I finally figured out a way to make it work. I had to build PETSc and my application with the (non-GPU-aware) Intel MPI. Then, before running, I switch to MVAPICH2-GDR. I'm not sure why that works, but it's the only way I've found to compile and run successfully without getting errors about not having a GPU-aware MPI.
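One possible explanation, which nobody in this thread has confirmed: Intel MPI and MVAPICH2 both follow the MPICH ABI, so a binary linked against one can pick up the other's libmpi at run time if the loaded module changes the library search path. A quick way to see which MPI library an executable will actually resolve is to look at its `ldd` output. The sketch below is only an illustration; the helper names and the example binary path are hypothetical, and it assumes the MPI shared library name starts with `libmpi`.

```python
import re
import subprocess

# Matches ldd lines such as
#   "libmpi.so.12 => /some/dir/libmpi.so.12 (0x00007f...)"
# and also "libmpi.so.12 => not found".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def resolved_mpi_libs(ldd_output):
    """Return {library name: resolved path} for MPI-looking entries in ldd output.

    Assumption: the MPI shared library name starts with "libmpi".
    """
    libs = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match and match.group(1).startswith("libmpi"):
            libs[match.group(1)] = match.group(2)
    return libs


def mpi_libs_for(binary):
    """Run ldd on a binary (hypothetical path) and report its MPI libraries."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return resolved_mpi_libs(result.stdout)
```

Calling `mpi_libs_for("./my_app")` once with the Intel MPI module loaded and once with the mvapich2-gdr module loaded should show whether the resolved path changes between the two environments.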
On Fri, Dec 8, 2023 at 5:30 PM Mark Adams <mfad...@lbl.gov> wrote:

> You may need to set some env variables. This can be system specific so you
> might want to look at docs or ask TACC how to run with GPU-aware MPI.
>
> Mark
>
> On Fri, Dec 8, 2023 at 5:17 PM Sreeram R Venkat <srven...@utexas.edu> wrote:
>
>> Actually, when I compile my program with this build of PETSc and run, I
>> still get the error:
>>
>> PETSC ERROR: PETSc is configured with GPU support, but your MPI is not
>> GPU-aware. For better performance, please use a GPU-aware MPI.
>>
>> I have the mvapich2-gdr module loaded and MV2_USE_CUDA=1.
>>
>> Is there anything else I need to do?
>>
>> Thanks,
>> Sreeram
>>
>> On Fri, Dec 8, 2023 at 3:29 PM Sreeram R Venkat <srven...@utexas.edu> wrote:
>>
>>> Thank you, changing to CUDA 11.4 fixed the issue. The mvapich2-gdr
>>> module didn't require CUDA 11.4 as a dependency, so I was using 12.0
>>>
>>> On Fri, Dec 8, 2023 at 1:15 PM Satish Balay <ba...@mcs.anl.gov> wrote:
>>>
>>>> Executing: mpicc -show
>>>> stdout: icc -I/opt/apps/cuda/11.4/include -I/opt/apps/cuda/11.4/include
>>>> -lcuda -L/opt/apps/cuda/11.4/lib64/stubs -L/opt/apps/cuda/11.4/lib64
>>>> -lcudart -lrt -Wl,-rpath,/opt/apps/cuda/11.4/lib64
>>>> -Wl,-rpath,XORIGIN/placeholder -Wl,--build-id -L/opt/apps/cuda/11.4/lib64/
>>>> -lm -I/opt/apps/intel19/mvapich2-gdr/2.3.7/include
>>>> -L/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,-rpath
>>>> -Wl,/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,--enable-new-dtags -lmpi
>>>>
>>>> Checking for program /opt/apps/cuda/12.0/bin/nvcc...found
>>>>
>>>> Looks like you are trying to mix in 2 different cuda versions in this build.
>>>>
>>>> Perhaps you need to use cuda-11.4 - with this install of mvapich..
>>>>
>>>> Satish
>>>>
>>>> On Fri, 8 Dec 2023, Matthew Knepley wrote:
>>>>
>>>> > On Fri, Dec 8, 2023 at 1:54 PM Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>> >
>>>> > > I am trying to build PETSc with CUDA using the CUDA-Aware MVAPICH2-GDR.
>>>> > >
>>>> > > Here is my configure command:
>>>> > >
>>>> > > ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr --download-hypre
>>>> > > --with-cuda=true --cuda-dir=$TACC_CUDA_DIR --with-hdf5=true
>>>> > > --with-hdf5-dir=$TACC_PHDF5_DIR --download-elemental --download-metis
>>>> > > --download-parmetis --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90
>>>> > >
>>>> > > which errors with:
>>>> > >
>>>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
>>>> > > ---------------------------------------------------------------------------------------------
>>>> > > CUDA compile failed with arch flags " -ccbin mpic++ -std=c++14
>>>> > > -Xcompiler -fPIC
>>>> > > -Xcompiler -fvisibility=hidden -g -lineinfo -gencode
>>>> > > arch=compute_80,code=sm_80"
>>>> > > generated from "--with-cuda-arch=80"
>>>> > >
>>>> > > The same configure command works when I use the Intel MPI and I can build
>>>> > > with CUDA. The full config.log file is attached. Please let me know if you
>>>> > > need any other information. I appreciate your help with this.
>>>> > >
>>>> >
>>>> > The proximate error is
>>>> >
>>>> > Executing: nvcc -c -o /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.o
>>>> > -I/tmp/petsc-kn3f29gl/config.setCompilers
>>>> > -I/tmp/petsc-kn3f29gl/config.types
>>>> > -I/tmp/petsc-kn3f29gl/config.packages.cuda -ccbin mpic++ -std=c++14
>>>> > -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode
>>>> > arch=compute_80,code=sm_80 /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu
>>>> > stdout:
>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one
>>>> > instance of overloaded function "__nv_associate_access_property_impl" has
>>>> > "C" linkage
>>>> > 1 error detected in the compilation of
>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu".
>>>> > Possible ERROR while running compiler: exit code 1
>>>> > stderr:
>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one
>>>> > instance of overloaded function "__nv_associate_access_property_impl" has
>>>> > "C" linkage
>>>> >
>>>> > 1 error detected in the compilation of
>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda
>>>> >
>>>> > This looks like screwed up headers to me, but I will let someone that
>>>> > understands CUDA compilation reply.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Matt
>>>> >
>>>> > > Thanks,
>>>> > > Sreeram
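The mismatch Satish spotted in the configure log (CUDA 11.4 paths baked into `mpicc -show`, but nvcc found under CUDA 12.0) can also be checked mechanically before running configure. The following is a minimal sketch, not part of PETSc: the function names are made up for illustration, and it assumes the CUDA version appears in install paths as `/cuda/<major>.<minor>/`, as it does in the log quoted in this thread.

```python
import re

# Assumption: CUDA installs live under .../cuda/<major>.<minor>/...,
# matching the /opt/apps/cuda/11.4 layout seen in the configure log.
_CUDA_VERSION = re.compile(r"/cuda/(\d+\.\d+)(?=/|\s|$)")


def cuda_versions(text):
    """Return the set of CUDA versions mentioned in paths within text."""
    return set(_CUDA_VERSION.findall(text))


def check_cuda_consistency(mpicc_show, nvcc_path):
    """Compare CUDA versions in the MPI wrapper flags with the nvcc location.

    Returns (wrapper_versions, nvcc_versions, mismatch). mismatch is True
    only when both sides mention a version and the two sets differ.
    """
    wrapper = cuda_versions(mpicc_show)
    nvcc = cuda_versions(nvcc_path)
    mismatch = bool(wrapper and nvcc and wrapper != nvcc)
    return wrapper, nvcc, mismatch


if __name__ == "__main__":
    # Strings abbreviated from the configure log quoted in this thread.
    show = "icc -I/opt/apps/cuda/11.4/include -L/opt/apps/cuda/11.4/lib64 -lcudart"
    nvcc = "/opt/apps/cuda/12.0/bin/nvcc"
    print(check_cuda_consistency(show, nvcc))
```

In practice the two inputs would come from running `mpicc -show` and `which nvcc` in the same shell session that will run configure, so that the loaded modules match.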