"How did you compile your code, using nvcc, mpicc or mpicxx? I expect PetscScalar to be std::complex with C++  or _Complex with C.  So I don't understand why "PetscScalar still refers to thrust::complex"."

I use mpicxx and nvcc.
I have also attached my CMakeLists.txt.

Best,
--------------------
Langtian Liu

On Oct 3, 2024, at 5:30 PM, 刘浪天 via petsc-users <petsc-users@mcs.anl.gov> wrote:



Hi Junchao,

I pulled and reinstalled. Although PetscScalar still refers to thrust::complex, based on your suggestion, casting the pointer to PetscScalar * works; it has the same storage order as std::complex.
```
Mat kernel;
auto *h_kernel = reinterpret_cast<PetscScalar *>(h_kernel_temp);
PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, nRows, nCols, h_kernel, &kernel));
PetscCall(MatAssemblyBegin(kernel, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(kernel, MAT_FINAL_ASSEMBLY));
```
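For what it's worth, a compile-time check along these lines (a sketch, not something in my actual code; it assumes a complex-scalar PETSc build) documents the layout assumption the cast relies on — cuDoubleComplex, std::complex<double>, and PetscScalar each store the real and imaginary parts as two contiguous doubles:
```
// Sketch: assert the layout compatibility the reinterpret_cast depends on.
#include <complex>
#include <cuComplex.h>
#include <petscsys.h>

static_assert(sizeof(cuDoubleComplex) == sizeof(PetscScalar),
              "size mismatch between cuDoubleComplex and PetscScalar");
static_assert(alignof(cuDoubleComplex) == alignof(PetscScalar),
              "alignment mismatch between cuDoubleComplex and PetscScalar");
static_assert(sizeof(std::complex<double>) == sizeof(cuDoubleComplex),
              "size mismatch between std::complex<double> and cuDoubleComplex");
```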
Thank you.

Best wishes,

--------------------

Langtian Liu


On Oct 3, 2024, at 4:38 PM, Junchao Zhang <junchao.zh...@gmail.com> wrote:


The MR has been merged into petsc/release.

BTW, in MatCreateDense the data pointer has to be a host pointer. It is better to always use PetscScalar* (instead of std::complex<double>*) to do the cast.
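A minimal sketch of that pattern (the function and buffer names are illustrative, not from this thread; it assumes a complex-scalar PETSc build and, on more than one rank, that the buffer holds the local rows):
```
// Sketch: wrap an existing host buffer in a PETSc dense matrix without copying.
#include <petscmat.h>

static PetscErrorCode WrapHostBuffer(void *h_buffer, PetscInt n, Mat *A)
{
  PetscFunctionBeginUser;
  // Reinterpret through PetscScalar*, never through std::complex<double>*.
  PetscScalar *data = reinterpret_cast<PetscScalar *>(h_buffer);
  PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, data, A));
  PetscCall(MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```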

--Junchao Zhang


On Thu, Oct 3, 2024 at 3:03 AM 刘浪天 <langtian....@icloud.com> wrote:
Okay. I see :D
--------------------
Langtian Liu
Institute for Theoretical Physics, Justus-Liebig-University Giessen
Heinrich-Buff-Ring 16, 35392 Giessen, Germany
email: langtian....@icloud.com  Tel: (+49) 641 99 33342

On Oct 3, 2024, at 9:58 AM, Jose E. Roman <jro...@dsic.upv.es> wrote:


You have to wait until the merge request has the label "Merged" instead of "Open".


On Oct 3, 2024, at 9:55 AM, 刘浪天 via petsc-users <petsc-users@mcs.anl.gov> wrote:

Hello Junchao,

Okay. Thank you for helping find this bug. I pulled the newest version of PETSc today; it seems the error has not yet been fixed in the current release version. Maybe I should wait a few days.

Best wishes,
Langtian


On Oct 3, 2024, at 12:12 AM, Junchao Zhang <junchao.zh...@gmail.com> wrote:


Hi, Langtian,
Thanks for the configure.log; I now see what's wrong. Since you compiled your code with nvcc, PETSc's headers mistakenly assumed PETSc was configured with CUDA.
It is fixed in https://gitlab.com/petsc/petsc/-/merge_requests/7909, which will be in petsc/release and main.
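(A simplified illustration of the failure mode, not the actual PETSc header code: the scalar type was being selected by a compiler macro alone, so every nvcc-compiled translation unit saw the device complex type.)
```
#include <complex>
#include <thrust/complex.h>

// Illustration only -- the real logic lives in PETSc's headers.
// Before the fix, the guard effectively keyed on the compiling compiler:
#if defined(__CUDACC__)                                      // true for every file nvcc compiles
using PetscScalar_illustrative = thrust::complex<double>;    // even for a CPU-only PETSc
#else
using PetscScalar_illustrative = std::complex<double>;
#endif
// The fix also requires PETSc's own configure-time flag, conceptually:
//   #if defined(__CUDACC__) && defined(PETSC_HAVE_CUDA)
```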

Thanks.
--Junchao Zhang


On Wed, Oct 2, 2024 at 3:11 PM 刘浪天 <langtian....@icloud.com> wrote:
Hi Junchao,

I checked it; I did not use CUDA when installing the pure CPU version of PETSc.
The configure.log is attached. Thank you for your reply.

Best wishes,
--------------------
Langtian Liu
Institute for Theoretical Physics, Justus-Liebig-University Giessen
Heinrich-Buff-Ring 16, 35392 Giessen, Germany
email: langtian....@icloud.com  Tel: (+49) 641 99 33342

On Oct 2, 2024, at 5:05 PM, Junchao Zhang <junchao.zh...@gmail.com> wrote:




On Wed, Oct 2, 2024 at 3:57 AM 刘浪天 via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hi all,

I am using PETSc and SLEPc to solve the Faddeev equation for baryons. I encountered a problem with the function MatCreateDense when changing from CPU to CPU-GPU computation.
At first, I wrote the code for purely CPU computation in the following way, and it works.
```
Eigen::MatrixXcd H_KER;
Eigen::MatrixXcd G0;
printf("\nCompute the propagator matrix.\n");
prop_matrix_nucleon_sc_av(Mn, pp_nodes, cos1_nodes);
printf("\nCompute the propagator matrix done.\n");
printf("\nCompute the kernel matrix.\n");
bse_kernel_nucleon_sc_av(Mn, pp_nodes, pp_weights, cos1_nodes, cos1_weights);
printf("\nCompute the kernel matrix done.\n");
printf("\nCompute the full kernel matrix by multiplying kernel and propagator matrix.\n");
Eigen::MatrixXcd kernel_temp = H_KER * G0;
printf("\nCompute the full kernel matrix done.\n");

// Solve the eigen system with SLEPc
printf("\nSolve the eigen system in the rest frame.\n");
// Get the size of the Eigen matrix
int nRows = (int) kernel_temp.rows();
int nCols = (int) kernel_temp.cols();
// Create PETSc matrix and share the data of kernel_temp
Mat kernel;
PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, nRows, nCols, kernel_temp.data(), &kernel));
PetscCall(MatAssemblyBegin(kernel, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(kernel, MAT_FINAL_ASSEMBLY));
```
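(For completeness, a sketch of the SLEPc solve step the snippet alludes to but does not show — standard EPS usage for the largest-magnitude eigenvalue; the option choices are illustrative:)
```
// Sketch: solve kernel * x = lambda * x for the largest-magnitude eigenvalue.
#include <slepceps.h>

EPS         eps;
PetscScalar kr, ki;
PetscInt    nconv;
PetscCall(EPSCreate(PETSC_COMM_WORLD, &eps));
PetscCall(EPSSetOperators(eps, kernel, NULL));                 // standard eigenproblem
PetscCall(EPSSetProblemType(eps, EPS_NHEP));                   // non-Hermitian kernel
PetscCall(EPSSetWhichEigenpairs(eps, EPS_LARGEST_MAGNITUDE));
PetscCall(EPSSetFromOptions(eps));
PetscCall(EPSSolve(eps));
PetscCall(EPSGetConverged(eps, &nconv));
if (nconv > 0) PetscCall(EPSGetEigenpair(eps, 0, &kr, &ki, NULL, NULL));
PetscCall(EPSDestroy(&eps));
```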
Now I have changed the code to compute the propagator and kernel matrices on the GPU and then compute the largest eigenvalues on the CPU using SLEPc, as follows.
```
cuDoubleComplex *h_propmat;
cuDoubleComplex *h_kernelmat;
int dim = EIGHT * NP * NZ;
printf("\nCompute the propagator matrix.\n");
prop_matrix_nucleon_sc_av_cuda(Mn, pp_nodes.data(), cos1_nodes.data());
printf("\nCompute the propagator matrix done.\n");
printf("\nCompute the kernel matrix.\n");
kernel_matrix_nucleon_sc_av_cuda(Mn, pp_nodes.data(), pp_weights.data(), cos1_nodes.data(), cos1_weights.data());
printf("\nCompute the kernel matrix done.\n");
printf("\nCompute the full kernel matrix by multiplying kernel and propagator matrix.\n");
// Multiply the kernel and propagator matrices on the GPU with cuBLAS (column-major order)
auto *h_kernel_temp = new cuDoubleComplex [dim*dim];
matmul_cublas_cuDoubleComplex(h_kernelmat,h_propmat,h_kernel_temp,dim,dim,dim);
printf("\nCompute the full kernel matrix done.\n");

// Solve the eigen system with SLEPc
printf("\nSolve the eigen system in the rest frame.\n");
int nRows = dim;
int nCols = dim;
// Create PETSc matrix and share the data of kernel_temp
Mat kernel;
auto *h_kernel = (std::complex<double> *)(h_kernel_temp);
PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, nRows, nCols, h_kernel, &kernel));
PetscCall(MatAssemblyBegin(kernel, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(kernel, MAT_FINAL_ASSEMBLY));
```
But in this case, the compiler told me that MatCreateDense takes the data pointer as type "thrust::complex<double>" instead of "std::complex<double>".
I am sure I configured and installed PETSc purely for the CPU, without GPU support, and this code is written in a host function.
Please double-check that your PETSc is a purely CPU build; the end of your configure.log shows whether PETSc was configured with CUDA.
Since thrust::complex<double> only results from a PETSc/CUDA configuration, I have this doubt.



Why does the function change its behavior? Did you also meet this problem when writing CUDA code, and how did you solve it?
I tried copying the data to a new thrust::complex<double> matrix, but this is very time-consuming since my matrix is very big. Is there a way to create the Mat from the original data without converting it to thrust::complex<double> in CUDA applications? Any response will be appreciated. Thank you!

Best wishes,
Langtian Liu

------
Institute for Theoretical Physics, Justus-Liebig-University Giessen
Heinrich-Buff-Ring 16, 35392 Giessen, Germany



cmake_minimum_required(VERSION 3.23)

# Set CUDA architecture
set(CMAKE_CUDA_ARCHITECTURES "75")

# Set the compilers manually (must be done before project())
set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)
set(CMAKE_CXX_COMPILER "mpicxx")

project(N_Baryon LANGUAGES C CXX CUDA)

# Set C++ standard
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Set CUDA standard
set(CMAKE_CUDA_STANDARD 20)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -fopenmp -O3 -march=native -mtune=native -flto=auto")

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler=-fopenmp -Xcompiler=-march=native -Xcompiler=-mtune=native")


# Make our cmake modules visible to CMake
list (APPEND CMAKE_MODULE_PATH "/home/langtian/CLionProjects/CMake_Modules")

# Set environment variables
set(PETSC_DIR "/home/langtian/PETSc")
set(SLEPC_DIR "/home/langtian/SLEPc")

# Check for PETSc
find_package(PETSc)

# Setting for FeatureSummary
if(PETSc_FOUND)
    message(STATUS "Found PETSc version ${PETSc_VERSION}")
    set_property(GLOBAL APPEND PROPERTY PACKAGES_FOUND PETSc)
else()
    set_property(GLOBAL APPEND PROPERTY PACKAGES_NOT_FOUND PETSc)
endif()

# Check for SLEPc
find_package(SLEPc)

# Setting for FeatureSummary
if (SLEPc_FOUND)
    message(STATUS "Found SLEPc version ${SLEPc_VERSION}")
    set_property(GLOBAL APPEND PROPERTY PACKAGES_FOUND SLEPc)
else ()
    set_property(GLOBAL APPEND PROPERTY PACKAGES_NOT_FOUND SLEPc)
endif ()

find_package(OpenMP REQUIRED)
find_package(OpenBLAS REQUIRED)
include_directories(${OPENBLAS_INCLUDES})
include_directories(../eigen-3.4.0)
include_directories(../spectra-1.0.0/include)
include_directories(../Math_Tools)
include_directories(../File2Data)
include_directories(../Quark)
include_directories(../Scalar_Diquark)
include_directories(../AxialVector_Diquark)
include_directories(../Pseudoscalar_Diquark)
include_directories(../Vector_Diquark)
include_directories(../Form_Factors_from_JEM)


add_executable(N_1_2_Positive
        n_1_2_positive_q_dq_faddeev.cu
)
target_link_libraries(N_1_2_Positive
        PETSc
        SLEPc
        OpenMP::OpenMP_CXX
        cublas
        ${OpenBLAS_LIBRARIES}
)
set_target_properties(N_1_2_Positive PROPERTIES
        CUDA_SEPARABLE_COMPILATION ON)
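(An alternative worth sketching in case the custom Find modules misbehave: PETSc and SLEPc ship pkg-config files, so the dependencies can also be resolved through CMake's PkgConfig module. The snippet below is an illustrative sketch, assuming the pkgconfig directories are on PKG_CONFIG_PATH:)
```
# Sketch: resolve PETSc/SLEPc via pkg-config instead of custom Find modules.
# Assumes $PETSC_DIR/lib/pkgconfig and $SLEPC_DIR/lib/pkgconfig are on PKG_CONFIG_PATH.
find_package(PkgConfig REQUIRED)
pkg_check_modules(PETSC REQUIRED IMPORTED_TARGET PETSc)
pkg_check_modules(SLEPC REQUIRED IMPORTED_TARGET SLEPc)
target_link_libraries(N_1_2_Positive PkgConfig::PETSC PkgConfig::SLEPC)
```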

