On Wed, Oct 2, 2024 at 6:11 AM 刘浪天 via petsc-users <petsc-users@mcs.anl.gov> wrote:
> I cannot declare everything as PetscScalar. My strategy is to compute the
> elements of the matrix on the GPU block by block and copy them back to the
> CPU, and finally to compute the eigenvalues using SLEPc on the CPU.

Then you have to either a) have a temporary array into which you copy the
GPU results in the CPU type, or b) if you know the types are the same size,
you can cast the pointer.
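For example, a minimal sketch of both options for a host-side array like
yours, assuming a complex-scalar PETSc build (PetscScalar ==
std::complex<double>), a single MPI rank so local and global sizes coincide,
and hypothetical helper names:

```
#include <petscmat.h>
#include <cuComplex.h>

// Option (a): copy the GPU result into a temporary array of the CPU type.
// MatCreateDense() uses the supplied array directly (it does not copy), so
// 'tmp' must stay alive until the Mat is destroyed and be freed by the caller.
PetscErrorCode WrapWithCopy(const cuDoubleComplex *h_data, PetscInt dim, Mat *kernel)
{
  PetscScalar *tmp;

  PetscFunctionBeginUser;
  PetscCall(PetscMalloc1(dim * dim, &tmp));
  for (PetscInt i = 0; i < dim * dim; i++) tmp[i] = PetscScalar(h_data[i].x, h_data[i].y); // (re, im)
  PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, dim, dim, tmp, kernel));
  PetscFunctionReturn(PETSC_SUCCESS);
}

// Option (b): cuDoubleComplex and std::complex<double> are both stored as
// two contiguous doubles (re, im), so when the sizes match the pointer can
// be reinterpreted in place with no copy at all.
PetscErrorCode WrapWithCast(cuDoubleComplex *h_data, PetscInt dim, Mat *kernel)
{
  PetscFunctionBeginUser;
  static_assert(sizeof(PetscScalar) == sizeof(cuDoubleComplex), "scalar sizes must match");
  PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, dim, dim,
                           reinterpret_cast<PetscScalar *>(h_data), kernel));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```

Either way the Mat references the array you pass in rather than copying it,
so the buffer must not be freed before MatDestroy(); option (b) avoids the
extra dim*dim copy entirely.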
  Thanks,

     Matt

> --------------------
> Langtian Liu
> Institute for Theoretical Physics, Justus-Liebig-University Giessen
> Heinrich-Buff-Ring 16, 35392 Giessen, Germany
> email: langtian....@icloud.com  Tel: (+49)641 99 33342
>
> On Oct 2, 2024, at 11:31 AM, Jose E. Roman <jro...@dsic.upv.es> wrote:
>
> Does it work if you declare everything as PetscScalar instead of
> cuDoubleComplex?
>
> On Oct 2, 2024, at 11:23 AM, 刘浪天 <langtian....@icloud.com> wrote:
>
> Hi Jose,
>
> Since my matrix is too large, I cannot create the Mat on the GPU. So I
> still want to create this matrix and compute its eigenvalues on the CPU
> using SLEPc.
>
> Best,
> --------------------
> Langtian Liu
> Institute for Theoretical Physics, Justus-Liebig-University Giessen
> Heinrich-Buff-Ring 16, 35392 Giessen, Germany
> email: langtian....@icloud.com  Tel: (+49)641 99 33342
>
> On Oct 2, 2024, at 11:18 AM, Jose E. Roman <jro...@dsic.upv.es> wrote:
>
> For the CUDA case you should use MatCreateDenseCUDA() instead of
> MatCreateDense(). With this you pass a pointer to the data in GPU memory.
> But I guess "new cuDoubleComplex[dim*dim]" is allocating on the CPU; you
> should use cudaMalloc() instead.
>
> Jose
>
> On Oct 2, 2024, at 10:56 AM, 刘浪天 via petsc-users
> <petsc-users@mcs.anl.gov> wrote:
>
> Hi all,
>
> I am using PETSc and SLEPc to solve the Faddeev equation for baryons. I
> encountered a problem with the function MatCreateDense when changing from
> CPU to CPU-GPU computation.
> At first, I wrote the code for purely CPU computation in the following
> way, and it works.
> ```
> Eigen::MatrixXcd H_KER;
> Eigen::MatrixXcd G0;
> printf("\nCompute the propagator matrix.\n");
> prop_matrix_nucleon_sc_av(Mn, pp_nodes, cos1_nodes);
> printf("\nCompute the propagator matrix done.\n");
> printf("\nCompute the kernel matrix.\n");
> bse_kernel_nucleon_sc_av(Mn, pp_nodes, pp_weights, cos1_nodes, cos1_weights);
> printf("\nCompute the kernel matrix done.\n");
> printf("\nCompute the full kernel matrix by multiplying kernel and propagator matrix.\n");
> MatrixXcd kernel_temp = H_KER * G0;
> printf("\nCompute the full kernel matrix done.\n");
>
> // Solve the eigensystem with SLEPc
> printf("\nSolve the eigen system in the rest frame.\n");
> // Get the size of the Eigen matrix
> int nRows = (int) kernel_temp.rows();
> int nCols = (int) kernel_temp.cols();
> // Create a PETSc matrix that shares the data of kernel_temp
> Mat kernel;
> PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, nRows, nCols, kernel_temp.data(), &kernel));
> PetscCall(MatAssemblyBegin(kernel, MAT_FINAL_ASSEMBLY));
> PetscCall(MatAssemblyEnd(kernel, MAT_FINAL_ASSEMBLY));
> ```
> Now I have changed to computing the propagator and kernel matrices on the
> GPU and then computing the largest eigenvalues on the CPU using SLEPc, in
> the way below.
> ```
> cuDoubleComplex *h_propmat;
> cuDoubleComplex *h_kernelmat;
> int dim = EIGHT * NP * NZ;
> printf("\nCompute the propagator matrix.\n");
> prop_matrix_nucleon_sc_av_cuda(Mn, pp_nodes.data(), cos1_nodes.data());
> printf("\nCompute the propagator matrix done.\n");
> printf("\nCompute the kernel matrix.\n");
> kernel_matrix_nucleon_sc_av_cuda(Mn, pp_nodes.data(), pp_weights.data(), cos1_nodes.data(), cos1_weights.data());
> printf("\nCompute the kernel matrix done.\n");
> printf("\nCompute the full kernel matrix by multiplying kernel and propagator matrix.\n");
> // Allocate the host buffer that receives the kernel * propagator product
> auto *h_kernel_temp = new cuDoubleComplex[dim*dim];
> matmul_cublas_cuDoubleComplex(h_kernelmat, h_propmat, h_kernel_temp, dim, dim, dim);
> printf("\nCompute the full kernel matrix done.\n");
>
> // Solve the eigensystem with SLEPc
> printf("\nSolve the eigen system in the rest frame.\n");
> int nRows = dim;
> int nCols = dim;
> // Create a PETSc matrix that shares the data of h_kernel_temp
> Mat kernel;
> auto *h_kernel = (std::complex<double>*)(h_kernel_temp);
> PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, nRows, nCols, h_kernel, &kernel));
> PetscCall(MatAssemblyBegin(kernel, MAT_FINAL_ASSEMBLY));
> PetscCall(MatAssemblyEnd(kernel, MAT_FINAL_ASSEMBLY));
> ```
> But in this case, the compiler told me that the MatCreateDense function
> expects the data pointer to be of type "thrust::complex<double>" instead
> of "std::complex<double>".
> I am sure I configured and installed PETSc for the CPU only, without GPU
> support, and this code is written in a host function.
> Why does the function change its behavior? Did you also meet this problem
> when writing CUDA code, and how did you solve it?
> I tried copying the data into a new thrust::complex<double> matrix, but
> this is very time consuming since my matrix is very big. Is there a way to
> create the Mat from the original data without converting it to
> thrust::complex<double> in CUDA applications? Any response will be
> appreciated. Thank you!
>
> Best wishes,
> Langtian Liu
>
> ------
> Institute for Theoretical Physics, Justus-Liebig-University Giessen
> Heinrich-Buff-Ring 16, 35392 Giessen, Germany

--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/