I am a new user of PETSc and want to know more about the underlying implementation for matrix-vector multiplication (Ax=y).
PETSc uses a 1D distribution and communicates only the parts of the vector x that are needed, depending on the sparsity pattern of A. Is the communication of x done with MPI-3 RMA, and if so, does it use CUDA-aware MPI for the RMA operations?

Best regards,
Alexander Maeder
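
For concreteness, the communication pattern described in the question can be sketched as follows. This is only an illustration of a 1D row distribution, assuming contiguous row blocks and assuming x is partitioned the same way as the rows. It is not PETSc code and says nothing about which MPI mechanism PETSc actually uses; the helper names `row_range` and `needed_remote_entries` are hypothetical.

```python
# Illustration only: plain Python, no PETSc or MPI involved.
# All function names here are hypothetical.

def row_range(n, nranks, rank):
    """Half-open range [start, end) of rows (and x entries) owned by `rank`."""
    base, extra = divmod(n, nranks)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)

def needed_remote_entries(rows, n, nranks, rank):
    """Sorted indices of x that `rank` needs but does not own.

    `rows` maps each locally owned global row index to the column
    indices of the nonzeros in that row.
    """
    start, end = row_range(n, nranks, rank)
    return sorted({j for cols in rows.values() for j in cols
                   if not (start <= j < end)})

# Example: tridiagonal 8x8 matrix split across 2 ranks.
n, nranks = 8, 2
for rank in range(nranks):
    start, end = row_range(n, nranks, rank)
    rows = {i: [j for j in (i - 1, i, i + 1) if 0 <= j < n]
            for i in range(start, end)}
    print(rank, needed_remote_entries(rows, n, nranks, rank))
```

In this example each rank needs exactly one entry of x from its neighbor (index 4 on rank 0, index 3 on rank 1), so only those entries would have to be communicated before the local products are formed.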