https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120216

--- Comment #1 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Hi there, I have now bought an RTX 5060 Ti from NVIDIA. nvaccelinfo yields:

Unified Addressing:            Yes
Managed Memory:                Yes
Concurrent Managed Memory:     Yes
Preemption Supported:          Yes
Cooperative Launch:            Yes
Cluster Launch:                Yes
Unified Function Pointers:     Yes
Unified Memory:                HMM
Memory Models Flags:           -gpu=mem:separate, -gpu=mem:managed,
-gpu=mem:unified

So NVIDIA clearly states that this card supports unified memory.

Yet the following snippet:

#include <cuda_runtime.h>
#include <iostream>

int main() {
    int value = 0;
    // Ask whether device ordinal 1 can coherently access pageable
    // (system-allocated) memory; 1 would be expected on an HMM-capable setup.
    cudaDeviceGetAttribute(&value, cudaDevAttrPageableMemoryAccess, 1);
    std::cout << "Pageable Memory Access supported: " << value << std::endl;
    return 0;
}

prints zero...
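
A slightly more defensive variant of that query (just a sketch on my side, not
part of the original test) enumerates all devices and also reads the related
attributes, since querying ordinal 1 on a single-GPU system fails and leaves
the value untouched:

#include <cuda_runtime.h>
#include <iostream>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        std::cerr << "cudaGetDeviceCount failed" << std::endl;
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        int pageable = 0, hostTables = 0, concurrent = 0;
        // GPU can coherently access pageable (system-allocated) memory:
        cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, dev);
        // ...and does so through the host's page tables (HMM-style support):
        cudaDeviceGetAttribute(&hostTables,
                               cudaDevAttrPageableMemoryAccessUsesHostPageTables, dev);
        // CPU and GPU can access managed memory concurrently:
        cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);
        std::cout << "device " << dev
                  << ": pageableMemoryAccess=" << pageable
                  << ", usesHostPageTables=" << hostTables
                  << ", concurrentManagedAccess=" << concurrent << std::endl;
    }
    return 0;
}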

This would basically mean that unified_shared_memory with GCC's nvptx offloading
has stricter requirements than what NVIDIA calls Heterogeneous Memory Management
(HMM).

HMM is what my card supports according to nvaccelinfo, and this is how NVIDIA
describes it:
https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/
Heterogeneous Memory Management (HMM) is a CUDA memory management feature that
extends the simplicity and productivity of the CUDA Unified Memory programming
model to include system allocated memory on systems with PCIe-connected NVIDIA
GPUs. System allocated memory refers to memory that is ultimately allocated by
the operating system; for example, through malloc, mmap, the C++ new operator
(which of course uses the preceding mechanisms), or related system routines
that set up CPU-accessible memory for the application. 

Previously, on PCIe-based machines, system allocated memory was not directly
accessible by the GPU. The GPU could only access memory that came from special
allocators such as cudaMalloc or cudaMallocManaged. 

With HMM enabled, all application threads (GPU or CPU) can directly access all
of the application’s system allocated memory. As with Unified Memory (which can
be thought of as a subset of, or precursor to HMM), there is no need to
manually copy system allocated memory between processors. This is because it is
automatically placed on the CPU or GPU, based on processor usage.
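
To illustrate what that description promises (only a sketch of the HMM
programming model, assuming nvcc and a driver where HMM or full unified memory
is actually active; on other systems the kernel would fault), a pointer from
plain operator new can be handed straight to a kernel, with no cudaMalloc or
cudaMemcpy:

#include <cuda_runtime.h>
#include <cstdio>

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;   // the GPU writes directly into system-allocated memory
}

int main() {
    const int n = 1 << 20;
    int *data = new int[n]();                       // ordinary system allocation
    increment<<<(n + 255) / 256, 256>>>(data, n);   // no cudaMalloc / cudaMemcpy
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("status: %s, data[0] = %d\n", cudaGetErrorString(err), data[0]);
    delete[] data;
    return 0;
}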



According to the documentation of GCC's nvptx offloading,

https://gcc.gnu.org/onlinedocs/gcc-15.1.0/libgomp/nvptx.html#FOOT5

this support would not be sufficient for the OpenMP directive
#pragma omp requires unified_shared_memory.
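
For reference, this is the kind of code that depends on that requirement (a
minimal sketch; whether it actually offloads with g++ -fopenmp
-foffload=nvptx-none on this card is exactly what is in question here):

#include <cstdio>
#include <vector>

#pragma omp requires unified_shared_memory

int main() {
    std::vector<int> v(1000, 1);   // ordinary host allocation
    int *p = v.data();
    // With unified_shared_memory in effect no map() clauses are needed;
    // the device is expected to access the host allocation directly.
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < 1000; ++i)
        p[i] += 1;
    std::printf("v[0] = %d\n", v[0]);
    return 0;
}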

This means that the feature would then only be available for integrated
(onboard) GPUs, due to GCC's restrictions, even though the NVIDIA driver
creates a unified memory space for GPU and CPU.
