https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120216
--- Comment #1 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---

Hi there, I have now bought an RTX 5060 Ti from Nvidia. nvaccelinfo yields:

Unified Addressing:        Yes
Managed Memory:            Yes
Concurrent Managed Memory: Yes
Preemption Supported:      Yes
Cooperative Launch:        Yes
Cluster Launch:            Yes
Unified Function Pointers: Yes
Unified Memory:            HMM
Memory Models Flags:       -gpu=mem:separate, -gpu=mem:managed, -gpu=mem:unified

So Nvidia clearly says this card supports unified memory. Yet the following snippet still prints zero:

#include <cuda_runtime.h>
#include <iostream>

int main() {
    int value = 0;
    // Query the first GPU (device 0) for pageable memory access support.
    cudaError_t err =
        cudaDeviceGetAttribute(&value, cudaDevAttrPageableMemoryAccess, 0);
    if (err != cudaSuccess) {
        std::cerr << "cudaDeviceGetAttribute failed: "
                  << cudaGetErrorString(err) << std::endl;
        return 1;
    }
    std::cout << "Pageable Memory Access supported: " << value << std::endl;
    return 0;
}

This basically means that unified_shared_memory in gcc's nvptx offloading has stricter requirements than what Nvidia calls Heterogeneous Memory Management, which is what my card, according to nvaccelinfo, supports. HMM is described here:

https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/

"Heterogeneous Memory Management (HMM) is a CUDA memory management feature that extends the simplicity and productivity of the CUDA Unified Memory programming model to include system allocated memory on systems with PCIe-connected NVIDIA GPUs. System allocated memory refers to memory that is ultimately allocated by the operating system; for example, through malloc, mmap, the C++ new operator (which of course uses the preceding mechanisms), or related system routines that set up CPU-accessible memory for the application.

Previously, on PCIe-based machines, system allocated memory was not directly accessible by the GPU. The GPU could only access memory that came from special allocators such as cudaMalloc or cudaMallocManaged. With HMM enabled, all application threads (GPU or CPU) can directly access all of the application's system allocated memory. As with Unified Memory (which can be thought of as a subset of, or precursor to HMM), there is no need to manually copy system allocated memory between processors. This is because it is automatically placed on the CPU or GPU, based on processor usage."

According to the documentation of gcc's nvptx offloading,

https://gcc.gnu.org/onlinedocs/gcc-15.1.0/libgomp/nvptx.html#FOOT5

this level of support is not sufficient for the OpenMP directive

#pragma omp requires unified_shared_memory

This means the feature is then only available for onboard GPUs due to this gcc restriction, even though the Nvidia driver creates a unified memory space for GPU and CPU.
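
For reference, the CUDA runtime exposes a few related device attributes that distinguish HMM-style access from full hardware-coherent pageable access. A minimal sketch, assuming a single GPU at device 0:

#include <cuda_runtime.h>
#include <iostream>

int main() {
    int pageable = 0, hostTables = 0, concurrent = 0;
    // 1 if the GPU can access pageable (system-allocated) memory at all.
    cudaDeviceGetAttribute(&pageable,
                           cudaDevAttrPageableMemoryAccess, 0);
    // 1 only on hardware-coherent systems where the GPU walks the
    // host's page tables directly (e.g. NVLink-attached CPUs).
    cudaDeviceGetAttribute(&hostTables,
                           cudaDevAttrPageableMemoryAccessUsesHostPageTables, 0);
    // 1 if managed memory can be accessed concurrently by CPU and GPU.
    cudaDeviceGetAttribute(&concurrent,
                           cudaDevAttrConcurrentManagedAccess, 0);
    std::cout << "PageableMemoryAccess:                   " << pageable   << "\n"
              << "PageableMemoryAccessUsesHostPageTables: " << hostTables << "\n"
              << "ConcurrentManagedAccess:                " << concurrent << "\n";
    return 0;
}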
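
The practical difference HMM makes can be seen by passing a plain malloc'd buffer straight to a kernel. A minimal sketch; it should only succeed on a system where HMM (or hardware coherence) is actually active, and fail with an error from cudaDeviceSynchronize otherwise:

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void increment(int *p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1;
}

int main() {
    const int n = 1024;
    // Plain system allocation: no cudaMalloc/cudaMallocManaged involved.
    int *data = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;
    // On an HMM-enabled system the kernel can dereference this
    // malloc'd pointer directly; on a non-HMM PCIe system it cannot.
    increment<<<(n + 255) / 256, 256>>>(data, n);
    cudaError_t err = cudaDeviceSynchronize();
    // Prints 2 if the GPU incremented the host allocation, 1 otherwise.
    printf("sync: %s, data[1] = %d\n", cudaGetErrorString(err), data[1]);
    free(data);
    return 0;
}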
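
And for completeness, this is the kind of OpenMP program the restriction affects: a minimal sketch, built with something like g++ -fopenmp -foffload=nvptx-none, where the target region dereferences a host pointer without any map clauses:

#include <cstdio>
#include <cstdlib>

#pragma omp requires unified_shared_memory

int main() {
    const int n = 1024;
    double *a = (double *)malloc(n * sizeof(double));
    for (int i = 0; i < n; ++i) a[i] = i;
    // With unified_shared_memory no map clauses are needed: the
    // device is expected to access the host allocation directly.
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i) a[i] *= 2.0;
    printf("a[1] = %f\n", a[1]);  // 2.0 if the loop ran
    free(a);
    return 0;
}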