https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120216

            Bug ID: 120216
           Summary: openmp unified shared memory currently requires
                    pageableMemoryAccess perhaps managedMemory would
                    suffice
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: schulz.benjamin at googlemail dot com
  Target Milestone: ---

Hi there, as per the gcc 15.1 documentation:
https://gcc.gnu.org/onlinedocs/gcc-15.1.0/libgomp/nvptx.html

OpenMP code that has a requires directive with self_maps or
unified_shared_memory runs on nvptx devices if and only if all of those support
the pageableMemoryAccess property; otherwise, all nvptx device are removed
from the list of available devices (“host fallback”).
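
For reference, a minimal sketch of the kind of code this paragraph is about.
The function name usm_sum_doubled is made up for illustration. The target
region has no map clauses and relies on the device being able to dereference
a plain malloc pointer. On a setup where the nvptx devices are removed, the
same region is expected to run on the host instead.

```c
#include <stdlib.h>

/* Must appear at file scope, before any target construct in this file. */
#pragma omp requires unified_shared_memory

/* Hypothetical helper: fills a malloc'ed array with 0..n-1, doubles every
   element inside a target region, and returns the sum of the results.
   Returns -1.0 if the allocation fails. */
double usm_sum_doubled(int n)
{
    double *data = malloc((size_t)n * sizeof *data);  /* system allocator */
    if (data == NULL)
        return -1.0;

    for (int i = 0; i < n; ++i)
        data[i] = (double)i;

    /* No map clause: with unified_shared_memory the region is expected to
       use the host pointer directly (or to run on the host as a fallback). */
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i)
        data[i] *= 2.0;

    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += data[i];

    free(data);
    return sum;
}
```

Built with something like "gcc -fopenmp -foffload=nvptx-none", the result
should be the same whether the region runs on the device or on the host.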

However, there are devices, like the Nvidia GTX 1660 Super, which have CUDA
compute capability 7.5 and report the CUDA device properties
concurrentManagedAccess and managedMemory, but not pageableMemoryAccess.

In that case, the Nvidia documentation says:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-cc60


For devices with compute capability 6.x or higher but without pageable memory
access, CUDA Managed Memory is fully supported and coherent.


The programming model and performance tuning of unified memory is largely
similar to the model as described in Unified memory on devices with full CUDA
Unified Memory support, with the notable exception that system allocators
cannot be used to allocate memory. Thus, the following list of sub-sections do
not apply:
System-Allocated Memory: in-depth examples
Hardware/Software Coherency


So, is pageable memory access really needed for the OpenMP
unified_shared_memory requirement?
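
To make the question concrete, here is a small sketch of the two conditions
side by side. The struct and both function names are hypothetical. In a real
program the three values would come from cudaDeviceGetAttribute with
cudaDevAttrManagedMemory, cudaDevAttrConcurrentManagedAccess and
cudaDevAttrPageableMemoryAccess. The relaxed rule is only what this report
asks about, not what libgomp does, and it would only make sense if
allocations were also moved to managed memory.

```c
#include <stdbool.h>

/* Hypothetical snapshot of three CUDA device attributes (0 or 1 each). */
struct dev_caps {
    int managed_memory;
    int concurrent_managed_access;
    int pageable_memory_access;
};

/* Rule as quoted from the libgomp documentation above. */
bool usm_ok_documented(struct dev_caps c)
{
    return c.pageable_memory_access != 0;
}

/* Relaxed rule asked about in this report; an assumption, not GCC behaviour. */
bool usm_ok_relaxed(struct dev_caps c)
{
    return c.pageable_memory_access != 0
        || (c.managed_memory != 0 && c.concurrent_managed_access != 0);
}

/* Flags as reported in this bug for the GTX 1660 Super. */
static const struct dev_caps gtx_1660_super = { 1, 1, 0 };
```

With these flags the documented rule rejects the device, while the relaxed
rule would accept it.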


Because if I understand the Nvidia documentation correctly, if managed memory
and concurrent managed access are available, then the compiler could just check
whether a pointer is needed on the device, replace malloc with
cudaMallocManaged, and thereby obtain a pointer shared by host and device?
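
What I have in mind, written out by hand as a rough sketch. The names
shared_alloc and shared_free and the USE_CUDA_MANAGED switch are made up for
illustration. This is not how GCC implements anything, and it only covers
heap allocations, not stack or static data. Without the switch it simply
falls back to malloc and free.

```c
#include <stddef.h>
#include <stdlib.h>
#ifdef USE_CUDA_MANAGED              /* hypothetical build switch */
#include <cuda_runtime_api.h>
#endif

/* Hypothetical stand-in for malloc: with USE_CUDA_MANAGED the memory is
   CUDA managed memory, which host and device can both dereference. */
void *shared_alloc(size_t bytes)
{
#ifdef USE_CUDA_MANAGED
    void *p = NULL;
    if (cudaMallocManaged(&p, bytes, cudaMemAttachGlobal) != cudaSuccess)
        return NULL;
    return p;
#else
    return malloc(bytes);
#endif
}

/* Matching release function for memory obtained from shared_alloc. */
void shared_free(void *p)
{
#ifdef USE_CUDA_MANAGED
    cudaFree(p);
#else
    free(p);
#endif
}
```

The compiler (or libgomp) would have to do this substitution itself for every
allocation that can reach a target region.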
