On 13/02/2023 14:38, Thomas Schwinge wrote:
Hi!

On 2022-03-08T11:30:55+0000, Hafiz Abid Qadeer <ab...@codesourcery.com> wrote:
From: Andrew Stubbs <a...@codesourcery.com>

Add a new option.  It will be used in follow-up patches.

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi

+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).

So, this is currently implemented via 'mlockall', which, as discussed,
(a) has issues ('ulimit -l'), and (b) doesn't actually achieve what it
meant to achieve (because it doesn't register the page-locked memory with
the GPU driver).
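For reference, here is a minimal sketch of what an 'mlockall'-based approach looks like and where the 'ulimit -l' problem shows up. This is not the code from the patch; the helper name 'try_pin_all_host_memory' is made up for illustration. It only uses the POSIX calls 'getrlimit' and 'mlockall', and it does nothing to register the locked pages with a GPU driver, which is exactly point (b) above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical helper, not libgomp code: report the locked-memory limit
   (what 'ulimit -l' controls) and try to lock all current and future
   pages of the process.  Returns 0 on success, or the errno value set
   by mlockall on failure.  */
static int
try_pin_all_host_memory (void)
{
  struct rlimit lim;

  if (getrlimit (RLIMIT_MEMLOCK, &lim) == 0)
    {
      if (lim.rlim_cur == RLIM_INFINITY)
        fprintf (stderr, "RLIMIT_MEMLOCK: unlimited\n");
      else
        fprintf (stderr, "RLIMIT_MEMLOCK: %llu bytes\n",
                 (unsigned long long) lim.rlim_cur);
    }

  /* MCL_FUTURE makes later mappings (heap growth, stacks, mmap) locked
     too, so an unprivileged process can hit the limit at any point.  */
  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
    {
      int err = errno;
      fprintf (stderr, "mlockall failed: %s\n", strerror (err));
      return err;
    }
  return 0;
}
```

With a small default limit, an unprivileged process would typically see a failure here (or later allocation failures once 'MCL_FUTURE' is in effect), which is why the documentation mentions raising the ulimit.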

So one idea was to re-purpose the unified shared memory
'gcc/omp-low.cc:pass_usm_transform' (compiler pass that "changes calls to
malloc/free/calloc/realloc and operator new to memory allocation
functions in libgomp with allocator=ompx_unified_shared_mem_alloc"),
<https://inbox.sourceware.org/gcc-patches/20220308113059.688551-5-ab...@codesourcery.com>
 (I have not yet looked into that in detail.)

Here's now a different idea.  As '-foffload-memory=pinned', per the name
of the option, concerns itself with memory used in offloading but not
host execution generally, why are we actually attempting to "[force] all
host memory to be pinned" -- why not just the memory that's being used
with offloading?  That is, if '-foffload-memory=pinned' is set, register
as page-locked with the GPU driver all memory that appears in OMP
offloading data regions, such as OpenMP 'target' 'map' clauses etc.  That
way, this is directed at the offloading data transfers, as intended, but
at the same time we don't "waste" page-locked memory for generic host
memory allocations.  What do you think -- you, who've spent a lot more
time on this topic than I have, so it's quite possible that I'm missing
some "details"?

The main reason it is the way it is is that, in general, it's not possible to know at allocation time which memory is later going to be offloaded (and stack/static memory is never allocated that way).

If there's a way to pin it after the fact then maybe that's not a terrible idea? The downside is that the memory might already have been paged out at that point, and we'd have to track what we'd previously pinned, or else re-pin it every time we launch a kernel. We'd also have no way to unpin previously pinned memory (not that that's relevant to the "lock all" case).
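To make the "track what we'd previously pinned" part concrete, here is a rough sketch of the bookkeeping it would need. None of this is existing libgomp code: 'pin_once', 'pin_fn', the fixed-size table and the 'counting_pin' stub are all invented for illustration. In a real implementation the backend would be something like 'mlock' or a driver registration call, and the table would need locking and a better data structure.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PINNED 64

/* One half-open address range [start, end) that has been pinned.  */
struct pinned_range
{
  uintptr_t start;
  uintptr_t end;
};

static struct pinned_range pinned[MAX_PINNED];
static size_t n_pinned;

/* Backend that actually pins the memory; returns 0 on success.  */
typedef int (*pin_fn) (void *addr, size_t len);

/* Stub backend for demonstration only: it just counts how often it is
   called.  A real backend might call mlock or register the range with
   the GPU driver.  */
static int pin_calls;

static int
counting_pin (void *addr, size_t len)
{
  (void) addr;
  (void) len;
  pin_calls++;
  return 0;
}

/* Pin [addr, addr + len) unless an already recorded range covers it.
   Returns 0 if the range is (now) pinned, -1 if the backend failed or
   the table is full.  */
static int
pin_once (void *addr, size_t len, pin_fn pin)
{
  uintptr_t start = (uintptr_t) addr;
  uintptr_t end = start + len;

  for (size_t i = 0; i < n_pinned; i++)
    if (pinned[i].start <= start && end <= pinned[i].end)
      return 0;

  if (n_pinned == MAX_PINNED || pin (addr, len) != 0)
    return -1;

  pinned[n_pinned].start = start;
  pinned[n_pinned].end = end;
  n_pinned++;
  return 0;
}
```

A kernel launch path would call 'pin_once' for each mapped range; only the first call for a given range reaches the backend. Note that the sketch has no removal path, which mirrors the "no way to unpin" problem.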

My original plan was to use omp_alloc for both the standard OpenMP support and the -foffload-memory option (to get the benefit of pinning without modifying any source), but then I decided that the mlockall option was much less invasive. This is still the best way to implement target-independent pinning, when there's no driver registration option.

Andrew
