On 17/02/2023 08:12, Thomas Schwinge wrote:
> Hi Andrew!
>
> On 2023-02-16T23:06:44+0100, I wrote:
>> On 2023-02-16T16:17:32+0000, "Stubbs, Andrew via Gcc-patches"
>> <gcc-patches@gcc.gnu.org> wrote:
>>> The mmap implementation was not optimized for a lot of small
>>> allocations, and I can't see that issue changing here
>>
>> That's correct, 'mmap' remains.  Under the hood, 'cuMemHostRegister' must
>> surely also be doing some 'mlock'-like thing, so I figured it's best to
>> feed it page-aligned memory regions, which 'mmap' gets us.
>>
>>> so I don't know if this can be used as an mlockall replacement.
>>
>>> I had assumed that using the CUDA allocator would fix that limitation.
>>
>> From what I've read (but no first-hand experiments), there's non-trivial
>> overhead with 'cuMemHostRegister' (just like with 'mlock'), so routing
>> all small allocations individually through it probably isn't a good idea
>> either.  Therefore, I suppose, we'll indeed want to use some local
>> allocator if we want this to be "optimized for a lot of small allocations".
>
> Eh, I suppose your point indirectly was that instead of 'mmap' plus
> 'cuMemHostRegister' we ought to use 'cuMemAllocHost'/'cuMemHostAlloc', as
> we assume those already implement such a local allocator.  Let me
> quickly change that indeed -- we don't currently have a need to use
> 'cuMemHostRegister' instead of 'cuMemAllocHost'/'cuMemHostAlloc'.
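
(For reference, the route discussed above looks roughly like this -- a
sketch only, not the actual patch: the helper names are made up, a CUDA
context is assumed to be current on the calling thread, and error
handling is minimal.)

/* Sketch: obtain page-aligned memory with mmap, then page-lock it with
   cuMemHostRegister so the device can DMA to/from it.  */
#include <cuda.h>
#include <stddef.h>
#include <sys/mman.h>

static void *
pin_new_region (size_t size)
{
  /* mmap hands back page-aligned, page-granular regions, which is what
     we want to feed to the mlock-like pinning underneath.  */
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  /* This call has non-trivial per-call overhead (like mlock), hence the
     concern about routing many small allocations through it one by one.  */
  if (cuMemHostRegister (p, size, 0) != CUDA_SUCCESS)
    {
      munmap (p, size);
      return NULL;
    }
  return p;
}

static void
unpin_region (void *p, size_t size)
{
  cuMemHostUnregister (p);
  munmap (p, size);
}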


Yes, that's right. I suppose it makes sense to register memory we already
have, but if we want new memory then trying to reinvent what happens
inside cuMemAllocHost is pointless.
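
For new memory, the sketch then collapses to the driver calls themselves
(again illustration only -- made-up helper names, minimal error handling,
a current CUDA context assumed):

/* Sketch: let the driver allocate already-pinned host memory rather than
   reinventing what cuMemAllocHost/cuMemHostAlloc do internally.  */
#include <cuda.h>
#include <stddef.h>

static void *
alloc_pinned (size_t size)
{
  void *p = NULL;
  /* Returns page-locked host memory; flags such as
     CU_MEMHOSTALLOC_PORTABLE could be added if ever needed.  */
  if (cuMemHostAlloc (&p, size, 0) != CUDA_SUCCESS)
    return NULL;
  return p;
}

static void
free_pinned (void *p)
{
  cuMemFreeHost (p);
}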

Andrew
