On 17/02/2023 08:12, Thomas Schwinge wrote:
> Hi Andrew!
>
> On 2023-02-16T23:06:44+0100, I wrote:
>> On 2023-02-16T16:17:32+0000, "Stubbs, Andrew via Gcc-patches"
>> <gcc-patches@gcc.gnu.org> wrote:
>>> The mmap implementation was not optimized for a lot of small
>>> allocations, and I can't see that issue changing here
>>
>> That's correct, 'mmap' remains.  Under the hood, 'cuMemHostRegister' must
>> surely also be doing some 'mlock'-like thing, so I figured it's best to
>> feed it page-aligned memory regions, which 'mmap' gets us.
>>
>>> so I don't know if this can be used as an mlockall replacement.
>>
>>> I had assumed that using the CUDA allocator would fix that limitation.
>>
>> From what I've read (but no first-hand experiments), there's non-trivial
>> overhead with 'cuMemHostRegister' (just like with 'mlock'), so routing
>> all small allocations individually through it probably isn't a good idea
>> either.  Therefore, I suppose, we'll indeed want to use some local
>> allocator if we want this to be "optimized for a lot of small allocations".
>
> Eh, I suppose your point indirectly was that instead of 'mmap' plus
> 'cuMemHostRegister' we ought to use 'cuMemAllocHost'/'cuMemHostAlloc', as
> we assume those already implement such a local allocator.  Let me
> quickly change that indeed -- we don't currently have a need to use
> 'cuMemHostRegister' instead of 'cuMemAllocHost'/'cuMemHostAlloc'.
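
(For reference, the route discussed above looks roughly like this -- a
sketch only, not the actual patch: the helper names are made up, a CUDA
context is assumed to be current on the calling thread, and error
handling is minimal.)

/* Sketch: obtain page-aligned memory with mmap, then page-lock it with
   cuMemHostRegister so the device can DMA to/from it.  */
#include <cuda.h>
#include <stddef.h>
#include <sys/mman.h>

static void *
pin_new_region (size_t size)
{
  /* mmap hands back page-aligned, page-granular regions, which is what
     we want to feed to the mlock-like pinning underneath.  */
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  /* This call has non-trivial per-call overhead (like mlock), hence the
     concern about routing many small allocations through it one by one.  */
  if (cuMemHostRegister (p, size, 0) != CUDA_SUCCESS)
    {
      munmap (p, size);
      return NULL;
    }
  return p;
}

static void
unpin_region (void *p, size_t size)
{
  cuMemHostUnregister (p);
  munmap (p, size);
}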


Yes, that's right. I suppose it makes sense to register memory we already
have, but if we want new memory then trying to reinvent what happens
inside cuMemAllocHost is pointless.
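
For new memory, the sketch then collapses to the driver calls themselves
(again illustration only -- made-up helper names, minimal error handling,
a current CUDA context assumed):

/* Sketch: let the driver allocate already-pinned host memory rather than
   reinventing what cuMemAllocHost/cuMemHostAlloc do internally.  */
#include <cuda.h>
#include <stddef.h>

static void *
alloc_pinned (size_t size)
{
  void *p = NULL;
  /* Returns page-locked host memory; flags such as
     CU_MEMHOSTALLOC_PORTABLE could be added if ever needed.  */
  if (cuMemHostAlloc (&p, size, 0) != CUDA_SUCCESS)
    return NULL;
  return p;
}

static void
free_pinned (void *p)
{
  cuMemFreeHost (p);
}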

Andrew
