Re: libdrm_amdgpu being forked and merged into Mesa

2024-10-24 Thread Christian König

Yeah that would really not work.

We at least need the code that unifies the render node file descriptor to
stay inside libdrm and be used by Mesa.


Otherwise Mesa would start using a separate render node file descriptor.

Regards,
Christian.

On 24.10.24 at 16:19, Felix Kuehling wrote:
I'm not sure what this means. ROCm allocates all its virtual address 
space with mmap. That includes address space for BOs imported with the 
interop APIs.


We are sharing the GPU virtual address space with Mesa since we're 
using the same render nodes. So if Mesa's GPU virtual address space 
management conflicts with the CPU virtual address space that ROCm 
shares between CPU and GPU, that would be a problem.


Regards,
  Felix


On 2024-10-24 09:24, Alex Deucher wrote:

On Thu, Oct 24, 2024 at 8:38 AM Christian König wrote:
Completely agree, but that's a platform decision which Alex needs to 
make.

+ Felix

Does buffer sharing with ROCm depend on the shared VA space?

Alex


Christian.

On 24.10.24 at 14:16, Marek Olšák wrote:

I don't think we need to share VA space. APIs usually share one or 
two buffers. That's almost nothing compared to the size of the 
occupied VA space. They also likely map them again for themselves - 
APIs don't share any virtual addresses as far as I know.


Marek

On Thu, Oct 24, 2024, 08:12 Christian König wrote:

On 22.10.24 at 06:06, Marek Olšák wrote:

Hi,

The MR is up:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31756

It's good to go as long as there is no functional issue.

Finally getting rid of all the mid-layering? What about shared VA-space?


Regards,
Christian.

libdrm_amdgpu will still be relevant for PAL, ROCm, and 
xf86-video-amdgpu.


Marek




Re: Helping Wine use 64 bit Mesa OGL drivers for 32-bit Windows applications

2024-10-24 Thread Derek Lesho
Oh interesting, thanks for looking into this, guys. As far as I 
understand it though, this is still not duplicating the mapping, but 
setting up a fault handler at the original address to manage access. I 
don't think we'd want this since when wine remaps the page/s hosting a 
given buffer it also forces all other resources mapped to the same 
page/s to go through this presumably slow fault handler. Am I missing 
something?


On 10/24/24 at 18:40, tbl...@icloud.com wrote:
Wait, apparently this was fully merged in kernel 5.13? The man page is 
simply out of date. 
https://github.com/torvalds/linux/commit/a4609387859f0281951f5e476d9f76d7fb9ab321


~Theodore


On Oct 24, 2024, at 9:37 AM, tbl...@icloud.com wrote:



On Oct 24, 2024, at 1:04 AM, Derek Lesho wrote:

In my last mail I responded to this approach all the way at the 
bottom, so it probably got lost: mremap on Linux as it exists now 
won't work as it only supports private anonymous mappings (in 
conjunction with MREMAP_DONTUNMAP), which GPU mappings are not.


This is seemingly not insurmountable: 
https://lore.kernel.org/linux-mm/20210303175235.3308220-1-bgef...@google.com/ 



~Theodore



Re: Helping Wine use 64 bit Mesa OGL drivers for 32-bit Windows applications

2024-10-24 Thread Christian König

I haven't tested it but as far as I know that isn't correct.

As far as I know you can map the same VMA at a different location even 
without MREMAP_DONTUNMAP. And yes, MREMAP_DONTUNMAP only works with 
private mappings, but that isn't needed here.


Give me a moment to test this.

Regards,
Christian.

On 24.10.24 at 10:03, Derek Lesho wrote:
In my last mail I responded to this approach all the way at the 
bottom, so it probably got lost: mremap on Linux as it exists now 
won't work as it only supports private anonymous mappings (in 
conjunction with MREMAP_DONTUNMAP), which GPU mappings are not.


On 10/24/24 at 01:06, James Jones wrote:

That makes sense. Reading the man page myself, it does seem like:

-If the drivers can guarantee they set MAP_SHARED when creating their 
initial mapping.


-If WINE is fine rounding down to page boundaries to deal with 
mappings of suballocations and either using some lookup structure to 
avoid duplicate remappings (probably needed to handle unmap anyway 
per below) or just living with the perf cost and address space 
overconsumption for duplicate remappings.


-If mremap() preserves the cache attributes of the original mapping.

Then no GL API change would be needed. WINE would just have to do an 
if (addrAbove4G) { mremapStuff() } on map and presumably add some 
tracking to perform an equivalent munmap() when unmapping. I assume 
WINE already has a bunch of vaddr tracking logic in use to manage the 
<4G address space as described elsewhere in the thread. That would be 
pretty ideal from a driver vendor perspective.


Does that work?

Thanks,
-James

On 10/23/24 06:12, Christian König wrote:
I haven't read through the whole mail thread, but if you manage the 
address space using mmap() then you always run into this issue.


If you manage the whole 4GiB address space by Wine then you never 
run into this issue. You would just allocate some address range 
internally and mremap() into that.


Regards,
Christian.

On 22.10.24 at 19:32, James Jones wrote:
This sounds interesting, but does it come with the same "Only gets 
2GB VA" downside Derek pointed out in the thread fork where he was 
responding to Michel?


Thanks,
-James

On 10/22/24 07:14, Christian König wrote:

Hi guys,

one theoretical alternative not mentioned in this thread is the 
use of mremap().


In other words you reserve some address space below 2G by using 
mmap(NULL, length, PROT_NONE, MAP_PRIVATE | MAP_32BIT | MAP_ANONYMOUS, -1, 0) 
and then use mremap(addr64bit, 0, length, MREMAP_MAYMOVE | MREMAP_FIXED, 
reserved_addr).


I haven't tested this but at least in theory it should give you a 
duplicate of the 64bit mapping in the lower 2G of the address space.


It is important that you pass 0 as old_size to mremap() so that the 
old mapping isn't unmapped; instead, a new mapping of the existing VMA 
is created.


Regards,
Christian.


On 18.10.24 at 23:55, Derek Lesho wrote:

Hey everyone 👋,

I'm Derek from the Wine project, and wanted to start a discussion 
with y'all about potentially extending the Mesa OGL drivers to 
help us with a functionality gap we're facing.


Problem Space:

In the last few years Wine's support for running 32-bit Windows 
apps in a 64-bit host environment (wow64) has almost reached 
feature completion, but there remains a pain point with OpenGL 
applications: namely, Wine can't return a 64-bit GL 
implementation's buffer mappings to a 32-bit application when the 
address is outside of the 32-bit range.


Currently, we have a workaround that copies any changes to the 
mapping back to the host upon glUnmapBuffer, but this is of course 
slow when the implementation directly returns mapped memory, 
and it doesn't work for GL_MAP_PERSISTENT_BIT, where directly mapped 
memory is required.


A few years ago we also faced this problem with Vulkan, which 
was solved through the VK_EXT_map_memory_placed extension Faith 
drafted, allowing us to use our Wine-internal allocator to 
provide the pages the driver maps to. I'm now wondering if a GL 
equivalent would also be seen as feasible amongst the devs here.


Proposed solution:

As the GL backend handles host mapping in its own code, only 
giving suballocations from its mappings back to the app, the 
problem is a little less straightforward in comparison to 
our Vulkan solution: If we just allowed the application to set 
its own placed mapping when calling glMapBuffer, the driver might 
then have to handle moving buffers out of already mapped ranges, 
and would lose control over its own memory management schemes.


Therefore, I propose a GL extension that allows the GL client to 
provide a mapping and unmapping callback to the implementation, 
to be used whenever the driver needs to perform such operations. 
This way the driver remains in full control of its memory 
management affairs, and the amount of work for an implementation 
as well as potential for bugs is kept minimal. I've written a 
draft implementation in Zink using map_memory_placed [1] and a 
corresp

Re: libdrm_amdgpu being forked and merged into Mesa

2024-10-24 Thread Marek Olšák
I don't think we need to share VA space. APIs usually share one or two
buffers. That's almost nothing compared to the size of the occupied VA
space. They also likely map them again for themselves - APIs don't share
any virtual addresses as far as I know.

Marek

On Thu, Oct 24, 2024, 08:12 Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> On 22.10.24 at 06:06, Marek Olšák wrote:
> > Hi,
> >
> > The MR is up:
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31756
> >
> > It's good to go as long as there is no functional issue.
>
> Finally getting rid of all the mid-layering? What about shared VA-space?
>
> Regards,
> Christian.
>
> >
> > libdrm_amdgpu will still be relevant for PAL, ROCm, and
> > xf86-video-amdgpu.
> >
> > Marek
>
>


Re: Helping Wine use 64 bit Mesa OGL drivers for 32-bit Windows applications

2024-10-24 Thread Marek Olšák
Is there a way for drivers to change the semantics of memory mappings to
make mremap work?

Marek

On Thu, Oct 24, 2024, 07:08 Derek Lesho wrote:

> In my last mail I responded to this approach all the way at the bottom,
> so it probably got lost: mremap on Linux as it exists now won't work as
> it only supports private anonymous mappings (in conjunction with
> MREMAP_DONTUNMAP), which GPU mappings are not.
>
> On 10/24/24 at 01:06, James Jones wrote:
> > That makes sense. Reading the man page myself, it does seem like:
> >
> > -If the drivers can guarantee they set MAP_SHARED when creating their
> > initial mapping.
> >
> > -If WINE is fine rounding down to page boundaries to deal with
> > mappings of suballocations and either using some lookup structure to
> > avoid duplicate remappings (probably needed to handle unmap anyway per
> > below) or just living with the perf cost and address space
> > overconsumption for duplicate remappings.
> >
> > -If mremap() preserves the cache attributes of the original mapping.
> >
> > Then no GL API change would be needed. WINE would just have to do an
> > if (addrAbove4G) { mremapStuff() } on map and presumably add some
> > tracking to perform an equivalent munmap() when unmapping. I assume
> > WINE already has a bunch of vaddr tracking logic in use to manage the
> > <4G address space as described elsewhere in the thread. That would be
> > pretty ideal from a driver vendor perspective.
> >
> > Does that work?
> >
> > Thanks,
> > -James
> >
> > On 10/23/24 06:12, Christian König wrote:
> >> I haven't read through the whole mail thread, but if you manage the
> >> address space using mmap() then you always run into this issue.
> >>
> >> If you manage the whole 4GiB address space by Wine then you never run
> >> into this issue. You would just allocate some address range
> >> internally and mremap() into that.
> >>
> >> Regards,
> >> Christian.
> >>
> >> On 22.10.24 at 19:32, James Jones wrote:
> >>> This sounds interesting, but does it come with the same "Only gets
> >>> 2GB VA" downside Derek pointed out in the thread fork where he was
> >>> responding to Michel?
> >>>
> >>> Thanks,
> >>> -James
> >>>
> >>> On 10/22/24 07:14, Christian König wrote:
> >>>> Hi guys,
> >>>>
> >>>> one theoretical alternative not mentioned in this thread is the use
> >>>> of mremap().
> >>>>
> >>>> In other words you reserve some address space below 2G by using
> >>>> mmap(NULL, length, PROT_NONE, MAP_PRIVATE | MAP_32BIT | MAP_ANONYMOUS, -1, 0)
> >>>> and then use mremap(addr64bit, 0, length, MREMAP_MAYMOVE | MREMAP_FIXED,
> >>>> reserved_addr).
> >>>>
> >>>> I haven't tested this but at least in theory it should give you a
> >>>> duplicate of the 64bit mapping in the lower 2G of the address space.
> >>>>
> >>>> It is important that you pass 0 as old_size to mremap() so that the
> >>>> old mapping isn't unmapped; instead, a new mapping of the existing
> >>>> VMA is created.
> >>>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>
> >>>> On 18.10.24 at 23:55, Derek Lesho wrote:
> >>>>> Hey everyone 👋,
> >>>>>
> >>>>> I'm Derek from the Wine project, and wanted to start a discussion
> >>>>> with y'all about potentially extending the Mesa OGL drivers to
> >>>>> help us with a functionality gap we're facing.
> >>>>>
> >>>>> Problem Space:
> >>>>>
> >>>>> In the last few years Wine's support for running 32-bit Windows
> >>>>> apps in a 64-bit host environment (wow64) has almost reached
> >>>>> feature completion, but there remains a pain point with OpenGL
> >>>>> applications: namely, Wine can't return a 64-bit GL
> >>>>> implementation's buffer mappings to a 32-bit application when the
> >>>>> address is outside of the 32-bit range.
> >>>>>
> >>>>> Currently, we have a workaround that copies any changes to the
> >>>>> mapping back to the host upon glUnmapBuffer, but this is of course
> >>>>> slow when the implementation directly returns mapped memory, and
> >>>>> it doesn't work for GL_MAP_PERSISTENT_BIT, where directly mapped
> >>>>> memory is required.
> >>>>>
> >>>>> A few years ago we also faced this problem with Vulkan, which
> >>>>> was solved through the VK_EXT_map_memory_placed extension Faith
> >>>>> drafted, allowing us to use our Wine-internal allocator to provide
> >>>>> the pages the driver maps to. I'm now wondering if a GL
> >>>>> equivalent would also be seen as feasible amongst the devs here.
> >>>>>
> >>>>> Proposed solution:
> >>>>>
> >>>>> As the GL backend handles host mapping in its own code, only
> >>>>> giving suballocations from its mappings back to the app, the
> >>>>> problem is a little less straightforward in comparison to our
> >>>>> Vulkan solution: If we just allowed the application to set its own
> >>>>> placed mapping when calling glMapBuffer, the driver might then
> >>>>> have to handle moving buffers out of already mapped ranges, and
> >>>>> would lose control over its own memory management schemes.
> >>>>>
> >>>>> Therefore, I propose a GL extension that allows the GL client to
> >>>>> pr

Re: libdrm_amdgpu being forked and merged into Mesa

2024-10-24 Thread Christian König

On 22.10.24 at 06:06, Marek Olšák wrote:

Hi,

The MR is up:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31756

It's good to go as long as there is no functional issue.


Finally getting rid of all the mid-layering? What about shared VA-space?

Regards,
Christian.



libdrm_amdgpu will still be relevant for PAL, ROCm, and xf86-video-amdgpu.

Marek




Re: libdrm_amdgpu being forked and merged into Mesa

2024-10-24 Thread Christian König

Completely agree, but that's a platform decision which Alex needs to make.

Christian.

On 24.10.24 at 14:16, Marek Olšák wrote:
I don't think we need to share VA space. APIs usually share one or two 
buffers. That's almost nothing compared to the size of the occupied VA 
space. They also likely map them again for themselves - APIs don't 
share any virtual addresses as far as I know.


Marek

On Thu, Oct 24, 2024, 08:12 Christian König wrote:


On 22.10.24 at 06:06, Marek Olšák wrote:
> Hi,
>
> The MR is up:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31756
>
> It's good to go as long as there is no functional issue.

Finally getting rid of all the mid-layering? What about shared
VA-space?

Regards,
Christian.

>
> libdrm_amdgpu will still be relevant for PAL, ROCm, and
> xf86-video-amdgpu.
>
> Marek



Re: Helping Wine use 64 bit Mesa OGL drivers for 32-bit Windows applications

2024-10-24 Thread Christian König

Derek, we are unfortunately both partially right.

Linux supports cloning VMAs using mremap() from userspace by using a 
zero old size, but unfortunately only for SHM areas.


See the code in mm/mremap.c:

    /*
     * We allow a zero old-len as a special case
     * for DOS-emu "duplicate shm area" thing. But
     * a zero new-len is nonsensical.
     */
    if (!new_len)
        return ret;

Going to take a closer look to figure out what would be necessary to 
solve that for GPU drivers as well.


Regards,
Christian.

On 24.10.24 at 14:56, Christian König wrote:

I haven't tested it but as far as I know that isn't correct.

As far as I know you can map the same VMA at a different location even 
without MREMAP_DONTUNMAP. And yes, MREMAP_DONTUNMAP only works with 
private mappings, but that isn't needed here.


Give me a moment to test this.

Regards,
Christian.

On 24.10.24 at 10:03, Derek Lesho wrote:
In my last mail I responded to this approach all the way at the 
bottom, so it probably got lost: mremap on Linux as it exists now 
won't work as it only supports private anonymous mappings (in 
conjunction with MREMAP_DONTUNMAP), which GPU mappings are not.


On 10/24/24 at 01:06, James Jones wrote:

That makes sense. Reading the man page myself, it does seem like:

-If the drivers can guarantee they set MAP_SHARED when creating 
their initial mapping.


-If WINE is fine rounding down to page boundaries to deal with 
mappings of suballocations and either using some lookup structure to 
avoid duplicate remappings (probably needed to handle unmap anyway 
per below) or just living with the perf cost and address space 
overconsumption for duplicate remappings.


-If mremap() preserves the cache attributes of the original mapping.

Then no GL API change would be needed. WINE would just have to do an 
if (addrAbove4G) { mremapStuff() } on map and presumably add some 
tracking to perform an equivalent munmap() when unmapping. I assume 
WINE already has a bunch of vaddr tracking logic in use to manage 
the <4G address space as described elsewhere in the thread. That 
would be pretty ideal from a driver vendor perspective.


Does that work?

Thanks,
-James

On 10/23/24 06:12, Christian König wrote:
I haven't read through the whole mail thread, but if you manage the 
address space using mmap() then you always run into this issue.


If you manage the whole 4GiB address space by Wine then you never 
run into this issue. You would just allocate some address range 
internally and mremap() into that.


Regards,
Christian.

On 22.10.24 at 19:32, James Jones wrote:
This sounds interesting, but does it come with the same "Only gets 
2GB VA" downside Derek pointed out in the thread fork where he was 
responding to Michel?


Thanks,
-James

On 10/22/24 07:14, Christian König wrote:

Hi guys,

one theoretical alternative not mentioned in this thread is the 
use of mremap().


In other words you reserve some address space below 2G by using 
mmap(NULL, length, PROT_NONE, MAP_PRIVATE | MAP_32BIT | MAP_ANONYMOUS, -1, 0) 
and then use mremap(addr64bit, 0, length, MREMAP_MAYMOVE | MREMAP_FIXED, 
reserved_addr).


I haven't tested this but at least in theory it should give you a 
duplicate of the 64bit mapping in the lower 2G of the address space.


It is important that you pass 0 as old_size to mremap() so that the 
old mapping isn't unmapped; instead, a new mapping of the existing 
VMA is created.


Regards,
Christian.


On 18.10.24 at 23:55, Derek Lesho wrote:

Hey everyone 👋,

I'm Derek from the Wine project, and wanted to start a 
discussion with y'all about potentially extending the Mesa OGL 
drivers to help us with a functionality gap we're facing.


Problem Space:

In the last few years Wine's support for running 32-bit Windows 
apps in a 64-bit host environment (wow64) has almost reached 
feature completion, but there remains a pain point with OpenGL 
applications: namely, Wine can't return a 64-bit GL 
implementation's buffer mappings to a 32-bit application when 
the address is outside of the 32-bit range.


Currently, we have a workaround that copies any changes to 
the mapping back to the host upon glUnmapBuffer, but this is of 
course slow when the implementation directly returns mapped 
memory, and it doesn't work for GL_MAP_PERSISTENT_BIT, where directly 
mapped memory is required.


A few years ago we also faced this problem with Vulkan, which 
was solved through the VK_EXT_map_memory_placed extension Faith 
drafted, allowing us to use our Wine-internal allocator to 
provide the pages the driver maps to. I'm now wondering if a GL 
equivalent would also be seen as feasible amongst the devs here.


Proposed solution:

As the GL backend handles host mapping in its own code, only 
giving suballocations from its mappings back to the app, the 
problem is a little less straightforward in comparison to 
our Vulkan solution: If we just allowed the application to set 
its own placed mapping when calling glMapBuffer, 

Re: Helping Wine use 64 bit Mesa OGL drivers for 32-bit Windows applications

2024-10-24 Thread Derek Lesho
In my last mail I responded to this approach all the way at the bottom, 
so it probably got lost: mremap on Linux as it exists now won't work as 
it only supports private anonymous mappings (in conjunction with 
MREMAP_DONTUNMAP), which GPU mappings are not.


On 10/24/24 at 01:06, James Jones wrote:

That makes sense. Reading the man page myself, it does seem like:

-If the drivers can guarantee they set MAP_SHARED when creating their 
initial mapping.


-If WINE is fine rounding down to page boundaries to deal with 
mappings of suballocations and either using some lookup structure to 
avoid duplicate remappings (probably needed to handle unmap anyway per 
below) or just living with the perf cost and address space 
overconsumption for duplicate remappings.


-If mremap() preserves the cache attributes of the original mapping.

Then no GL API change would be needed. WINE would just have to do an 
if (addrAbove4G) { mremapStuff() } on map and presumably add some 
tracking to perform an equivalent munmap() when unmapping. I assume 
WINE already has a bunch of vaddr tracking logic in use to manage the 
<4G address space as described elsewhere in the thread. That would be 
pretty ideal from a driver vendor perspective.


Does that work?

Thanks,
-James

On 10/23/24 06:12, Christian König wrote:
I haven't read through the whole mail thread, but if you manage the 
address space using mmap() then you always run into this issue.


If you manage the whole 4GiB address space by Wine then you never run 
into this issue. You would just allocate some address range 
internally and mremap() into that.


Regards,
Christian.

On 22.10.24 at 19:32, James Jones wrote:
This sounds interesting, but does it come with the same "Only gets 
2GB VA" downside Derek pointed out in the thread fork where he was 
responding to Michel?


Thanks,
-James

On 10/22/24 07:14, Christian König wrote:

Hi guys,

one theoretical alternative not mentioned in this thread is the use 
of mremap().


In other words you reserve some address space below 2G by using 
mmap(NULL, length, PROT_NONE, MAP_PRIVATE | MAP_32BIT | MAP_ANONYMOUS, -1, 0) 
and then use mremap(addr64bit, 0, length, MREMAP_MAYMOVE | MREMAP_FIXED, 
reserved_addr).


I haven't tested this but at least in theory it should give you a 
duplicate of the 64bit mapping in the lower 2G of the address space.


It is important that you pass 0 as old_size to mremap() so that the old 
mapping isn't unmapped; instead, a new mapping of the existing VMA is 
created.


Regards,
Christian.


On 18.10.24 at 23:55, Derek Lesho wrote:

Hey everyone 👋,

I'm Derek from the Wine project, and wanted to start a discussion 
with y'all about potentially extending the Mesa OGL drivers to 
help us with a functionality gap we're facing.


Problem Space:

In the last few years Wine's support for running 32-bit Windows 
apps in a 64-bit host environment (wow64) has almost reached 
feature completion, but there remains a pain point with OpenGL 
applications: namely, Wine can't return a 64-bit GL 
implementation's buffer mappings to a 32-bit application when the 
address is outside of the 32-bit range.


Currently, we have a workaround that copies any changes to the 
mapping back to the host upon glUnmapBuffer, but this is of course 
slow when the implementation directly returns mapped memory, and 
it doesn't work for GL_MAP_PERSISTENT_BIT, where directly mapped memory 
is required.


A few years ago we also faced this problem with Vulkan, which 
was solved through the VK_EXT_map_memory_placed extension Faith 
drafted, allowing us to use our Wine-internal allocator to provide 
the pages the driver maps to. I'm now wondering if a GL 
equivalent would also be seen as feasible amongst the devs here.


Proposed solution:

As the GL backend handles host mapping in its own code, only 
giving suballocations from its mappings back to the app, the 
problem is a little less straightforward in comparison to our 
Vulkan solution: If we just allowed the application to set its own 
placed mapping when calling glMapBuffer, the driver might then 
have to handle moving buffers out of already mapped ranges, and 
would lose control over its own memory management schemes.


Therefore, I propose a GL extension that allows the GL client to 
provide a mapping and unmapping callback to the implementation, to 
be used whenever the driver needs to perform such operations. This 
way the driver remains in full control of its memory management 
affairs, and the amount of work for an implementation as well as 
potential for bugs is kept minimal. I've written a draft 
implementation in Zink using map_memory_placed [1] and a 
corresponding Wine MR utilizing it [2], and would be curious to 
hear your thoughts. I don't have experience in the Mesa codebase, 
so I apologize if the branch is a tad messy.


In theory, the only requirement the extension would place on 
drivers would be that glMapBuffer always return a pointer from within a 
page allocated through the provided callbac