Re: [Mesa-dev] EGL_EXT_*_drm - primary vs render node (Was Re: [Piglit] [PATCH 1/2] egl: Add sanity test for EGL_EXT_device_query (v3))

2016-09-07 Thread James Jones

On 09/07/2016 04:18 AM, Emil Velikov wrote:

Hi Mathias,

On 6 September 2016 at 18:32, Mathias Fröhlich
 wrote:


 ** EGL_EXT_output_drm

Correction - the above should read: EGL_EXT_{device,output}_drm


 *** Using/exposing the card or render node
 - Extension is designed with EGL streams in mind (using the
primary/card node) while people expect to use it to select the rendering
device.
 - Elaborate on the spec and/or introduce EGL_EXT_output{,_drm}_render ?
 *** Exposing EGL_EXT_output{,_drm}{,_render} on EGL implementations
supporting both SW and HW devices
 - Elaborate on the spec(s), add new one for SW devices and/or error
type to distinguish between the current errors and SW devices

I do not care about anything built on top of EGL_EXT_output_base or
EGL_*_stream_*. From my point of view this is beside the point.


What I do care about is EGL_EXT_platform_device.


That's precisely what, where and why we want to clarify: correct the
spec or add a new one.

James, Daniel, can we hear your input on the following?

The way I read the spec(s), EGL_EXT_device_drm can effectively be
either the card/primary or the render node, while EGL_EXT_output_drm must
be the card one.
Can/should we restrict the former to render only, and do you see any
implications that would bring?
Or should we just roll out another spec for the "render only" case?


I had assumed EGL_EXT_device_drm's queries refer to the card/primary, 
and an additional extension could add a token to query the render node. 
When we initially started drafting the extensions, render nodes were 
just being introduced, and I considered adding them as a separate query 
later, but we had no need to identify the render nodes, so I demurred.


If that interpretation sounds OK, we can add corresponding 
clarifications to the specifications.
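
For illustration, a minimal sketch of that query path under the
interpretation above, assuming EGL_EXT_device_enumeration,
EGL_EXT_device_query and EGL_EXT_device_drm are all exposed (error
handling omitted):

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <stdio.h>

int main(void)
{
    /* Both entry points come from the device enumeration/query extensions. */
    PFNEGLQUERYDEVICESEXTPROC queryDevices = (PFNEGLQUERYDEVICESEXTPROC)
        eglGetProcAddress("eglQueryDevicesEXT");
    PFNEGLQUERYDEVICESTRINGEXTPROC queryDeviceString =
        (PFNEGLQUERYDEVICESTRINGEXTPROC)
        eglGetProcAddress("eglQueryDeviceStringEXT");
    EGLDeviceEXT devices[16];
    EGLint count = 0;

    queryDevices(16, devices, &count);
    for (EGLint i = 0; i < count; i++) {
        /* Per the interpretation above, this names the card/primary node. */
        const char *node = queryDeviceString(devices[i],
                                             EGL_DRM_DEVICE_FILE_EXT);
        printf("EGL device %d: %s\n", i, node ? node : "(no DRM node)");
    }
    return 0;
}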


Thanks,
-James


Thanks
Emil




Re: [Mesa-dev] EGL_EXT_*_drm - primary vs render node (Was Re: [Piglit] [PATCH 1/2] egl: Add sanity test for EGL_EXT_device_query (v3))

2016-09-08 Thread James Jones

On 09/08/2016 04:30 AM, Emil Velikov wrote:

On 7 September 2016 at 19:54, James Jones  wrote:

On 09/07/2016 04:18 AM, Emil Velikov wrote:


Hi Mathias,

On 6 September 2016 at 18:32, Mathias Fröhlich
 wrote:


 ** EGL_EXT_output_drm


Correction - the above should read: EGL_EXT_{device,output}_drm


 *** Using/exposing the card or render node
 - Extension is designed with EGL streams in mind (using the
primary/card node) while people expect to use it to select the rendering
device.
 - Elaborate on the spec and/or introduce EGL_EXT_output{,_drm}_render ?
 *** Exposing EGL_EXT_output{,_drm}{,_render} on EGL implementations
supporting both SW and HW devices
 - Elaborate on the spec(s), add new one for SW devices and/or error
type to distinguish between the current errors and SW devices


I do not care about anything built on top of EGL_EXT_output_base or
EGL_*_stream_*. From my point of view this is beside the point.


What I do care about is EGL_EXT_platform_device.


That's precisely what, where and why we want to clarify: correct the
spec or add a new one.

James, Daniel, can we hear your input on the following?

The way I read the spec(s), EGL_EXT_device_drm can effectively be
either the card/primary or the render node, while EGL_EXT_output_drm must
be the card one.
Can/should we restrict the former to render only, and do you see any
implications that would bring?
Or should we just roll out another spec for the "render only" case?



I had assumed EGL_EXT_device_drm's queries refer to the card/primary, and an
additional extension could add a token to query the render node. When we
initially started drafting the extensions, render nodes were just being
introduced, and I considered adding them as a separate query later, but we
had no need to identify the render nodes, so I demurred.


From a quick look at the EGL world (both desktop and embedded), it looks
like the Nvidia drivers are the only ones implementing the extension.
Thus one can 'retrofit' the extension to target the render node. If
anyone is using EGL_EXT_device_drm yet relying on it to provide the KMS
device then they're just doing/using it wrong ;-)


I'm not sure what you're implying here.  Is it:

1) Changing the behavior of the extension from what was intended, such 
that the existing query returns the render node path.


2) Making a backwards-compatible modification to the existing extension 
by adding an additional EGL token (e.g., EGL_DRM_RENDER_DEVICE_FILE_EXT) 
to query the render node path.


(1) won't work.  Even though we're the only ones shipping it, there's a 
lot of code (ours and customers') using it that relies on the current 
behavior, where EGL_DRM_DEVICE_FILE_EXT refers to the 
modesetting/primary file.


(2) could be done, but I prefer to avoid it whenever possible.  It 
leaves applications in a situation where they can never be sure the new 
functionality is present, since there is no query in EGL to 
differentiate between extension versions.  When running on older 
drivers, using the new query would fail.
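
To make that concrete, a rough sketch of the fallback an application would
be left with under option (2); EGL_DRM_RENDER_DEVICE_FILE_EXT here is the
hypothetical token from above, not something defined in any published spec:

#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Hypothetical token from option (2) above; placeholder value only. */
#ifndef EGL_DRM_RENDER_DEVICE_FILE_EXT
#define EGL_DRM_RENDER_DEVICE_FILE_EXT 0x0000
#endif

static const char *drm_node_path(EGLDeviceEXT dev)
{
    PFNEGLQUERYDEVICESTRINGEXTPROC queryDeviceString =
        (PFNEGLQUERYDEVICESTRINGEXTPROC)
        eglGetProcAddress("eglQueryDeviceStringEXT");

    /* On an older driver this query simply fails, and there is no
     * extension version query that would tell the app in advance. */
    const char *path = queryDeviceString(dev, EGL_DRM_RENDER_DEVICE_FILE_EXT);

    if (!path)
        path = queryDeviceString(dev, EGL_DRM_DEVICE_FILE_EXT); /* primary */
    return path;
}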


Re-reading your email, I'm a bit worried I might have misunderstood what 
you were referring to, given you said "queries" (plural).  We are 
talking about the single query, EGL_DRM_DEVICE_FILE_EXT, right?  Or were 
you referring to something else [as well]?



If that interpretation sounds OK, we can add corresponding clarifications to
the specifications.


That said, if the above is not possible (please check before throwing
in the towel) the extra text sounds OK.


Let me know what you think given the above.

Thanks,
-James


Thanks
Emil




Re: [Mesa-dev] EGL_EXT_*_drm - primary vs render node (Was Re: [Piglit] [PATCH 1/2] egl: Add sanity test for EGL_EXT_device_query (v3))

2016-09-12 Thread James Jones

On 09/12/2016 07:00 AM, Emil Velikov wrote:

Hi James,

On 8 September 2016 at 17:27, Emil Velikov  wrote:


In order to clear any ambiguity in EGL_EXT_device_drm we need to
"s/DRM driver./DRM driver which supports KMS./". With that small change
things should be fine.



Further to the above (trivial) clarification, can we update the spec to
mention the correct extension in the Status section?
Namely: s/functionality in EXT_dispay_device/functionality in EXT_device_query/


Yes, I will include this in the updates.

Thanks,
-James


Thanks
Emil




Re: [Mesa-dev] Mesa drivers/GLX and fork

2013-10-15 Thread James Jones

In case it helps, we have always interpreted the spec as follows:

-Forking duplicates process memory and FDs, including the display 
connection and the GLX context client-side state (and GL server state if 
direct rendering, but not X server state, such as the context XID), as 
messy as that is.


-A context can only be current to one thread of execution at a time

-Therefore, forking with a context current to the forking thread 
produces undefined behavior, since that would result in a context 
current in more than one process at the same time, and therefore more 
than one thread.  We try not to crash, but if the application ever calls 
into GL, GLX, or EGL again from this new thread, results may vary. 
Usually something crashes.


I would like to get some language into the GLX/EGL specs regarding this, 
or at the very least define some sort of expected behavior in the new GL 
ABI.


The driver-created offload thread is certainly another interesting 
wrinkle here.  However, I presume if the application was at least 
required to call some GLX (or EGL) function after the fork to set up a 
current context (As is the case in our interpretation of the spec), that 
would provide a good opportunity to re-initialize any driver state like 
internal threads?
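
A minimal sketch of that pattern, using EGL for brevity; this reflects the
interpretation above rather than any normative spec language, and it leaves
aside whether the inherited display connection is itself safe to reuse in
the child:

#include <EGL/egl.h>
#include <unistd.h>

void fork_with_gl(EGLDisplay dpy, EGLConfig cfg, EGLSurface surf, EGLContext ctx)
{
    /* Don't fork while a context is current to this thread. */
    eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);

    if (fork() == 0) {
        /* Child: bind its own context rather than reusing the parent's;
         * the make-current call gives the driver a chance to reinitialize
         * internal state such as offload threads. */
        EGLContext child = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, NULL);
        eglMakeCurrent(dpy, surf, surf, child);
        /* ... child rendering ... */
    } else {
        /* Parent: rebind its original context and carry on. */
        eglMakeCurrent(dpy, surf, surf, ctx);
    }
}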


Thanks,
-James

On 10/15/13 3:45 AM, Christian König wrote:

Hi Marek,

for the past few days I've been working on solving
https://bugs.freedesktop.org/show_bug.cgi?id=70123.

The basic problem is that compton forks itself into the background after
initializing the X server connection and GLX context. What happens now
is that only compton's main thread gets cloned by the fork, but not any
background thread created by the radeon winsys to offload the command
submission (that is common and very well known pthread behaviour).

I'm not 100% sure if compton's behaviour is valid or not, and so if we
should try to fix it or not. On the one hand it should be transparent to
the application that we created another thread to offload some work from
the main thread, but on the other hand duplicating the X connection
and GLX context with fork just screams for problems.

Any idea what the spec(s) say about this?

Thanks,
Christian.


Re: [Mesa-dev] glxinfo and indirect contexts on nvidia driver

2014-02-11 Thread James Jones

Hi Dave,

Adding { GLX_DRAWABLE_TYPE, GLX_WINDOW_BIT } to the attrib list should 
work, and would technically be more correct since glxinfo is creating a 
window drawable.  32-bit float FBConfigs won't be able to render to X 
drawables (at least on our driver), so they don't have associated visuals.
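
For reference, a sketch of the corresponding FBConfig selection
(illustrative only, not the actual glxinfo change):

#include <GL/glx.h>

static GLXFBConfig *choose_window_configs(Display *dpy, int *n)
{
    static const int attribs[] = {
        GLX_DRAWABLE_TYPE, GLX_WINDOW_BIT, /* only configs usable with X windows */
        GLX_RENDER_TYPE,   GLX_RGBA_BIT,
        None
    };

    return glXChooseFBConfig(dpy, DefaultScreen(dpy), attribs, n);
}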


Thanks,
-James

On 02/11/2014 07:16 PM, Dave Airlie wrote:

Hey,

so currently glxinfo with our libGL vs indirect nvidia GL fails to work.

The main reason is we get fbconfigs, our libGL picks the best fbconfig
as a 32-bit per channel RGBA, but then glXGetVisualFromFBConfig fails
on that since I assume it wants 8-bit per channel ones.

Now I can fix this to fall back to the old-school choose_xvisinfo and
glXCreateContext paths and I then at least get the extension info etc.
printed, though I'm wondering if there is something else we should be
doing in the glxinfo fbconfig picking code?

Any ideas?

Dave.


Re: [Mesa-dev] [PATCH 2/2] egl: implement EGL_MESA_transparent_alpha for x11 and wayland

2015-04-17 Thread James Jones

On 03/03/2015 11:05 AM, Daniel Stone wrote:

Hi,

On 3 March 2015 at 18:56, Jason Ekstrand  wrote:

On Tue, Mar 3, 2015 at 10:07 AM, Chad Versace 
wrote:

On 02/23/2015 06:32 AM, Jonny Lamb wrote:

+   static const EGLint argb_attrs[] = {
+   EGL_TRANSPARENT_TYPE, EGL_TRANSPARENT_ALPHA_MESA,
+   EGL_NONE
+   };



I tested this patch with X11 and Mesa, and it works as advertised. BUT...

Pre-patch, Wayland applications that requested an EGLConfig with alpha
received one. The EGLSurface had compositor-transparent alpha, which
the application may not have expected. But the alpha configs *were*
available.

Post-patch, existing Wayland applications that request an EGLConfig with
alpha will no longer receive one, because the existing applications
do not yet know about EGL_TRANSPARENT_ALPHA_MESA.

I believe a correct solution is to continue creating EGLConfigs
with EGL_ALPHA and without EGL_TRANSPARENT_TYPE, as Mesa does today,
but tell the compositor to ignore the alpha channel. The alpha channel
will be used for client-side alpha-blending but not for compositing.
Jason says that should be possible by telling the compositor that
the buffer format is WL_DRM_FORMAT_XRGB. And in addition to the
existing
EGLConfigs, also create new EGLConfigs with EGL_TRANSPARENT_ALPHA_MESA, as
this patch already does.


Yes, XRGB would be the correct *technical* way to handle that, but it
still makes me wonder... What about Wayland apps that expect to get
transparency by simply asking for alpha now?  Won't this break them?  I
guess we have to break one class of apps or the other.


Yeah, it will break them. But then again, we had the same flag day for
X11 in that exact Bugzilla discussion, when X11 apps which requested
ALPHA_SIZE == 8 went from getting ARGB32 drawables which would be
blended by the compositor, to not - a change which was deemed totally
fine to enforce on people because it improved performance and matched
the spec.

Perhaps a better interim solution is to assume for Wayland that
EGL_TRANSPARENT_TYPE == EGL_DONT_CARE means that applications will get
a format determined by ALPHA_SIZE (i.e. size 8 means ARGB32, size 0
means XRGB32), but respect explicit demands for
TRANSPARENT_{NONE,ALPHA}.


FWIW, this extension was pointed out to me the other day, and I asked 
around NVIDIA for feedback on it.  We disagree with the interpretations 
of the spec that led to its development, so this and related prior 
changes will create a rift in implementation behavior.  We always have 
and will continue to support alpha blending of ARGB visuals/configs. 
We've recommended this usage to our customers and ISVs in the past, and 
have had example code explaining how to select visuals that enable 
translucency with composite in our driver documentation ever since we 
supported composite [1].  Additionally, we believe it is clear the 
EGL_TRANSPARENT_TYPE was never meant to affect alpha blended compositing 
of surfaces on the desktop.  It was meant for color-keying of overlay 
surfaces, which unlike blending, can't be expressed in the color 
channels of the config and hence requires a separate attribute.


If GLES 3.x made XRGB visuals unusable, then there must be a less 
invasive solution than eliminating them entirely and forcing this entire 
new workflow on every app, including those that couldn't care less about 
GLES 3.  Perhaps GLES 3 should be fixed, or a less invasive change to 
GLX/EGL could be made.  For example, for years we allowed using RGBA 
32-bit GLX visuals (and EGLConfigs) with RGB 24-bit X visuals.  This 
allowed for 32-bit GLX usage while avoiding X composite alpha blending. 
 Ultimately we decided this subtly violated the spec.  Amending the EGL 
and GLX specs slightly to allow that usage and then reverting to the old 
behavior in our driver and introducing it in Mesa seems like a more 
reasonable way to resolve this.  Of course, that particular fix wouldn't 
work in Wayland, but there is probably more room for a slightly more 
invasive fix (if needed) there since the ecosystem is relatively young.
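
For concreteness, the two selection paths being debated, side by side;
EGL_TRANSPARENT_ALPHA_MESA is the token proposed by this patch, not a core
EGL enum:

/* Today: requesting alpha is what (on some implementations) opts a
 * surface into compositor blending. */
static const EGLint legacy_argb_attribs[] = {
    EGL_ALPHA_SIZE, 8,
    EGL_NONE
};

/* With this patch: compositor blending must be requested explicitly. */
static const EGLint explicit_blend_attribs[] = {
    EGL_ALPHA_SIZE, 8,
    EGL_TRANSPARENT_TYPE, EGL_TRANSPARENT_ALPHA_MESA,
    EGL_NONE
};

/* Either list would then be passed to eglChooseConfig(). */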


Thanks,
-James

[1] 
http://us.download.nvidia.com/XFree86/Linux-x86/96.43.16/README/appendix-s.html



Cheers,
Daniel



---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [Mesa-dev] [PATCH 2/2] egl: implement EGL_MESA_transparent_alpha for x11 and wayland

2015-04-17 Thread James Jones

On 04/17/2015 04:08 PM, James Jones wrote:

[snip]


Sorry about this.  I really hate our mail servers.  Neither this email 
nor the previous one is in fact confidential.  Ev

[Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-20 Thread James Jones
As many here know at this point, I've been working on solving issues 
related to DMA-capable memory allocation for various devices for some 
time now.  I'd like to take this opportunity to apologize for the way I 
handled the EGL stream proposals.  I understand now that the development 
process followed there was unacceptable to the community and likely 
offended many great engineers.


Moving forward, I attempted to reboot talks in a more constructive 
manner with the generic allocator library proposals & discussion forum 
at XDC 2016.  Some great design ideas came out of that, and I've since 
been prototyping some code to prove them out before bringing them back 
as official proposals.  Again, I understand some people are growing 
concerned that I've been doing this off on the side in a github project 
that has primarily NVIDIA contributors.  My goal was only to avoid 
wasting everyone's time with unproven ideas.  The intent was never to 
dump the prototype code as-is on the community and presume acceptance. 
It's just a public research project.


Now the prototyping is nearing completion, and I'd like to renew 
discussion on whether and how the new mechanisms can be integrated with 
the Linux graphics stack.


I'd be interested to know if more work is needed to demonstrate the 
usefulness of the new mechanisms, or whether people think they have 
value at this point.


After talking with people on the hallway track at XDC this year, I've 
heard several proposals for incorporating the new mechanisms:


-Include ideas from the generic allocator design into GBM.  This could 
take the form of designing a "GBM 2.0" API, or incrementally adding to 
the existing GBM API.


-Develop a library to replace GBM.  The allocator prototype code could 
be massaged into something production worthy to jump start this process.


-Develop a library that sits beside or on top of GBM, using GBM for 
low-level graphics buffer allocation, while supporting non-graphics 
kernel APIs directly.  The additional cross-device negotiation and 
sorting of capabilities would be handled in this slightly higher-level 
API before handing off to GBM and other APIs for actual allocation somehow.


-I have also heard some general comments that regardless of the 
relationship between GBM and the new allocator mechanisms, it might be 
time to move GBM out of Mesa so it can be developed as a stand-alone 
project.  I'd be interested what others think about that, as it would be 
something worth coordinating with any other new development based on or 
inside of GBM.


And of course I'm open to any other ideas for integration.  Beyond just 
where this code would live, there is much to debate about the mechanisms 
themselves and all the implementation details.  I was just hoping to 
kick things off with something high level to start.


For reference, the code Miguel and I have been developing for the 
prototype is here:


   https://github.com/cubanismo/allocator

And we've posted a port of kmscube that uses the new interfaces as a 
demonstration here:


   https://github.com/cubanismo/kmscube

There are still some proposed mechanisms (usage transitions mainly) that 
aren't prototyped, but I think it makes sense to start discussing 
integration while prototyping continues.


In addition, I'd like to note that NVIDIA is committed to providing open 
source driver implementations of these mechanisms for our hardware, in 
addition to support in our proprietary drivers.  In other words, 
wherever modifications to the nouveau kernel & userspace drivers are 
needed to implement the improved allocator mechanisms, we'll be 
contributing patches if no one beats us to it.


Thanks in advance for any feedback!

-James Jones


Re: [Mesa-dev] [PATCH 00/28] vulkan/wsi: Rework WSI to look a lot more like a layer

2017-11-20 Thread James Jones

On 11/16/2017 01:28 PM, Jason Ekstrand wrote:

This patch series is the combined brain-child of Dave and myself.  The
objective is to rewrite Vulkan WSI to look as much like a layer as possible
and to reduce the driver <-> WSI interface.  We try very hard to have as
many of the WSI details as possible in common code and to use standard
Vulkan interfaces for everything.  Among other things, this means that
prime support is now implemented in an entirely driver-agnostic way and the
driver doesn't even know it's happening.  As a side-effect anv now has
prime support.

Eventually, someone could pull what's left out into a proper layer and we
could drop WSI support from our drivers entirely.  There are a few pieces
of work that would be required to do this:

  1) Write all of the annoying layer bits.  There are some short-cuts that
 we can take because we're not in a layer and those will have to go.

  2) Define a VK_MESA_legacy_swapchain_image extension to replace the hack
 introduced in patch 8.

  3) It looks like modifiers support will land before the official Vulkan
 extensions get finished.  It will have to be ported to the official
 extensions.

  4) Figure out what to do about the fence in AcquireNextImage. In a future
 world of explicit synchronization, we can just import the sync_file
 from X or the Wayland compositor, but with implicit sync like we have
 today, it's a bit harder.  Right now, the helper in wsi_common does
 nothing with it and trusts the caller to handle it.

 The two drivers handle this differently today.  In anv, we do a dummy
 QueueSubmit to trigger the fence while radv triggers it a bit more
 manually.  In both cases, we trigger the fence immediately and trust in
 the GEM's implicit synchronization to sort things out for us.  We can't
 use the anv method as directly with radv because no queue is passed in
 so we don't know what queue to use in the dummy QueueSubmit.  (In ANV,
 we only have the one queue so that isn't a problem.)
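
(Aside: the "dummy QueueSubmit" mentioned in (4) boils down to a
zero-batch submit carrying a fence, which Vulkan defines to signal once all
previously submitted work on the queue completes; under implicit GEM sync
that is sufficient.  A minimal sketch:)

#include <vulkan/vulkan.h>

/* Dummy-submit fence trigger described above. */
static VkResult signal_acquire_fence(VkQueue queue, VkFence fence)
{
    /* submitCount == 0 is legal; the fence is signaled once all prior
     * submissions to this queue have completed execution. */
    return vkQueueSubmit(queue, 0, NULL, fence);
}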


I don't have detailed feedback, but I read through the series and this 
is pretty cool.  Glad things are starting to generalize across the 
driver stacks.  I'm optimistic that things have gotten to the point 
where we'll never have to write a separate Wayland WSI for our driver. 
Is it an accurate observation to say there aren't any Vulkan API bits 
missing (other than stuff in the pipeline like modifiers/dma-buf) to 
allow the full-layer solution?  Hopefully we haven't missed anything in 
the external_* extensions at this point.


Thanks,
-James


Dave, I tried to pull patches from your series where practical but, because
we did things in a different order, it frequently wasn't.  If you want to
claim credit for any of these patches, just say so and I'll --reset-author
on them.

The series can be found on freedesktop.org here:

https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/vulkan-wsi-prime


Cc: Dave Airlie 
Cc: Daniel Stone 
Cc: Chad Versace 
Cc: James Jones 

Daniel Stone (1):
   vulkan/wsi: Add a wsi_image structure

Dave Airlie (4):
   vulkan/wsi: use function ptr definitions from the spec.
   radv/wsi: drop allocate memory special case
   radv/wsi: Move the guts of QueuePresent to wsi common
   vulkan/wsi: move swapchain create/destroy to common code

Jason Ekstrand (23):
   vulkan/wsi/x11: Handle the geometry check earlier in create_swapchain
   vulkan/wsi: Add a wsi_device_init function
   vulkan/wsi: Add wsi_swapchain_init/finish functions
   vulkan/wsi: Implement prime in a completely generic way
   anv/image: Add a return value to bind_memory_plane
   vulkan/wsi: Add a mock image creation extension
   anv/image: Implement the wsi "extension"
   radv/image: Implement the wsi "extension"
   vulkan/wsi: Do image creation in common code
   vulkan/wsi: Add a WSI_FROM_HANDLE macro
   vulkan/wsi: Refactor result handling in queue_present
   vulkan/wsi: Only wait on semaphores on the first swapchain
   vulkan/wsi: Set a proper pWaitDstStageMask on the dummy submit
   anv/wsi: Use the common QueuePresent code
   anv/wsi: Enable prime support
   vulkan/wsi: Move get_images into common code
   vulkan/wsi: Move prime blitting into queue_present
   vulkan/wsi: Add a helper for AcquireNextImage
   vulkan/wsi: Move wsi_swapchain to wsi_common_private.h
   vulkan/wsi: Drop the can_handle_different_gpu parameter from
 get_support
   vulkan/wsi: Add wrappers for all of the surface queries
   vulkan/wsi: Drop some unneeded cruft from the API
   vulkan/wsi: Initialize individual WSI interfaces in wsi_device_init

  src/amd/vulkan/radv_device.c|  18 +-
  src/amd/vulkan/radv_image.c |  15 +-
  src/amd/vulkan/radv_private.h   |  10 -
  src/amd/vulkan/radv_wsi.c   | 472 +++---
  src/intel/vulkan/anv_image.c|  71 +++-
  src/intel/v

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread James Jones

On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:



On Wed, 29 Nov 2017 16:28:15 -0500
Rob Clark  wrote:


On Wed, Nov 29, 2017 at 2:41 PM, Miguel Angel Vico  wrote:

Many of you may already know, but James is going to be out for a few
weeks and I'll be taking over this in the meantime.


Sorry for the unfortunate timing.  I am indeed on paternity leave at the 
moment.  Some quick comments below.  I'll be trying to follow the 
discussion as time allows while I'm out.



See inline for comments.

On Wed, 29 Nov 2017 09:33:29 -0800
Jason Ekstrand  wrote:
  

On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:
  

On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
wrote:

On November 24, 2017 09:29:43 Rob Clark  wrote:



On Mon, Nov 20, 2017 at 8:11 PM, James Jones  wrote:

[snip]



tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
still the "winsys" for running on "bare metal" (ie. kms).  And we
don't want to saddle $new_thing with aspects of that, but rather have
it focus on being the thing that in multiple-"device"[1] scenarios
figures out what sort of buffer can be allocated by who for sharing.
Ie $new_thing should really not care about winsys level things like
cursors or surfaces.. only buffers.

The mesa implementation of $new_thing could sit on top of GBM,
although it could also just sit on top of the same internal APIs that
GBM sits on top of.  That is an implementation detail.  It could be
that GBM grows an API to return an instance of $new_thing for
use-cases that involve sharing a buffer with the GPU.  Or perhaps that
is exposed via some sort of EGL extension.  (We probably also need a
way to get an instance from libdrm (?) for display-only KMS drivers,
to cover cases like etnaviv sharing a buffer with a separate display
driver.)

[1] where "devices" could be multiple GPUs or multiple APIs for one or
more GPUs, but also includes non-GPU devices like camera, video
decoder, "image processor" (which may or may not be part of camera),
etc, etc



I'm not quite sure what I think about this.  I think I would like to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing on
its own so we have two competing solutions.


I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it.  I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
wh

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread James Jones

On 11/29/2017 01:10 PM, Rob Clark wrote:

On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand  wrote:

On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:


On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
wrote:

On November 24, 2017 09:29:43 Rob Clark  wrote:



On Mon, Nov 20, 2017 at 8:11 PM, James Jones  wrote:

[snip]



tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
still the "winsys" for running on "bare metal" (ie. kms).  And we
don't want to saddle $new_thing with aspects of that, but rather have
it focus on being the thing that in multiple-"device"[1] scenarios
figures out what sort of buffer can be allocated by who for sharing.
Ie $new_thing should really not care about winsys level things like
cursors or surfaces.. only buffers.

The mesa implementation of $new_thing could sit on top of GBM,
although it could also just sit on top of the same internal APIs that
GBM sits on top of.  That is an implementation detail.  It could be
that GBM grows an API to return an instance of $new_thing for
use-cases that involve sharing a buffer with the GPU.  Or perhaps that
is exposed via some sort of EGL extension.  (We probably also need a
way to get an instance from libdrm (?) for display-only KMS drivers,
to cover cases like etnaviv sharing a buffer with a separate display
driver.)

[1] where "devices" could be multiple GPUs or multiple APIs for one or
more GPUs, but also includes non-GPU devices like camera, video
decoder, "image processor" (which may or may not be part of camera),
etc, etc



I'm not quite sure what I think about this.  I think I would like to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing on
its own so we have two competing solutions.


I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it.  I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
when you are thinking about implementing $new_thing for non-gpu
devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
about a (for ex.) camera.  I kinda don't want to throw out the baby
with the bathwater here.



Agreed.  GBM is very EGLish and we don't want the new allocator to be that.



*maybe* GBM could be partially implemented on top of $new_thing.  I
don't quite see how that would work.  Possibly we could deprecate
parts of GBM that are no longer need

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 11/30/2017 10:48 AM, Rob Clark wrote:

On Thu, Nov 30, 2017 at 1:28 AM, James Jones  wrote:

On 11/29/2017 01:10 PM, Rob Clark wrote:


On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand 
wrote:


On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:



On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
wrote:


I'm not quite sure what I think about this.  I think I would like to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing on
its own so we have two competing solutions.



I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it.  I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
when you are thinking about implementing $new_thing for non-gpu
devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
about a (for ex.) camera.  I kinda don't want to throw out the baby
with the bathwater here.




Agreed.  GBM is very EGLish and we don't want the new allocator to be
that.



*maybe* GBM could be partially implemented on top of $new_thing.  I
don't quite see how that would work.  Possibly we could deprecate
parts of GBM that are no longer needed?  idk..  Either way, I fully
expect that GBM and mesa's implementation of $new_thing could perhaps
sit on to of some of the same set of internal APIs.  The public
interface can be decoupled from the internal implementation.




Maybe I should restate things a bit.  My real point was that modifiers +
$new_thing + Kernel blob should be a complete and more powerful
replacement
for GBM.  I don't know that we really can implement GBM on top of it
because
GBM has lots of wishy-washy concepts such as "cursor plane" which may not
map well, at least not without querying the kernel about specific display
planes.  In particular, I don't want someone to feel like they need to
use
$new_thing and GBM at the same time or together.  Ideally, I'd like them
to
never do that unless we decide gbm_bo is a useful abstraction for
$new_thing.



(just to repeat what I mentioned on irc)

I think the main thing is how do you create a swapchain/surface and know
which is the current front buffer after SwapBuffers()..  those are the only
bits of GBM that seem like they would still be useful.  idk, maybe
there is some other idea.



I don't view this as terribly useful except for legacy apps that need an EGL
window surface and can't be updated to use new methods.  Wayland compositors
certainly don't fall in that category.  I don't know that any GBM apps do.


kmscube doesn't count?  :-P

Hmm, I assumed weston and the other wayland compositors were still
using gbm to create EGL surfaces, but I confess to have not actually
looked at weston src code for quite a few years now.

Anyways, I think it is perfectly fine for GBM to stay as-is in its
current form.  It can already import dma-buf fd's, and those can
certainly come from $new_thing.

So I guess we want an EGL extension to return the allocator device
instance for the GPU.  That also takes care of the non-bare-metal
case.


Rather, I think the way forward for the classes of apps that need something
like GBM or the generic allocator is more or less the path ChromeOS took
with their graphics architecture: Render to individual buffers (using FBOs
bound to imported buffers in GL) and manage buffer exchanges/blits manually.

The useful abstraction surfaces provide isn't so much deciding which buffer
is currently "front" and "back", but rather handling the transition/hand-off
to the window system/display device/etc. in SwapBuffers(), and the whole
idea of the allocator proposals is to make that something the application or
at least some non-driver utility library handles explicitly based on where
exactly the buffer is being handed off to.
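
(Aside: a minimal sketch of that "FBO bound to an imported buffer" path,
assuming EGL_EXT_image_dma_buf_import and GL_OES_EGL_image; in a real
program the extension entry points come from eglGetProcAddress, and error
handling is omitted.)

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

static GLuint fbo_from_dmabuf(EGLDisplay dpy, int fd, int fourcc,
                              int width, int height, int stride)
{
    const EGLint attribs[] = {
        EGL_WIDTH, width, EGL_HEIGHT, height,
        EGL_LINUX_DRM_FOURCC_EXT, fourcc,
        EGL_DMA_BUF_PLANE0_FD_EXT, fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
        EGL_NONE
    };
    EGLImageKHR img = eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                                        EGL_LINUX_DMA_BUF_EXT, NULL, attribs);
    GLuint tex, fbo;

    /* Wrap the imported buffer in a texture and attach it to an FBO. */
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, (GLeglImageOES)img);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);
    return fbo; /* render here, then hand the dma-buf to the next device */
}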


Hmm, ok..  I guess the transition will need some hook into the driver.
For freedreno and vc4 (and I suspect this is not uncommon for tiler
GPUs), switching FBOs doesn't necessarily flush rendering to hw.
Maybe it would work out if you requested the sync fd file descriptor
from an EGL fence before passing things to the next device, as that would
flush rendering.


This "flush" is exactly what usage transitions are for:

1) Perform rendering or texturing
2) Insert a transition into the command stream, feeding metadata extracted 
from the allocator library into the rendering/texturing API via a new entry 
point.  This instructs the driver to perform any 
flushes/decompressions/etc. needed to transition to the next usage in the 
pipeline.
3) Insert/extract your fence (potentially this is combined with

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 11/30/2017 12:06 PM, Lyude Paul wrote:

On Thu, 2017-11-30 at 13:20 -0500, Rob Clark wrote:

On Thu, Nov 30, 2017 at 12:59 AM, James Jones  wrote:

On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:


On Wed, 29 Nov 2017 16:28:15 -0500
Rob Clark  wrote:


Do we need to define both in-place and copy transitions?  Ie. what if
GPU is still reading a tiled or compressed texture (ie. sampling from
previous frame for some reason), but we need to untile/uncompress for
display.. or maybe there are some other cases like that we should
think about..

Maybe you already have some thoughts about that?



This is the next thing I'll be working on. I haven't given it much
thought myself so far, but I think James might have had some insights.
I'll read through some of his notes to double-check.



A couple of notes on usage transitions:

While chatting about transitions, a few assertions were made by others
that
I've come to accept, despite the fact that they reduce the generality of
the
allocator mechanisms:

-GPUs are the only things that actually need usage transitions as far as I
know thus far.  Other engines either share the GPU representations of
data,
or use more limited representations; the latter being the reason non-GPU
usage transitions are a useful thing.

-It's reasonable to assume that a GPU is required to perform a usage
transition.  This follows from the above postulate.  If only GPUs are
using
more advanced representations, you don't need any transitions unless you
have a GPU available.


This seems reasonable.  I can't think of any non-gpu related case
where you would need a transition, other than perhaps cache flush/inv.


 From that, I derived the rough API proposal for transitions presented on
my
XDC 2017 slides.  Transition "metadata" is queried from the allocator
given
a pair of usages (which may refer to more than one device), but the
realization of the transition is left to existing GPU APIs.  I think I put
Vulkan-like pseudo-code in the slides, but the GL external objects
extensions (GL_EXT_memory_object and GL_EXT_semaphore) would work as well.
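
(Aside: roughly the Vulkan-like shape referred to above.  The transition
itself is just an ordinary barrier whose parameters would be driven by the
metadata the allocator returns for the usage pair; names here are
illustrative, not the prototype's actual API.)

#include <vulkan/vulkan.h>

static void record_usage_transition(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = 0,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_GENERAL, /* stand-in for "display usage" */
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };

    /* The driver performs whatever decompression/flush the allocator
     * metadata for the (render -> display) usage pair calls for. */
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}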


I haven't quite wrapped my head around how this would work in the
cross-device case.. I mean from the API standpoint for the user, it
seems straightforward enough.  Just not sure how to implement that and
what the driver interface would look like.

I guess we need a capability-conversion (?).. I mean take for example
the fb compression capability from your slide #12[1].  If we knew
there was an available transition to go from "Dev2 FB compression" to
"normal", then we could have allowed the "Dev2 FB compression" valid
set?

[1] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf


Regarding in-place Vs. copy: To me a transition is something that happens
in-place, at least semantically.  If you need to make copies, that's a
format conversion blit not a transition, and graphics APIs are already
capable of expressing that without any special transitions or help from
the
allocator.  However, I understand some chipsets perform transitions using
something that looks kind of like a blit using on-chip caches and
constrained usage semantics.  There's probably some work to do to see
whether those need to be accommodated as conversion blits or usage 
transitions.


I guess part of what I was thinking of, is what happens if the
producing device is still reading from the buffer.  For example,
viddec -> gpu use case, where the video decoder is also still hanging
on to the frame to use as a reference frame to decode future frames?

I guess if transition from devA -> devB can be done in parallel with
devA still reading the buffer, it isn't a problem.  I guess that
limits (non-blit) transitions to decompression and cache op's?  Maybe
that is ok..


I don't know of a real case it would be a problem.  Note you can 
transition to multiple usages in the proposed API, so for the video 
decoder example, you would transition from [video decode target] to 
[video decode target, GPU sampler source] for simultaneous texturing and 
reference frame usage.



For our hardware's purposes, transitions are just various levels of
decompression or compression reconfiguration and potentially cache
flushing/invalidation, so our transition metadata will just be some bits
signaling which compression operation is needed, if any.  That's the sort
of
operation I modeled the API around, so if things are much more exotic than
that for others, it will probably require some adjustments.




[snip]



Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing is one of my
primary goals.  However, it's a pretty heavy thing to prototype.  If
someone
has the time though, I think it would be a great experiment.  It would
help
flesh out the paltry list of usages, constraints, and capabilities in the
existing prototype codebase.  The kmscube example really should ha

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 12/01/2017 10:34 AM, Nicolai Hähnle wrote:

On 01.12.2017 18:09, Nicolai Hähnle wrote:
[snip]

As for the actual transition API, I accept that some metadata may be
required, and the metadata probably needs to depend on the memory layout,
which is often vendor-specific. But even linear layouts need some
transitions for caches. We probably need at least some generic
"off-device usage" bit.


I've started thinking of cached as a capability with a transition.. I
think that helps.  Maybe it needs to somehow be more specific (ie. if
you have two devices both with their own cache with no coherency
between the two)


As I wrote above, I'd prefer not to think of "cached" as a capability 
at least for radeonsi.


 From the desktop perspective, I would say let's ignore caches, the 
drivers know which caches they need to flush to make data visible to 
other devices on the system.


On the other hand, there are probably SoC cases where non-coherent 
caches are shared between some but not all devices, and in that case 
perhaps we do need to communicate this.


So perhaps we should have two kinds of "capabilities".

The first, like framebuffer compression, is a capability of the 
allocated memory layout (because the compression requires a meta 
surface), and devices that expose it may opportunistically use it.


The second, like caches, is a capability that the device/driver will 
use and you don't get a say in it, but other devices/drivers also 
don't need to be aware of them.


So then you could theoretically have a system that gives you:

GPU: FOO/tiled(layout-caps=FOO/cc, dev-caps=FOO/gpu-cache)
Display: FOO/tiled(layout-caps=FOO/cc)
Video:   FOO/tiled(dev-caps=FOO/vid-cache)
Camera:  FOO/tiled(dev-caps=FOO/vid-cache)

[snip]

FWIW, I think all that stuff about different caches quite likely 
over-complicates things. At the end of each "command submission" of 
whichever type of engine, the buffer must be in a state where the kernel 
is free to move it around for memory management purposes. This already 
puts a big constraint on the kind of (non-coherent) caches that can be 
supported anyway, so I wouldn't be surprised if we could get away with a 
*much* simpler approach.


I'd rather not depend on this type of cleverness if possible.  Other 
kernels/OS's may not behave this way, and I'd like the allocator 
mechanism to be something we can use across all or at least most of the 
POSIX and POSIX-like OS's we support.  Also, this particular example is 
not true of our proprietary Linux driver, and I suspect it won't always 
be the case for other drivers.  If a particular driver or OS fits this 
assumption, the driver is always free to return no-op transitions in 
that case.


Thanks,
-James


Cheers,
Nicolai




Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 12/01/2017 01:52 PM, Miguel Angel Vico wrote:



On Fri, 1 Dec 2017 13:38:41 -0500
Rob Clark  wrote:


On Fri, Dec 1, 2017 at 12:09 PM, Nicolai Hähnle  wrote:

On 01.12.2017 16:06, Rob Clark wrote:


On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle 
wrote:


Hi,

I've had a chance to look a bit more closely at the allocator prototype
repository now. There's a whole bunch of low-level API design feedback,
but
for now let's focus on the high-level stuff first.


Thanks for taking a look.


Going by the 4.5 major object types (as also seen on slide 5 of your
presentation [0]), assertions and usages make sense to me.

Capabilities and capability sets should be cleaned up in my opinion, as
the
status quo is overly obfuscating things. What capability sets really
represent, as far as I understand them, is *memory layouts*, and so
that's
what they should be called.

This conceptually simplifies `derive_capabilities` significantly without
any
loss of expressiveness as far as I can see. Given two lists of memory
layouts, we simply look for which memory layouts appear in both lists,
and
then merge their constraints and capabilities.
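
(Aside: a toy sketch of just the intersect-and-merge step described above,
leaving aside the capability question debated below; the types and names
are illustrative, not the prototype's API.)

#include <string.h>

struct layout {
    const char *name;       /* e.g. "AMD/tiled" */
    unsigned    alignment;  /* constraint */
};

/* Intersect two lists of memory layouts by name and merge constraints by
 * keeping the stricter (larger, power-of-two) alignment. */
static int merge_layouts(const struct layout *a, int na,
                         const struct layout *b, int nb,
                         struct layout *out)
{
    int count = 0;

    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (!strcmp(a[i].name, b[j].name)) {
                out[count].name = a[i].name;
                out[count].alignment = a[i].alignment > b[j].alignment ?
                                       a[i].alignment : b[j].alignment;
                count++;
            }
    return count;
}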

Merging constraints looks good to me.

Capabilities need some more thought. The prototype removes capabilities
when
merging layouts, but I'd argue that that is often undesirable. (In fact,
I
cannot think of capabilities which we'd always want to remove.)

A typical example for this is compression (i.e. DCC in our case). For
rendering usage, we'd return something like:

Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)

For display usage, we might return (depending on hardware):

Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)

Merging these in the prototype would remove the DCC capability, even though
it might well make sense to keep it there for rendering. Dealing with the
fact that display usage does not have this capability is precisely one of
the two things that transitions are about! The other thing that
transitions
are about is caches.

I think this is kind of what Rob was saying in one of his mails.



Perhaps "layout" is a better name than "caps".. either way I think of
both AMD/tiled and AMD/DCC as the same type of "thing".. the
difference between AMD/tiled and AMD/DCC is that a transition can be
provided for AMD/DCC.  Other than that they are both things describing
the layout.



The reason that a transition can be provided is that they aren't quite the
same thing, though. In a very real sense, AMD/DCC is a "child" property of
AMD/tiled: DCC is implemented as a meta surface whose memory layout depends
on the layout of the main surface.


I suppose this is six-of-one, half-dozen of the other..

what you are calling a layout is what I'm calling a cap that just
happens not to have an associated transition


Although, if there are GPUs that can do an in-place "transition" between
different tiling layouts, then the distinction is perhaps really not as
clear-cut. I guess that would only apply to tiled renderers.


I suppose the advantage of just calling both layout and caps the same
thing, and just saying that a "cap" (or "layout" if you prefer that
name) can optionally have one or more associated transitions, is that
you can deal with cases where sometimes a tiled format might actually
have an in-place transition ;-)

  

So lets say you have a setup where both display and GPU supported
FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
(FOO/cached).  But the GPU supported the following transitions:

trans_a: FOO/CC -> null
trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
3: caps(FOO/tiled); constraints(alignment=32k)

Display:
1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
   transition(GPU->display: trans_a, trans_b; display->GPU: none)
2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
   transition(GPU->display: trans_a; display->GPU: none)
3: caps(FOO/tiled); constraints(alignment=64k);
   transition(GPU->display: none; display->GPU: none)



We definitely don't want to expose a way of getting uncached rendering
surfaces for radeonsi. I mean, I think we are supposed to be able to program
our hardware so that the backend bypasses all caches, but (a) nobody
validates that and (b) it's basically suicide in terms of performance. Let's
build fewer footguns :)


sure, this was just a hypothetical example.  But to take this case as
another example, if you didn't want to expose uncached rendering (or
cached w/ cache flushes after each draw), you would exclude the entry
from the GPU set which didn't have FOO/cached (I'm adding back a
cached but not CC config just to make it interesting), and end up
with:

trans_a: FOO/CC -> null
trans

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-06 Thread James Jones

On 12/06/2017 03:25 AM, Nicolai Hähnle wrote:

On 06.12.2017 08:07, James Jones wrote:
[snip]

So lets say you have a setup where both display and GPU supported
FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
(FOO/cached).  But the GPU supported the following transitions:

    trans_a: FOO/CC -> null
    trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
    1: caps(FOO/tiled, FOO/CC, FOO/cached); 
constraints(alignment=32k)

    2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
    3: caps(FOO/tiled); constraints(alignment=32k)

Display:
    1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
    1: caps(FOO/tiled, FOO/CC, FOO/cached); 
constraints(alignment=64k);

   transition(GPU->display: trans_a, trans_b; display->GPU: none)
    2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
   transition(GPU->display: trans_a; display->GPU: none)
    3: caps(FOO/tiled); constraints(alignment=64k);
   transition(GPU->display: none; display->GPU: none)



We definitely don't want to expose a way of getting uncached rendering
surfaces for radeonsi. I mean, I think we are supposed to be able 
to program

our hardware so that the backend bypasses all caches, but (a) nobody
validates that and (b) it's basically suicide in terms of 
performance. Let's

build fewer footguns :)


sure, this was just a hypothetical example.  But to take this case as
another example, if you didn't want to expose uncached rendering (or
cached w/ cache flushes after each draw), you would exclude the entry
from the GPU set which didn't have FOO/cached (I'm adding back a
cached but not CC config just to make it interesting), and end up
with:

    trans_a: FOO/CC -> null
    trans_b: FOO/cached -> null

GPU:
   1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
   2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k)

Display:
   1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
   1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
  transition(GPU->display: trans_a, trans_b; display->GPU: none)
   2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k);
  transition(GPU->display: trans_b; display->GPU: none)

So there isn't anything in the result set that doesn't have GPU cache,
and the cache-flush transition is always in the set of required
transitions going from GPU -> display

Hmm, I guess this does require the concept of a required cap..


Which we already introduced to the allocator API when we realized we
would need them as we were prototyping.


Note I also posed the question of whether things like cached (and 
similarly compression, since I view compression as roughly an 
equivalent mechanism to a cache) belong in capability sets, in one of 
the open issues on my XDC 2017 slides, because of this very problem of 
over-pruning that it causes.  
It's on slide 15, as "No device-local capabilities".  You'll have to 
listen to my coverage of it in the recorded presentation for that 
slide to make any sense, but it's the same thing Nicolai has laid out 
here.


As I continued working through our prototype driver support, I found I 
didn't actually need to include cached or compressed as capabilities: 
The GPU just applies them as needed and the usage transitions make it 
transparent to the non-GPU engines.  That does mean the GPU driver 
currently needs to be the one to realize the allocation from the 
capability set to get optimal behavior.  We could fix that by 
reworking our driver though.  At this point, not including 
device-local properties like on-device caching in capabilities seems 
like the right solution to me.  I'm curious whether this applies 
universally though, or if other hardware doesn't fit the "compression 
and stuff all behaves like a cache" idiom.


Compression is a part of the memory layout for us: framebuffer 
compression uses an additional "meta surface". At the most basic level, 
an allocation with loss-less compression support is by necessity bigger 
than an allocation without.


We can allocate this meta surface separately, but then we're forced to 
decompress when passing the surface around (e.g. to a compositor.)


Consider also the example I gave elsewhere, where a cross-vendor tiling 
layout is combined with vendor-specific compression:


Device 1, rendering: caps(BASE/foo-tiling, VND1/compression)
Device 2, sampling/scanout: caps(BASE/foo-tiling, VND2/compression)

Some more thoughts on caching or "device-local" properties below.


Compression requires extra resources for us as well.  That's probably 
universal.  I think the distinction between the two approaches is 
whether the allocating driver deduces that compression can be used with 
a given capability set and hence adds the resources implicitly, or 
whether the capability 

Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

2017-12-21 Thread James Jones

On 12/20/2017 01:58 PM, Daniel Stone wrote:

Hi Miguel,

On 20 December 2017 at 16:51, Miguel Angel Vico  wrote:

In the meantime, I've been working on putting together an open source
implementation of the allocator mechanisms using the Nouveau driver for
all to be able to play with.


Thanks for taking a look at this! I'm still winding out my to-do list
for the year, but hoping to get to this more seriously soon.

As a general comment, now that modifiers are a first-class concept in
many places (KMS FBs, KMS plane format advertisement, V4L2 buffers,
EGL/Vulkan image import/export, Wayland buffer import, etc), I'd like
to see them included as a first-class concept in the allocator. I
understand one of the primary reservations against using them was that
QNX didn't have such a concept, but just specifying them to be ignored
on non-Linux platforms would probably work fine.


The allocator mechanisms and format modifiers are orthogonal though. 
Either capability sets can be represented using format modifiers (the 
direction one part of this thread is suggesting, which I think is a bad 
idea), or format modifiers could easily be included as a vendor-agnostic 
capability, similar to pitch layout.  There are no "First class 
citizens" in the allocator mechanism itself.  That's the whole idea: 
Apps don't need to care about things like how the OS represents its 
surface metadata beyond some truly universal things like width and 
height (assertions).  The rest is abstracted away such that the apps are 
portable, even if the drivers/backends aren't.  Even if the solution 
within Linux is "just use format modifiers", there's still some benefit 
to making the kernel ABI use something slightly higher level that 
translates to DRM format modifiers inside the kernel, just to keep the 
apps OS-agnostic.



Another of the missing pieces before we can move this to production is
importing allocations to DRM FB objects. This is probably one of the
most sensitive parts of the project as it requires modification/addition
of kernel driver interfaces.

At XDC2017, James had several hallway conversations with several people
about this, all having different opinions. I'd like to take this
opportunity to also start a discussion about what's the best option to
create a path to get allocator allocations added as DRM FB objects.

These are the few options we've considered to start with:

   A) Have vendor-private ioctls to set properties on GEM objects that
  are inherited by the FB objects. This is how our (NVIDIA) desktop
  DRM driver currently works. This would require every vendor to add
  their own ioctl to process allocator metadata, but the metadata is
  actually a vendor-agnostic object more like DRM modifiers. We'd
  like to come up with a vendor-agnostic solution that can be
  integrated into core DRM.


This worries me. If the data is static for the lifetime of the buffer
- describing the tiling layout, for instance - then it would form
effective ABI for all the consumers/producers using that buffer type.
If it is dynamic, you also have a world of synchronisation problems
when multiple users race each other with different uses of that buffer
(and presumably you would need to reload the metadata on every use?).
Either way, anyone using this would need to have a very well-developed
compatibility story, given that you can mix and match kernel and
userspace versions.


I think the metadata is static.  The surface meta-state is not, but that 
would be a commit time thing if anything, not a GEM or FB object thing. 
Still attaching metadata to GEM objects, which seem to be opaque blobs 
of memory in the general case, rather than attaching it to FB's mapped 
onto the GEM objects always felt architecturally wrong to me.  You can 
have multiple FBs in one GEM object, for example.  There's no reason to 
assume they would share the same format let alone tiling layout.



   B) Add a new drmModeAddFBWithMetadata() command that takes allocator
  metadata blobs for each plane of the FB. Some people in the
  community have mentioned this is their preferred design. This,
  however, means we'd have to go through the exercise of adding
  another metadata mechanism to the whole graphics stack.


Similarly, this seems to be missing either a 'mandatory' flag so
userspace can inform the kernel it must fail if it does not understand
certain capabilities, or a way for the kernel to inform userspace
which capabilities it does/doesn't understand.


I think that will fall out of the discussion over exactly what 
capability sets look like.  Regardless, yes, the kernel must fail if it 
can't support a given capability set, just as it would fail if it 
couldn't support a given DRM Format modifier.  Like the format 
modifiers, the userspace allocator driver would have queried the DRM 
kernel driver when reporting supported capability sets for a usage that 
required creating FBs, so it would always be user error to r

Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

2018-01-03 Thread James Jones

On 12/28/2017 10:24 AM, Miguel Angel Vico wrote:

(Adding dri-devel back, and trying to respond to some comments from
the different forks)

James Jones wrote:


Your worst case analysis above isn't far off from our HW, give or take
some bits and axes here and there.  We've started an internal discussion
about how to lay out all the bits we need.  It's hard to even enumerate
them all without having a complete understanding of what capability sets
are going to include, a fully-optimized implementation of the mechanism
on our HW, and lots of test scenarios though.


(thanks James for most of the info below)

To elaborate a bit, if we want to share an allocation across GPUs for 3D
rendering, it seems we would need 12 bits to express our
swizzling/tiling memory layouts for fermi+. In addition to that,
maxwell uses 3 more bits for this, and we need an extra bit to identify
pre-fermi representations.

We also need one bit to differentiate between Tegra and desktop, and
another one to indicate whether the layout is otherwise linear.

Then things like whether compression is used (one more bit), and we can
probably get by with 3 bits for the type of compression if we are
creative. However, it'd be way easier to just track arch + page kind,
which would be like 32 bits on its own.


Not clear if this is an NV-only term, so for those not familiar, page 
kind is very loosely the equivalent of a format modifier our HW uses 
internally in its memory management subsystem.  The value mappings vary 
a bit for each HW generation.



Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
bits.

If device-local properties are included, we might need a couple more
bits for caching.

We may also need to express locality information, which may take at
least another 2 or 3 bits.

If we want to share array textures too, you also need to pass the array
pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
its own.

So yes, as James mentioned, with some effort, we could technically fit
our current allocation parameters in a modifier, but I'm still not
convinced this is as future proof as it could be as our hardware grows
in capabilities.
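
(Tallying the figures above: 12 + 3 + 1 bits of swizzling/tiling state,
1 bit for Tegra vs. desktop, 1 bit for linear, 1 + 3 bits for
compression, 3 bits for Z-cull/ZBC, ~2 bits for caching, and 2-3 bits
for locality already comes to around 30 bits, before the ~32-bit
arch + page kind alternative or a 64-bit array pitch is even
considered, against the 56 payload bits a format modifier has left
after its vendor prefix.)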


Daniel Stone wrote:


So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.


I'm a bit confused. Can't modifiers be specified by vendors and only
interpreted by drivers? My understanding was that modifiers could
actually be treated as opaque 64-bit data, in which case they would
qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
scalable. What am I missing?


Daniel Vetter wrote:


I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.


Not sure whether I might be misunderstanding your statement, but one of
the allocator main features is negotiation of nearly optimal allocation
parameters given a set of uses on different devices/engines by the
capability merge operation. A client should have queried what every
device/engine is capable of for the given uses, find the optimal set of
capabilities, and use it for allocating a buffer. At the moment these
parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.


Rob Clark wrote:


It does seem like, if possible, starting out with modifiers for now at
the kernel interface would make life easier, vs trying to reinvent
both kernel and userspace APIs at the same time.  Userspace APIs are
easier to change or throw away.  Presumably by the time we get to the
point of changing kernel uabi, we are already using, and pretty happy
with, serialized liballoc data over the wire in userspace so it is
only a matter of changing the kernel interface.


I guess we can indeed start with modifiers for now, if that's what it
takes to get the allocator mechanisms rolling. However, it seems to me
that we won't be able to encode the same type of information included
in capability sets with modifiers in all cases. For instance, if we end
up encoding usage transition information in capability s

Re: [Mesa-dev] [PATCH 2/2] nouveau: Enable support for EXT_external_objects

2018-06-22 Thread James Jones

FWIW,

Reviewed-by: James Jones 

For the series, but someone more familiar with Nouveau should probably 
review as well.


Thanks,
-James

On 06/21/2018 07:01 PM, Miguel A. Vico wrote:

Enable EXT_external_objects for nvc0 and nv50

Signed-off-by: Miguel A Vico Moya 
---
  src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 3a3c43b774..e5babd5580 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -201,6 +201,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
 case PIPE_CAP_TGSI_CLOCK:
 case PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX:
 case PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION:
+   case PIPE_CAP_MEMOBJ:
return 1;
 case PIPE_CAP_SEAMLESS_CUBE_MAP:
return 1; /* class_3d >= NVA0_3D_CLASS; */
@@ -273,7 +274,6 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
 case PIPE_CAP_BINDLESS_TEXTURE:
 case PIPE_CAP_NIR_SAMPLERS_AS_DEREF:
 case PIPE_CAP_QUERY_SO_OVERFLOW:
-   case PIPE_CAP_MEMOBJ:
 case PIPE_CAP_LOAD_CONSTBUF:
 case PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS:
 case PIPE_CAP_TILE_RASTER_ORDER:
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 02890c7165..ce344e33c5 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -259,6 +259,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
 case PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX:
 case PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION:
 case PIPE_CAP_QUERY_SO_OVERFLOW:
+   case PIPE_CAP_MEMOBJ:
return 1;
 case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
return nouveau_screen(pscreen)->vram_domain & NOUVEAU_BO_VRAM ? 1 : 0;
@@ -309,7 +310,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
 case PIPE_CAP_INT64_DIVMOD:
 case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE:
 case PIPE_CAP_NIR_SAMPLERS_AS_DEREF:
-   case PIPE_CAP_MEMOBJ:
 case PIPE_CAP_LOAD_CONSTBUF:
 case PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS:
 case PIPE_CAP_TILE_RASTER_ORDER:



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

2018-02-26 Thread James Jones

On 02/22/2018 01:16 PM, Alex Deucher wrote:

On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen
 wrote:

On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg  wrote:

On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher  wrote:


On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace wrote:

On Thu 21 Dec 2017, Daniel Vetter wrote:

On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <hoegsb...@google.com> wrote:

On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicom...@nvidia.com> wrote:

On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <hoegsb...@gmail.com> wrote:

I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.


The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.


I understand that you may have n knobs with a total of more than 56
bits that configure your tiling/swizzling for color buffers. What I
don't buy is that you need all those combinations when passing buffers
around between codecs, cameras and display controllers. Even if you're
sharing between the same 3D drivers in different processes, I expect
just locking down, say, 64 different combinations (you can add more
over time) and assigning each a modifier would be sufficient. I doubt
you'd extract meaningful performance gains from going all the way to a
blob.


I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilites.

I summarized this opinion in VK_EXT_image_drm_format_modifier,
where I wrote an "introduction to modifiers" section. Here's an excerpt:

   One goal of modifiers in the Linux ecosystem is to enumerate for each
   vendor a reasonably sized set of tiling formats that are appropriate
   for images shared across processes, APIs, and/or devices, where each
   participating component may possibly be from different vendors.
   A non-goal is to enumerate all tiling formats supported by all
   vendors. Some tiling formats used internally by vendors are
   inappropriate for sharing; no modifiers should be assigned to such
   tiling formats.



Where it gets tricky is how to select that subset?  Our tiling mode
are defined more by the asic specific constraints than the tiling mode
itself.  At a high level we have basically 3 tiling modes (out of 16
possible) that would be the minimum we'd want to expose for gfx6-8.
gfx9 uses a completely new scheme.
1. Linear (per asic stride requirements, not usable by many hw blocks)
2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
tile split (7 possible), sample split (4 possible), num banks (4
possible), bank width (4 possible), bank height (4 possible), macro
tile aspect (4 possible) all of which are asic config specific)
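
(Taken at face value, and ignoring which combinations a given ASIC
actually allows, that is 5 x 18 x 7 x 4^5, on the order of 600k nominal
parameter combinations for 2D thin alone, which is why enumerating them
all as named modifiers quickly gets unwieldy.)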



I guess we could do something like:
AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK


AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

etc.



We only probably need 40 bits to encode all of the tiling parameters
so we could do family, plus tiling encoding that still seems unwieldy
to deal with from an application perspective.  All of the parameters
affect the alignment requirements.


We discussed this earlier in t

[Mesa-dev] [RFC] GBM Alternate Backend Discovery & Loading

2021-03-29 Thread James Jones
I've brought this up in the past, but I have some patches implementing 
it now, so I was hoping to get some further feedback on the idea of 
supporting GBM backends external to Mesa.


https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9902

From my reading, the GBM core code was intended to support, or perhaps 
at some point in the past did support loading backends other than the 
built-in DRI one. This series extends the current code to enable loading 
of alternate backends in general, and as a straw-man proposal, 
specifically uses the DRM driver name of the FD passed to 
gbm_create_device() to construct a library name of a potential alternate 
backend. Other schemes could be implemented on top of the basic 
functionality added here, but this simple one seemed sufficient.


Issues addressed in the series:

-The ability to dynamically load backends from DSOs.

-Thread-safe tracking and refcounting of dynamically-loaded backends.

-Versioning of the GBM driver ABI to enable backwards and forwards 
binary compatibility between the GBM loader and backends.


-Discovery of alternate/external backend DSOs.

Here's a rundown of the backend discovery logic implemented in the series:

-If the GBM_BACKEND env var is set, attempt to load the backend library 
it names and create a device from it if found.


-If that fails or the env var is not set, call drmGetVersion() on the 
fd, and if that succeeds, attempt to load libgbm_<driver name>.so.0 
and create a device from it if found.


-If that fails, try the built-in DRI backend.
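
As a rough illustration of that ordering (not the actual code from the
MR, which also handles ABI versioning and backend refcounting; the
function names here are made up):

   #include <dlfcn.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <xf86drm.h>

   static void *try_backend(const char *soname)
   {
      /* NULL if the DSO is not present or fails to load */
      return dlopen(soname, RTLD_NOW | RTLD_LOCAL);
   }

   static void *find_gbm_backend(int fd)
   {
      void *dso = NULL;
      const char *env = getenv("GBM_BACKEND");

      /* 1. Explicit override via the environment */
      if (env)
         dso = try_backend(env);

      /* 2. Derive the backend name from the DRM driver owning the fd */
      if (!dso) {
         drmVersionPtr version = drmGetVersion(fd);
         if (version) {
            char soname[128];
            snprintf(soname, sizeof(soname), "libgbm_%s.so.0", version->name);
            dso = try_backend(soname);
            drmFreeVersion(version);
         }
      }

      /* 3. NULL here means: fall back to the built-in DRI backend */
      return dso;
   }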

Thanks,
-James
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: Xbox Series S/X UWP

2022-06-13 Thread James Jones

On 6/6/22 09:22, Jesse Natalie wrote:
(Hopefully this goes through and not to spam like last time I tried to 
respond…)


No, neither of these would currently work with UWP.

The primary reason is that neither Khronos API has extensions to 
initialize the winsys on top of the UWP core window infrastructure. In 
theory, you could initialize Dozen for offscreen rendering and then 
explicitly marshal the contents out – that would probably work actually. 
There’s 2 more gotchas there though:


 1. The ICD loaders (OpenGL32.dll, Vulkan-1.dll) are not available in
the UWP environment. You could explicitly use the non-ICD version of
GL (i.e. Mesa’s OpenGL32.dll from the libgl-gdi target), include the
open-source Vulkan ICD loader, or use the ICD version of either
(OpenGLOn12.dll/libgallium_wgl.dll for GL – I plan to delete the
former at some point and just use the latter;
vulkan_dzn.dll for VK).
 2. There’s not currently extensions for D3D12 interop either spec’d or
implemented.


What do you mean by not spec'd? Vulkan and OpenGL both have standard 
(KHR and EXT respectively) D3D12 interop extensions, in addition to 
Vulkan<->GL, Vulkan<->D3D11-and-lower.


Thanks,
-James

There’s one more problem for GL that I don’t think is problematic for 
VK, which is that it uses APIs that are banned from the UWP environment, 
specifically around inserting window hooks for Win32 framebuffer 
lifetime management. So you’d probably have to build a custom version 
that has all of that stuff stripped out to get it to be shippable in a UWP.


We (Microsoft) don’t really have plans to add this kind of stuff, at 
least not in the near future, but I’d be open to accepting contributions 
that enable this.


-Jesse

*From:* mesa-dev  *On Behalf Of 
*Daniel Price

*Sent:* Monday, June 6, 2022 5:41 AM
*To:* mesa-dev@lists.freedesktop.org
*Subject:* [EXTERNAL] Xbox Series S/X UWP






Hi, I was wondering if these two layers would work with UWP on Xbox 
Series consoles, or if not, will there be plans to add support?


https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14766 



https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14881 



Many Thanks

Dan



Re: Headless OpenGL Rendering using SSH and X11 forwarding

2023-03-14 Thread James Jones
If that's the case, yes, I can confirm the NV driver does not support 
rendering with remote X servers using EGL, with or without indirect GLX 
support enabled in said server, and yes, EGLDevice will work just fine 
in that situation for offscreen rendering if you're trying to use the 
local GPU.


Thanks,
-James

On 3/13/23 18:27, Adam Jackson wrote:
12290 is indeed EGL_BAD_ACCESS, and it's pretty much impossible for 
Mesa's eglInitialize to return that, so (if I had to guess) you have 
nvidia's driver installed as well, and (extra guessing now) nvidia's EGL 
only works with connections to the local machine and not over the 
network. Mesa shouldn't have that problem because it would select 
llvmpipe instead of a native driver in that scenario, I think.


If "render to png" really is what you're trying to accomplish you might 
do better to use EGL_EXT_platform_device to get a direct connection to 
the GPU without involving a display server.
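
For reference, a minimal sketch of that path, assuming the
EGL_EXT_device_enumeration and EGL_EXT_platform_device extensions are
present (a real application should check the client extension string
and pick a device more deliberately than just taking the first one):

   #include <EGL/egl.h>
   #include <EGL/eglext.h>

   static EGLDisplay get_device_display(void)
   {
      PFNEGLQUERYDEVICESEXTPROC queryDevices =
         (PFNEGLQUERYDEVICESEXTPROC)eglGetProcAddress("eglQueryDevicesEXT");
      PFNEGLGETPLATFORMDISPLAYEXTPROC getPlatformDisplay =
         (PFNEGLGETPLATFORMDISPLAYEXTPROC)
            eglGetProcAddress("eglGetPlatformDisplayEXT");

      EGLDeviceEXT devices[8];
      EGLint num_devices = 0;

      queryDevices(8, devices, &num_devices);
      if (num_devices < 1)
         return EGL_NO_DISPLAY;

      /* Taking the first device for brevity; EGL_EXT_device_drm and
       * friends can be queried to pick a specific GPU. */
      return getPlatformDisplay(EGL_PLATFORM_DEVICE_EXT, devices[0], NULL);
   }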


- ajax

On Tue, Mar 7, 2023 at 8:17 PM Richard Haney wrote:


Please help,

I have been going around and around with this problem but cannot
seem to make any headway. I hope that one of you OpenGL EGL experts
can help. :)

I have created a program that uses OpenGL EGL (version 1.5) with
OpenGL 3 that successfully renders an offscreen triangle and saves
it to an image file (PNG) when I ssh without X11 forwarding on my
Linux (Ubuntu 22.04) machine.

However, when I try the same thing using ssh with X11 forwarding
enabled, I get the following EGL error when I call eglInitialize(…):
12290 (which I think is EGL_BAD_ACCESS).

This seems really weird and I hope it is something simple that I am
just not currently seeing.

I really like using OpenGL with EGL but need a way to remedy this
situation if possible. Is there a way for EGL to determine if X11
forwarding is being employed and to ignore it or some other solution?

The snippet of relevant C++ code follows, with area where error
occurs marked:

   #include <iostream>
   #include <cstdlib>
   #define EGL_EGLEXT_PROTOTYPES
   #define GL_GLEXT_PROTOTYPES
   #include <EGL/egl.h>
   #include <EGL/eglext.h>

   ...

   EGLDisplay display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
   if (display == EGL_NO_DISPLAY) {
       std::cerr << "Failed to get EGL display: " << eglGetError() << std::endl;
       exit(EXIT_FAILURE);
   }

   EGLint major;
   EGLint minor;
   if (eglInitialize(display, &major, &minor) == EGL_FALSE) {
       // ERROR 12290 is generated here
       std::cerr << "Failed to initialize EGL: " << eglGetError() << std::endl;
       exit(EXIT_FAILURE);
   }

   ...


Any help would be greatly appreciated.



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-24 Thread James Jones

FWIW, the NVIDIA binary driver's implementation of gbm_bo_map/unmap()

1) Doesn't do any synchronization against in-flight work. The assumption 
is that if the content is going to be read, the API writing the data has 
established that coherence. Likewise, if it's going to be written, the 
API reading it afterwards does any invalidates or whatever are needed 
for coherence.


2) We don't blit anything or format convert, because our GBM 
implementation has no DMA engine access, and I'd like to keep it that 
way. Setting up a DMA-capable driver instance is much more expensive as 
far as runtime resources than setting up a simple allocator+mmap driver, 
at least in our driver architecture. Our GBM map just does an mmap(), 
and if it's not linear, you're not going to be able to interpret the 
data unless you've read up on our tiling formats. I'm aware this is 
different from Mesa, and no one has complained thus far. If we were 
forced to fix it, I imagine we'd do something like ask a shared engine 
in the kernel to do the blit on userspace's behalf, which would probably 
be slow but save resources.


Basically, don't use gbm_bo_map() for anything non-trivial on our 
implementation. It's not the right tool for e.g., reading back or 
populating OpenGL textures or X pixmaps. If you don't want to run on the 
NV implementation, feel free to ignore this advice, but I'd still 
suggest it's not the best tool for most jobs.


Thanks,
-James

On 6/17/24 03:29, Pierre Ossman wrote:

On 17/06/2024 10:13, Christian König wrote:


Let me try to clarify a couple of things:

The DMA_BUF_IOCTL_SYNC function is to flush and invalidate caches so 
that the GPU can see values written by the CPU and the CPU can see 
values written by the GPU. But that IOCTL does *not* wait for any 
async GPU operation to finish.


If you want to wait for async GPU operations you either need to call 
the OpenGL functions to read pixels or do a select() (or poll, epoll 
etc...) call on the DMA-buf file descriptor.




Thanks for the clarification!

Just to avoid any uncertainty, are both of these things done implicitly 
by gbm_bo_map()/gbm_bo_unmap()?


I did test adding those steps just in case, but unfortunately did not 
see an improvement. My order was:


1. gbm_bo_import(GBM_BO_USE_RENDERING)
2. gbm_bo_get_fd()
3. Wait for client to request displaying the buffer
4. gbm_bo_map(GBM_BO_TRANSFER_READ)
5. select(fd+1, &fds, NULL, NULL, NULL)
6. ioctl(DMA_BUF_IOCTL_SYNC, &{ .flags = DMA_BUF_SYNC_START | 
DMA_BUF_SYNC_READ })

7. pixman_blt()
8. gbm_bo_unmap()

So if you want to do some rendering with OpenGL and then see the 
result in a buffer memory mapping the correct sequence would be the 
following:


1. Issue OpenGL rendering commands.
2. Call glFlush() to make sure the hw actually starts working on the 
rendering.
3. Call select() on the DMA-buf file descriptor to wait for the 
rendering to complete.

4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.
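
As a rough sketch of steps 2-4 around a dma-buf fd (assuming the GL
commands from step 1 have already been issued on a current context, and
with all error handling omitted):

   #include <linux/dma-buf.h>
   #include <sys/ioctl.h>
   #include <sys/select.h>
   #include <GL/gl.h>

   static void wait_for_render_and_begin_cpu_read(int dmabuf_fd)
   {
      fd_set fds;
      struct dma_buf_sync sync = {
         .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ
      };

      /* 2. Make sure the hardware actually starts working */
      glFlush();

      /* 3. Wait on the dma-buf fd for the rendering to complete */
      FD_ZERO(&fds);
      FD_SET(dmabuf_fd, &fds);
      select(dmabuf_fd + 1, &fds, NULL, NULL, NULL);

      /* 4. Make the result CPU visible (pair with a DMA_BUF_SYNC_END
       *    ioctl once the CPU is done reading) */
      ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
   }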



What I want to do is implement the X server side of DRI3 in just CPU. It 
works for every application I've tested except gnome-shell.


I would assume that 1. and 2. are supposed to be done by the X client, 
i.e. gnome-shell?


What I need to be able to do is access the result of that, once the X 
client tries to draw using that GBM backed pixmap (e.g. using 
PresentPixmap).


So far, we've only tested Intel GPUs, but we are setting up Nvidia and 
AMD GPUs at the moment. It will be interesting to see if the issue 
remains on those or not.


Regards


[Mesa-dev] [PATCH] gallium: Add format modifier aux plane query

2020-02-05 Thread James Jones
Rather than hard-code a list of all the format
modifiers supported by any gallium driver in
the dri state tracker, add a screen proc that
queries the number of auxiliary planes required
for a given modifier+format pair.

Since the only format modifier that requires
auxiliary planes currently is the iris driver's
I915_FORMAT_MOD_Y_TILED_CCS, provide a generic
implementation of this screen proc as a utility
function, and use that in every driver besides
the iris driver, which requires a trivial
customization on top of it.

Signed-off-by: James Jones 
---
 src/gallium/auxiliary/util/u_screen.c | 35 ++
 src/gallium/auxiliary/util/u_screen.h |  7 
 src/gallium/drivers/etnaviv/etnaviv_screen.c  |  1 +
 .../drivers/freedreno/freedreno_screen.c  |  1 +
 src/gallium/drivers/iris/iris_resource.c  | 17 +
 src/gallium/drivers/lima/lima_screen.c|  1 +
 .../drivers/nouveau/nvc0/nvc0_resource.c  |  2 ++
 src/gallium/drivers/tegra/tegra_screen.c  | 12 +++
 src/gallium/drivers/v3d/v3d_screen.c  |  1 +
 src/gallium/drivers/vc4/vc4_screen.c  |  1 +
 src/gallium/include/pipe/p_screen.h   | 15 
 src/gallium/state_trackers/dri/dri2.c | 36 ---
 12 files changed, 107 insertions(+), 22 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_screen.c 
b/src/gallium/auxiliary/util/u_screen.c
index 785d1bd3e24..0697d483372 100644
--- a/src/gallium/auxiliary/util/u_screen.c
+++ b/src/gallium/auxiliary/util/u_screen.c
@@ -412,3 +412,38 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
*pscreen,
   unreachable("bad PIPE_CAP_*");
}
 }
+
+bool
+u_pipe_screen_get_modifier_aux_planes(struct pipe_screen *pscreen,
+  uint64_t modifier,
+  enum pipe_format format,
+  unsigned *num_aux_planes)
+{
+   int num_mods, i;
+   uint64_t *supported_mods;
+
+   pscreen->query_dmabuf_modifiers(pscreen, format, 0, NULL, NULL,
+   &num_mods);
+
+   if (!num_mods)
+  return false;
+
+   supported_mods = malloc(num_mods * sizeof(supported_mods[0]));
+
+   if (!supported_mods)
+  return false;
+
+   pscreen->query_dmabuf_modifiers(pscreen, format, num_mods, supported_mods,
+   NULL, &num_mods);
+
+   for (i = 0; i < num_mods && supported_mods[i] != modifier; i++);
+
+   free(supported_mods);
+
+   if (i == num_mods)
+  return false;
+
+   *num_aux_planes = 0;
+
+   return true;
+}
diff --git a/src/gallium/auxiliary/util/u_screen.h 
b/src/gallium/auxiliary/util/u_screen.h
index 3952a11f2ca..0abcfd282b1 100644
--- a/src/gallium/auxiliary/util/u_screen.h
+++ b/src/gallium/auxiliary/util/u_screen.h
@@ -23,6 +23,7 @@
 
 struct pipe_screen;
 enum pipe_cap;
+enum pipe_format;
 
 #ifdef __cplusplus
 extern "C" {
@@ -32,6 +33,12 @@ int
 u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
  enum pipe_cap param);
 
+bool
+u_pipe_screen_get_modifier_aux_planes(struct pipe_screen *pscreen,
+  uint64_t modifier,
+  enum pipe_format format,
+  unsigned *num_aux_planes);
+
 #ifdef __cplusplus
 };
 #endif
diff --git a/src/gallium/drivers/etnaviv/etnaviv_screen.c 
b/src/gallium/drivers/etnaviv/etnaviv_screen.c
index dcceddc4729..32909a4e5ea 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_screen.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_screen.c
@@ -1019,6 +1019,7 @@ etna_screen_create(struct etna_device *dev, struct 
etna_gpu *gpu,
pscreen->context_create = etna_context_create;
pscreen->is_format_supported = etna_screen_is_format_supported;
pscreen->query_dmabuf_modifiers = etna_screen_query_dmabuf_modifiers;
+   pscreen->get_modifier_aux_planes = u_pipe_screen_get_modifier_aux_planes;
 
etna_fence_screen_init(pscreen);
etna_query_screen_init(pscreen);
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 3c0ed69a9cb..5d25df02ebf 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -984,6 +984,7 @@ fd_screen_create(struct fd_device *dev, struct renderonly 
*ro)
pscreen->fence_get_fd = fd_fence_get_fd;
 
pscreen->query_dmabuf_modifiers = fd_screen_query_dmabuf_modifiers;
+   pscreen->get_modifier_aux_planes = 
u_pipe_screen_get_modifier_aux_planes;
 
if (!screen->supported_modifiers) {
static const uint64_t supported_modifiers[] = {
diff --git a/src/gallium/drivers/iris/iris_resource.c 
b/src/gallium/drivers/iris/iris_resource.c
index bdd715df2c9..a3b0e87070f 100644
--- a/src/gallium/drivers/iris/iris_resource.c
+++ b/src/gallium/dri

[Mesa-dev] [PATCH 0/5] nouveau: Improved format modifier support

2020-02-05 Thread James Jones
This series pulls in the proposed
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() format
modifier macro and wires it up in the nouveau
nvc0 driver.  In doing so, it improves the
existing format modifier code to behave more
like other format-modifier-capable drivers, and
is written in such a way that it should be easier
to port to nv50-class and future turing-class
drivers as well.

Modifiers supporting import/export of compressed
surfaces are not included in this series.  Once
the general approach here is agreed upon, I can
send out a follow-on series adding those as well.

This series depends on the general gallium/dri
cleanup patch:

  [PATCH] gallium: Add format modifier aux plane query

Which was sent out separately.

James Jones (5):
  drm-uapi: Update headers from nouveau/linux-5.6
  nouveau: Stash supported sector layout in screen
  nouveau: Use DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D
  nouveau: no modifier != the invalid modifier
  nouveau: Use format modifiers in buffer allocation

 include/drm-uapi/drm_fourcc.h | 135 +++-
 src/gallium/drivers/nouveau/nouveau_screen.c  |  12 +
 src/gallium/drivers/nouveau/nouveau_screen.h  |   1 +
 .../drivers/nouveau/nvc0/nvc0_miptree.c   | 208 --
 .../drivers/nouveau/nvc0/nvc0_resource.c  |  41 ++--
 .../drivers/nouveau/nvc0/nvc0_resource.h  |   5 +
 6 files changed, 306 insertions(+), 96 deletions(-)

-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/5] nouveau: Use DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D

2020-02-05 Thread James Jones
Replace existing usage of the NVIDIA_16BX2_BLOCK
format modifiers with parameterized use of the
more general macro.  Nouveau will now report
support for slightly different modifiers depending
on whether the underlying chip is a tegra GPU or
not, and will potentially report valid format
modifiers for more resource types, but overall
this should be a functional no-op for existing
applications.

Signed-off-by: James Jones 
---
 .../drivers/nouveau/nvc0/nvc0_miptree.c   | 99 ---
 .../drivers/nouveau/nvc0/nvc0_resource.c  | 37 ---
 .../drivers/nouveau/nvc0/nvc0_resource.h  |  5 +
 3 files changed, 64 insertions(+), 77 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
index c897e4e8b97..20e4c4decb1 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
@@ -37,19 +37,14 @@ nvc0_tex_choose_tile_dims(unsigned nx, unsigned ny, 
unsigned nz, bool is_3d)
return nv50_tex_choose_tile_dims_helper(nx, ny, nz, is_3d);
 }
 
-static uint32_t
-nvc0_mt_choose_storage_type(struct nv50_miptree *mt, bool compressed)
+uint32_t
+nvc0_choose_tiled_storage_type(enum pipe_format format,
+   unsigned ms,
+   bool compressed)
 {
-   const unsigned ms = util_logbase2(mt->base.base.nr_samples);
-
uint32_t tile_flags;
 
-   if (unlikely(mt->base.base.bind & PIPE_BIND_CURSOR))
-  return 0;
-   if (unlikely(mt->base.base.flags & NOUVEAU_RESOURCE_FLAG_LINEAR))
-  return 0;
-
-   switch (mt->base.base.format) {
+   switch (format) {
case PIPE_FORMAT_Z16_UNORM:
   if (compressed)
  tile_flags = 0x02 + ms;
@@ -86,7 +81,7 @@ nvc0_mt_choose_storage_type(struct nv50_miptree *mt, bool 
compressed)
  tile_flags = 0xc3;
   break;
default:
-  switch (util_format_get_blocksizebits(mt->base.base.format)) {
+  switch (util_format_get_blocksizebits(format)) {
   case 128:
  if (compressed)
 tile_flags = 0xf4 + ms * 2;
@@ -136,6 +131,19 @@ nvc0_mt_choose_storage_type(struct nv50_miptree *mt, bool 
compressed)
return tile_flags;
 }
 
+static uint32_t
+nvc0_mt_choose_storage_type(struct nv50_miptree *mt, bool compressed)
+{
+   const unsigned ms = util_logbase2(mt->base.base.nr_samples);
+
+   if (unlikely(mt->base.base.bind & PIPE_BIND_CURSOR))
+  return 0;
+   if (unlikely(mt->base.base.flags & NOUVEAU_RESOURCE_FLAG_LINEAR))
+  return 0;
+
+   return nvc0_choose_tiled_storage_type(mt->base.base.format, ms, compressed);
+}
+
 static inline bool
 nvc0_miptree_init_ms_mode(struct nv50_miptree *mt)
 {
@@ -236,57 +244,32 @@ nvc0_miptree_init_layout_tiled(struct nv50_miptree *mt)
}
 }
 
-static uint64_t nvc0_miptree_get_modifier(struct nv50_miptree *mt)
+static uint64_t
+nvc0_miptree_get_modifier(struct pipe_screen *pscreen, struct nv50_miptree *mt)
 {
-   union nouveau_bo_config *config = &mt->base.bo->config;
-   uint64_t modifier;
+   const union nouveau_bo_config *config = &mt->base.bo->config;
+   const uint32_t uc_kind =
+  nvc0_choose_tiled_storage_type(mt->base.base.format,
+ mt->base.base.nr_samples,
+ false);
 
if (mt->layout_3d)
   return DRM_FORMAT_MOD_INVALID;
+   if (mt->base.base.nr_samples > 1)
+  return DRM_FORMAT_MOD_INVALID;
+   if (config->nvc0.memtype == 0x00)
+  return DRM_FORMAT_MOD_LINEAR;
+   if (NVC0_TILE_MODE_Y(config->nvc0.tile_mode) > 5)
+  return DRM_FORMAT_MOD_INVALID;
+   if (config->nvc0.memtype != uc_kind)
+  return DRM_FORMAT_MOD_INVALID;
 
-   switch (config->nvc0.memtype) {
-   case 0x00:
-  modifier = DRM_FORMAT_MOD_LINEAR;
-  break;
-
-   case 0xfe:
-  switch (NVC0_TILE_MODE_Y(config->nvc0.tile_mode)) {
-  case 0:
- modifier = DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB;
- break;
-
-  case 1:
- modifier = DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB;
- break;
-
-  case 2:
- modifier = DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB;
- break;
-
-  case 3:
- modifier = DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB;
- break;
-
-  case 4:
- modifier = DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB;
- break;
-
-  case 5:
- modifier = DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB;
- break;
-
-  default:
- modifier = DRM_FORMAT_MOD_INVALID;
- break;
-  }
-  break;
-
-   default:
-  modifier = DRM_FORMAT_MOD_INVALID;
-  break;
-   }
-
-   return modifier;
+   return DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(
+ 0,
+ nouveau_screen(pscreen)->tegra_sector_layout ? 0 : 1,
+ 0,
+ config->nvc0.memtype,
+ NVC0_TILE

[Mesa-dev] [PATCH 4/5] nouveau: no modifier != the invalid modifier

2020-02-05 Thread James Jones
Other drivers fail resource allocation when a list
of modifiers for the resource is provided but none
are supported. This includes cases when the never-
supported DRM_FORMAT_MOD_INVALID modifier is
explicitly passed.  To enable matching that
functionality in nouveau, use an empty modifier
list rather than creating a one-entry list
containing only DRM_FORMAT_MOD_INVALID when the
non-modifier resource creation function is used.

This change stops short of failing allocations
when no modifier is specified, because the current
code ignores all modifiers except the linear
modifier when creating resources, so there is not
yet a framework in place to determine which
modifiers are valid for a given resource creation
request, and hence no way to reject only those
which are invalid.

Signed-off-by: James Jones 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_resource.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
index 18c4dfad23d..c9ee097d269 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
@@ -10,13 +10,11 @@ static struct pipe_resource *
 nvc0_resource_create(struct pipe_screen *screen,
  const struct pipe_resource *templ)
 {
-   const uint64_t modifier = DRM_FORMAT_MOD_INVALID;
-
switch (templ->target) {
case PIPE_BUFFER:
   return nouveau_buffer_create(screen, templ);
default:
-  return nvc0_miptree_create(screen, templ, &modifier, 1);
+  return nvc0_miptree_create(screen, templ, NULL, 0);
}
 }
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/5] nouveau: Use format modifiers in buffer allocation

2020-02-05 Thread James Jones
The nvc0 nouveau backend already claimed to
support format modifiers, but in practice it
ignored them when allocating buffers outside of a
perfunctory check for the linear modifier in the
first element of the format modifier list.

This change deduces the supported modifiers, if
any, for a given miptree creation request,
prioritizes them based on performance and memory
waste properties, compares the requested modifiers
against the prioritized list of supported
modifiers, and overrides the internal layout
calculations based on the layout defined by the
resulting modifier.

Additionally, if modifiers are provided and none
are compatible with the miptree creation request,
the function now fails.  This brings the nouveau
behavior in line with other drivers such as i965
and etnaviv.

Signed-off-by: James Jones 
---
 .../drivers/nouveau/nvc0/nvc0_miptree.c   | 111 --
 1 file changed, 103 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
index 20e4c4decb1..02c163e3e8a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
@@ -132,7 +132,7 @@ nvc0_choose_tiled_storage_type(enum pipe_format format,
 }
 
 static uint32_t
-nvc0_mt_choose_storage_type(struct nv50_miptree *mt, bool compressed)
+nvc0_mt_choose_storage_type(const struct nv50_miptree *mt, bool compressed)
 {
const unsigned ms = util_logbase2(mt->base.base.nr_samples);
 
@@ -196,7 +196,7 @@ nvc0_miptree_init_layout_video(struct nv50_miptree *mt)
 }
 
 static void
-nvc0_miptree_init_layout_tiled(struct nv50_miptree *mt)
+nvc0_miptree_init_layout_tiled(struct nv50_miptree *mt, uint64_t modifier)
 {
struct pipe_resource *pt = &mt->base.base;
unsigned w, h, d, l;
@@ -213,6 +213,9 @@ nvc0_miptree_init_layout_tiled(struct nv50_miptree *mt)
d = mt->layout_3d ? pt->depth0 : 1;
 
assert(!mt->ms_mode || !pt->last_level);
+   assert(modifier == DRM_FORMAT_MOD_INVALID ||
+  (!pt->last_level && !mt->layout_3d));
+   assert(modifier != DRM_FORMAT_MOD_LINEAR);
 
for (l = 0; l <= pt->last_level; ++l) {
   struct nv50_miptree_level *lvl = &mt->level[l];
@@ -222,7 +225,10 @@ nvc0_miptree_init_layout_tiled(struct nv50_miptree *mt)
 
   lvl->offset = mt->total_size;
 
-  lvl->tile_mode = nvc0_tex_choose_tile_dims(nbx, nby, d, mt->layout_3d);
+  if (modifier != DRM_FORMAT_MOD_INVALID)
+ lvl->tile_mode = ((uint32_t)modifier & 0xf) << 4;
+  else
+ lvl->tile_mode = nvc0_tex_choose_tile_dims(nbx, nby, d, 
mt->layout_3d);
 
   tsx = NVC0_TILE_SIZE_X(lvl->tile_mode); /* x is tile row pitch in bytes 
*/
   tsy = NVC0_TILE_SIZE_Y(lvl->tile_mode);
@@ -289,6 +295,79 @@ nvc0_miptree_get_handle(struct pipe_screen *pscreen,
return true;
 }
 
+static uint64_t
+nvc0_miptree_select_best_modifier(struct pipe_screen *pscreen,
+  const struct nv50_miptree *mt,
+  const uint64_t *modifiers,
+  unsigned int count)
+{
+   uint64_t prio_supported_mods[] = {
+  DRM_FORMAT_MOD_INVALID,
+  DRM_FORMAT_MOD_INVALID,
+  DRM_FORMAT_MOD_INVALID,
+  DRM_FORMAT_MOD_INVALID,
+  DRM_FORMAT_MOD_INVALID,
+  DRM_FORMAT_MOD_INVALID,
+  DRM_FORMAT_MOD_LINEAR,
+   };
+   const uint32_t uc_kind = nvc0_mt_choose_storage_type(mt, false);
+   int top_mod_slot = ARRAY_SIZE(prio_supported_mods);
+   unsigned int i;
+   int p;
+
+   if (uc_kind != 0u) {
+  const struct pipe_resource *pt = &mt->base.base;
+  const unsigned nbx = util_format_get_nblocksx(pt->format, pt->width0);
+  const unsigned nby = util_format_get_nblocksy(pt->format, pt->height0);
+  const uint32_t lbh_preferred =
+ NVC0_TILE_MODE_Y(nvc0_tex_choose_tile_dims(nbx, nby, 1u, false));
+  uint32_t lbh = lbh_preferred;
+  bool dec_lbh = true;
+  const uint8_t s = nouveau_screen(pscreen)->tegra_sector_layout ? 0 : 1;
+
+  for (i = 0; i <= 5u; i++) {
+ assert(lbh <= 5u);
+ prio_supported_mods[i] =
+DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, s, 0, uc_kind, lbh);
+
+ /*
+  * The preferred block height is the largest block size that doesn't
+  * waste excessive space with unused padding bytes relative to the
+  * height of the image.  Construct the priority array such that
+  * the preferred block height is highest priority, followed by
+  * progressively smaller block sizes down to a block height of one,
+  * followed by progressively larger (more wasteful) block sizes up
+  * to 5.
+  */
+ if (lbh == 0u) {
+lbh = lbh_preferred + 1u;
+dec_lbh = false;
+ } else if (dec_lbh) {
+ 

[Mesa-dev] [PATCH 2/5] nouveau: Stash supported sector layout in screen

2020-02-05 Thread James Jones
Older Tegra GPUs use a different sector bit
swizzling layout than desktop and Xavier GPUs.
Hence their format modifiers must be
differentiated from those of other GPUs.  As
a precursor to supporting more expressive
block linear format modifiers, deduce the
sector layout used for a given GPU from its
chipset and stash the layout in the nouveau
screen structure.

Signed-off-by: James Jones 
---
 src/gallium/drivers/nouveau/nouveau_screen.c | 12 
 src/gallium/drivers/nouveau/nouveau_screen.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nouveau_screen.c 
b/src/gallium/drivers/nouveau/nouveau_screen.c
index de9cce3812a..f63af6313e4 100644
--- a/src/gallium/drivers/nouveau/nouveau_screen.c
+++ b/src/gallium/drivers/nouveau/nouveau_screen.c
@@ -213,6 +213,18 @@ nouveau_screen_init(struct nouveau_screen *screen, struct 
nouveau_device *dev)
   size = sizeof(nvc0_data);
}
 
+   switch (dev->chipset) {
+   case 0x0ea: /* TK1, GK20A */
+   case 0x12b: /* TX1, GM20B */
+   case 0x13b: /* TX2, GP10B */
+  screen->tegra_sector_layout = true;
+  break;
+   default:
+  /* Xavier's GPU and everything else */
+  screen->tegra_sector_layout = false;
+  break;
+   }
+
/*
 * Set default VRAM domain if not overridden
 */
diff --git a/src/gallium/drivers/nouveau/nouveau_screen.h 
b/src/gallium/drivers/nouveau/nouveau_screen.h
index 40464225c75..0abaf4db0f5 100644
--- a/src/gallium/drivers/nouveau/nouveau_screen.h
+++ b/src/gallium/drivers/nouveau/nouveau_screen.h
@@ -58,6 +58,7 @@ struct nouveau_screen {
int64_t cpu_gpu_time_delta;
 
bool hint_buf_keep_sysmem_copy;
+   bool tegra_sector_layout;
 
unsigned vram_domain;
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/5] drm-uapi: Update headers from nouveau/linux-5.6

2020-02-05 Thread James Jones
Copy latest drm_fourcc.h from nouveau/linux-5.6

XXX - Update this with final commit ID/info

commit: d8a841ff4f4cbb31dd0dfd037399421969837730
Author: James Jones 
Date:   Tue Aug 6 17:10:10 2019 -0700

drm: Generalized NV Block Linear DRM format mod

Signed-off-by: James Jones 
---
 include/drm-uapi/drm_fourcc.h | 135 +++---
 1 file changed, 126 insertions(+), 9 deletions(-)

diff --git a/include/drm-uapi/drm_fourcc.h b/include/drm-uapi/drm_fourcc.h
index 2376d36ea57..56217e2f39e 100644
--- a/include/drm-uapi/drm_fourcc.h
+++ b/include/drm-uapi/drm_fourcc.h
@@ -69,7 +69,7 @@ extern "C" {
 #define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \
 ((__u32)(c) << 16) | ((__u32)(d) << 24))
 
-#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of 
little endian */
+#define DRM_FORMAT_BIG_ENDIAN (1U<<31) /* format is big endian instead of 
little endian */
 
 /* Reserve 0 for the invalid format specifier */
 #define DRM_FORMAT_INVALID 0
@@ -410,6 +410,17 @@ extern "C" {
 #define I915_FORMAT_MOD_Y_TILED_CCSfourcc_mod_code(INTEL, 4)
 #define I915_FORMAT_MOD_Yf_TILED_CCS   fourcc_mod_code(INTEL, 5)
 
+/*
+ * Intel color control surfaces (CCS) for Gen-12 render compression.
+ *
+ * The main surface is Y-tiled and at plane index 0, the CCS is linear and
+ * at index 1. A 64B CCS cache line corresponds to an area of 4x1 tiles in
+ * main surface. In other words, 4 bits in CCS map to a main surface cache
+ * line pair. The main surface pitch is required to be a multiple of four
+ * Y-tile widths.
+ */
+#define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS fourcc_mod_code(INTEL, 6)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
@@ -497,7 +508,113 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
 
 /*
- * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
+ * Generalized Block Linear layout, used by desktop GPUs starting with 
NV50/G80,
+ * and Tegra GPUs starting with Tegra K1.
+ *
+ * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
+ * based on the architecture generation.  GOBs themselves are then arranged in
+ * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
+ * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
+ * a block depth or height of "4").
+ *
+ * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
+ * in full detail.
+ *
+ *   Macro
+ * Bits  Param Description
+ *   - 
-
+ *
+ *  3:0  h log2(height) of each block, in GOBs.  Placed here for
+ * compatibility with the existing
+ * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  4:4  - Must be 1, to indicate block-linear layout.  Necessary for
+ * compatibility with the existing
+ * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  8:5  - Reserved (To support 3D-surfaces with variable log2(depth) block
+ * size).  Must be zero.
+ *
+ * Note there is no log2(width) parameter.  Some portions of the
+ * hardware support a block width of two gobs, but it is 
impractical
+ * to use due to lack of support elsewhere, and has no known
+ * benefits.
+ *
+ * 11:9  - Reserved (To support 2D-array textures with variable array 
stride
+ * in blocks, specified via log2(tile width in blocks)).  Must be
+ * zero.
+ *
+ * 19:12 k Page Kind.  This value directly maps to a field in the page
+ * tables of all GPUs >= NV50.  It affects the exact layout of bits
+ * in memory and can be derived from the tuple
+ *
+ *   (format, GPU model, compression type, samples per pixel)
+ *
+ * Where compression type is defined below.  If GPU model were
+ * implied by the format modifier, format, or memory buffer, page
+ * kind would not need to be included in the modifier itself, but
+ * since the modifier should define the layout of the associated
+ * memory buffer independent from any device or other context, it
+ * must be included here.
+ *
+ * 21:20 g GOB Height and Page Kind Generation.  The height of a GOB 
changed
+ * starting with Fermi GPUs.  Additionally, the mapping between 
page
+ * kind and bit layout has changed at various points.
+ *
+ *   0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
+ *   1 = Gob Height 4, G80 - GT2XX Page Kind mapping
+ *   2 = Gob Height 8, Turing+ Page Kind mapping
+ *   3 = Reserved for future use.
+ *
+ * 22:22 s Sector layout

Re: [Mesa-dev] [PATCH 0/5] nouveau: Improved format modifier support

2020-02-05 Thread James Jones
Thanks, now available as 
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3724


-James

On 2/5/20 1:45 PM, Jason Ekstrand wrote:
FYI: GitLab merge requests are the preferred way to send patches these 
days.


--Jason

On February 5, 2020 21:52:25 James Jones  wrote:


This series pulls in the proposed
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() format
modifier macro and wires it up in the nouveau
nvc0 driver.  In doing so, it improves the
existing format modifier code to behave more
like other format-modifier-capable drivers, and
is written in such a way that it should be easier
to port to nv50-class and future turing-class
drivers as well.

Modifiers supporting import/export of compressed
surfaces are not included in this series.  Once
the general approach here is agreed upon, I can
send out a follow-on series adding those as well.

This series depends on the general gallium/dri
cleanup patch:

 [PATCH] gallium: Add format modifier aux plane query

Which was sent out separately.

James Jones (5):
 drm-uapi: Update headers from nouveau/linux-5.6
 nouveau: Stash supported sector layout in screen
 nouveau: Use DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D
 nouveau: no modifier != the invalid modifier
 nouveau: Use format modifiers in buffer allocation

include/drm-uapi/drm_fourcc.h | 135 +++-
src/gallium/drivers/nouveau/nouveau_screen.c  |  12 +
src/gallium/drivers/nouveau/nouveau_screen.h  |   1 +
.../drivers/nouveau/nvc0/nvc0_miptree.c   | 208 --
.../drivers/nouveau/nvc0/nvc0_resource.c  |  41 ++--
.../drivers/nouveau/nvc0/nvc0_resource.h  |   5 +
6 files changed, 306 insertions(+), 96 deletions(-)

--
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev





___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] libvdpau_r600.so not found

2011-12-16 Thread James Jones

On 12/16/11 4:27 PM, Younes Manton wrote:

On Fri, Dec 16, 2011 at 7:01 PM, James Cloos  wrote:

I've been trying to test out vdpau w/o success.

It turns out that libvdpau_r600.so is in /usr/lib64/vdpau/, whereas
everything looks for it in the standard ld path (ie, /usr/lib64).

With »ln -s vdpau/libvdpau_r600.so /usr/lib64/« it seems to work.

Is mesa wrong to install the libs to ${libdir}/vdpau, or is libvdpau
and the apps wrong to look for them in ${libdir}?

(Everything was compiled locally; most of the fdo stuff from git master.)

-JimC
--
James Cloos   OpenPGP: 1024D/ED7DAEA6


Mesa is probably wrong. You can specify it with ./configure
--with-vdpau-libdir= of course, but I can't think of a good reason to
have the default be not in the usually searched paths.


The correct dir is /vdpau.  From here:

  http://cgit.freedesktop.org/~aplattner/libvdpau/tree/configure.ac:72

Do you have an older libvdpau.so?

We had a long debate over this internally and decided it was better to 
have the backends in a non-searchpath directory.  I believe the idea was 
to discourage directly linking to them, and to roughly follow the lead 
of OpenCL in this area, but my memory is getting foggy.  Before we made 
the switch, old NVIDIA builds of libvdpau.so looked in  and 
installed libvdpau_nvidia.so there as well.


Thanks,
-James

nvpublic
___

mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] OpenGL compositors and glxWaitX()

2011-06-06 Thread James Jones
On 6/6/11 2:09 AM, "Thomas Hellstrom"  wrote:

> Hi!
> 
> While trying to improve the vmwgfx X driver to better cope with OpenGL
> compositors, I noticed that compiz never calls glxWaitX() to make sure
> the pixmaps that it textures from are updated.
> 
> Since we migrate X rendered data to the dri2 drawables only on demand,
> we miss a lot of data.
> 
> Googling for this it seems this has been up for discussion before, and
> it appears glxWaitX() is not used because either
> 
> 1) it's considered to be a performance hit.
> 2) it only affects the drawables currently bound by the glx context
> issuing the command.
> 
> While 1) may be true,

A properly implemented glXWaitX shouldn't cause much if any noticeable
performance hit.  However, I don't know how optimal existing implementations
are.

> reading the glx specification it appears to me
> that 2) does not hold, even if it appears like some dri2 server side
> implementations do not flush all drawables. Furthermore it appears to me
> to be a requirement for correctness to call glXWaitX before using the
> pixmap content.
> 
> Does anyone on the list have more info on this, and whether not calling
> glxWaitX() is the common usage pattern by other compositors?

2) is not quite true, but glXWaitX is still not sufficient.  glXWaitX only
waits for rendering done on the specified display to complete.  When using
composite managers, all the rendering you want to wait for in the composite
manager happens on other display connections (the client apps).  glXWaitX is
specified to, at best, be equivalent to XSync().

I've been told DRI2 based drivers implicitly synchronize texture from pixmap
operations with X rendering, but if you would like to do explicit
synchronization, please check out the X Fence Sync work I've done (X
Synchronization extension version 3.1, GL_EXT_x11_sync_object), which can be
used to solve the race condition you describe.

Thanks,
-James

> Any input appreciated.
> Thanks,
> Thomas
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

nvpublic

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] glXMakeCurrent crashes (was: Re: How to obtain OpenGL implementation/driver information?)

2011-02-04 Thread James Jones
On 2/4/11 3:26 PM, "Benoit Jacob"  wrote:

> 
> 
> 
> - Original Message -
>> On Fre, 2011-02-04 at 14:21 -0800, Benoit Jacob wrote:
>>> 
>>> - Original Message -
 Benoit Jacob wrote:
> - Original Message -
>> On Thu, Feb 3, 2011 at 4:37 PM, Benoit Jacob
>> 
>> wrote:
>>> I'm trying to see how to implement selective
>>> whitelisting/blacklisting of driver versions on X11 (my use
>>> case
>>> is
>>> to whitelist drivers for Firefox). The naive approach consists
>>> in
>>> creating an OpenGL context and calling glGetString(), however
>>> that
>>> is not optimal for me, for these reasons:
>>>  * This has been enough to trigger crashes in the past.
>>> 
>>> Ideally I want to be able to know the driver name, driver
>>> version,
>>> Mesa version, and any other thing that you think may be
>>> relevant.
>>> I
>>> need to get that information in a fast and safe way.
>>> 
>> There is no other way than glGetString. If you ever experienced a
>> crash with it, it would be because you are doing something terribly
>> wrong, like using it without a current context.
> 
> It's not glGetString that's crashing, it's glXMakeCurrent.
> 
> I forwarded a bug report from a user, though he's not been able
> to
> reproduce since:
> 
>   https://bugs.freedesktop.org/show_bug.cgi?id=32238
> 
> A search in Mesa's bugzilla confirms that I'm not alone:
> 
>   https://bugs.freedesktop.org/show_bug.cgi?id=30557
 
 This latter bug looks like an i915 driver bug, as opposed to a
 MakeCurrent bug.
 
> Since the glGetString way will at best be slow, especially if we
> have
> to XSync and check for errors, could you consider exposing this
> information as new glXGetServerString / glXGetClientString
> strings?
 
 ? I don't understand the logic here.
 
 You're hitting a bug in glXCreateContext or MakeCurrent or
 something
 like that. So you'd like to add an entire new way to query the
 same
 information a driver already provides, just to provide an
 alternate
 path
 that hopefully doesn't exhibit the bug?
 
 Just fix the bug! There's no reason for glX extensions to add new
 functions here.
>>> 
>>> My point is just that bugs exist.
>>> 
>>> Since bugs exist, I am trying to implement a driver blacklist.
>>> 
>>> My problem is that with GLX it's tricky because I can't get answer
>>> to
>>> the question "should I avoid creating GL contexts on this driver"
>>> without creating a GL context.
>>> 
>>> I proposed to allow handling this in glXQuery(Server|Client)String
>>> because these functions are known to be non-crashy.
>> 
>> What you're asking for is not possible, because the information you
>> need
>> depends on the context which is current. No shortcuts here I'm
>> afraid. :)
> 
> We're doing driver blacklists on all platforms, and it tends to be quite easy
> on other platforms. For example, on Windows, we just read all the driver
> information from the registry. Couldn't X drivers likewise have some metadata
> stored on disk, that could be queried via some new API? I proposed GLX because
> glXQueryServerString already knows about at least the driver vendor. But I
> don't mind having it exposed elsewhere than in GLX if that makes more sense :)
> 
> Please take this request seriously: driver blacklists are useful, not limited
> to Firefox, and certainly not limited to X11. As I say, we blacklist drivers
> on all platforms, and we'd never have been able to make Firefox 4 releasable
> without blacklisting many Windows drivers. Zhenyao Mo, in CC, is working on
> similar features in Chromium.
> 
> Cheers,
> Benoit

Chiming in because it's in my interest to not get dragged into supporting
unnecessary extensions.

Suppose an extension were added that allowed
glXQueryServerString()/glXQueryClientString() to return the information you
request somehow.  You'll still need to check for the extension string for
that new GLX extension before requesting those new strings.  And then what
will you do if it isn't present?  If you try to press on and create a
context anyway, you'll still get crashes with whatever buggy driver release
the user reported this crash with, because it won't have this new extension.
If you treat the absence of the extension as a failure and don't create a
context, you've excluded all currently shipping OpenGL implementations from
your list of supported configs.

The only nearly robust way to solve this issue is Brian Paul's suggestion.
If spawning a separate process to run the test is too expensive, install a
signal/exception handler around your GLX initialization and tests.  That way
you can detect crashes AND actually use the resulting context if it passes
your tests.
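
For concreteness, a minimal sketch of the in-process variant of that
suggestion (assuming a POSIX platform; running the probe in a separate
helper process remains the more robust option):

/*
 * Crash-guarded GLX probe sketch.  Jumping out of a SIGSEGV handler is
 * inherently best-effort; treat this as illustrative, not production code.
 */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <GL/glx.h>

static sigjmp_buf probe_env;

static void probe_crash(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Returns the GL_RENDERER string, or NULL if the probe failed or crashed. */
static const char *probe_renderer(Display *dpy, GLXDrawable draw, GLXContext ctx)
{
    struct sigaction sa, old_segv, old_bus;
    const char *renderer = NULL;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_crash;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(probe_env, 1) == 0) {
        if (glXMakeCurrent(dpy, draw, ctx))
            renderer = (const char *)glGetString(GL_RENDERER);
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return renderer;
}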

Thanks,
-James

>> 
>> 
>> --
>> Earthling Michel Dänzer | http://www.vmware.com
>> Libre software

Re: Helping Wine use 64 bit Mesa OGL drivers for 32-bit Windows applications

2024-10-21 Thread James Jones

On 10/21/24 07:33, Jose Fonseca wrote:

I see a few downsides with the proposed callback:
- feels like a solution too tailored for WINE
- there's a layering violation: the application suddenly takes the 
driving seat for a thing deep down in the GL driver
so I fear the Mesa community might regret doing it, and once WINE 
supports it there would be an outcry if we tried to go back.



IIUC the problem at hand, another way to go about this would be an 
extension that allows applications to get malloc'ed/valloc'ed memory 
exposed to the GPU as a GL buffer object.


I feel this would be potentially useful to applications other than just 
WINE, especially on systems with unified memory.  And there have been 
extensions along these lines before, for example, 
https://registry.khronos.org/OpenGL/extensions/AMD/AMD_pinned_memory.txt 



These ideas by themselves make it very difficult to get e.g., a mapping 
of a device memory region as opposed to a system memory buffer, and...
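
For reference, a rough sketch of the client-memory path referred to
above (AMD_pinned_memory style), with the usual caveats that the client
allocation must be page-aligned and must outlive the buffer object;
token and entry-point names follow the published extension and core GL:

/*
 * Sketch: expose an application-provided, page-aligned allocation as the
 * backing store of a GL buffer object via
 * GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD.  Error handling omitted.
 */
#define GL_GLEXT_PROTOTYPES
#include <stdlib.h>
#include <GL/gl.h>
#include <GL/glext.h>

static GLuint wrap_client_memory(size_t size, void **mem_out)
{
    void *mem = NULL;
    GLuint buf = 0;

    if (posix_memalign(&mem, 4096, size) != 0)   /* page-aligned client memory */
        return 0;

    glGenBuffers(1, &buf);
    glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, buf);
    /* 'mem' now backs the buffer object and must stay valid until deletion. */
    glBufferData(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, size, mem, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, 0);

    *mem_out = mem;
    return buf;
}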


There's also NVIDIA's Heterogeneous Memory Management, which takes this 
idea to a whole different level:
- 
https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/ 
- https://www.kernel.org/doc/html/v5.0/vm/hmm.html 

- 
https://lpc.events/event/2/contributions/70/attachments/14/6/hmm-lpc18.pdf 

- https://lwn.net/Articles/752964/ 
- https://lwn.net/Articles/684916/ 


These are great, but seem like overkill for this problem space.

The Vulkan solution was very minimally invasive from a driver point of 
view. If only WINE ends up using it, it's not that big of a deal. WINE 
is a common use case, and there are plenty of things in graphics APIs 
that cater to one or a small set of very impactful use cases. If the 
OpenGL extension had a similarly small footprint, it also wouldn't be 
that big of a deal if it were tailored to WINE. Note Vulkan already has 
the equivalent of the above AMD extension, and chose to add additional 
functionality for this particular use case anyway.


All that said, I don't love the idea of callbacks either. Callbacks in 
general are tough to specify and use robustly, and hence should be a 
last resort. E.g., this particular callback might sometimes come from 
the application thread, and sometimes come from some separate 
driver-managed thread. It's hard to validate that all applications can 
handle that properly and wouldn't do something crazy like rely on their 
own TLS data in the callback or try to call back into OpenGL from the 
callback and deadlock themselves, even if these are clearly specified as 
unsupported actions.


Thanks,
-James

But I remember that Thomas Hellstrom (while at VMware, now Intel) once 
prototyped this without HMM, just plain DRM.  I think HMM provides the 
ability to do this transparently for application, which is above and 
beyond what's strictly needed for WINE.


Metal API also provides this -- 
https://developer.apple.com/documentation/metal/mtldevice/1433382-newbufferwithbytesnocopy?language=objc 


Jose

On Fri, Oct 18, 2024 at 11:10 PM Derek Lesho wrote:


Hey everyone 👋,

I'm Derek from the Wine project, and wanted to start a discussion with
y'all about potentially extending the Mesa OGL drivers to help us
with a
functionality gap we're facing.

Problem Space:

In the last few years Wine's support for running 32-bit windows apps in
a 64-bit host environment (wow64) has almost reached feature
completion,
but there remains a pain point with OpenGL applications: Namely that
Wine can't return a 64-bit GL implementation's buffer mappings to a 32
bit application when the address is outside of the 32-bit range.

Currently, we have a workaround that will copy any changes to the
mapping back to the host upon glBufferUnmap, but this of course is slow
when the implementation directly returns mapped memory, and doesn't
work
for GL_PERSISTENT_BIT, where directly mapped memory is required.

A few years ago we also faced this problem with Vulkan, which was
solved through the VK_EXT_map_memory_placed extension Faith drafted,
allowing us to use our Wine-internal allocator to provide the pages the
driver maps to. I'm now wondering if a GL equivalent would also be seen
as feasible amongst the devs here.

Proposed solution:

As the GL backend handles host mapping in its own code, only giving
suballocations from its mappings back to the Ap

Re: Helping Wine use 64 bit Mesa OGL drivers for 32-bit Windows applications

2024-10-22 Thread James Jones
This sounds interesting, but does it come with the same "Only gets 2GB 
VA" downside Derek pointed out in the thread fork where he was 
responding to Michel?


Thanks,
-James

On 10/22/24 07:14, Christian König wrote:

Hi guys,

one theoretical alternative not mentioned in this thread is the use of 
mremap().


In other words you reserve some address space below 2G by using 
mmap(NULL, length, PROT_NONE, MAP_32BIT | MAP_ANONYMOUS, 0, 0) and then 
use mremap(addr64bit, 0, length, MREMAP_FIXED, reserved_addr).


I haven't tested this but at least in theory it should give you a 
duplicate of the 64bit mapping in the lower 2G of the address space.


Important is that you give 0 as oldsize to mremap() so that the old 
mapping isn't unmapped but rather just a new mapping of the existing VMA 
created.


Regards,
Christian.
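
A minimal, untested sketch of the reserve-then-remap trick Christian
describes (assumes x86-64, where MAP_32BIT exists, and that the driver's
original mapping was created MAP_SHARED; the helper name is illustrative):

/*
 * Duplicate an existing 64-bit shared mapping into the low 2 GiB.
 * Passing old_size == 0 to mremap() leaves the original VMA in place and
 * creates a second mapping of the same pages at the reserved address.
 */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

void *dup_mapping_below_2g(void *addr64, size_t length)
{
    /* Reserve address space below 2 GiB without backing it yet. */
    void *reserved = mmap(NULL, length, PROT_NONE,
                          MAP_32BIT | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (reserved == MAP_FAILED)
        return NULL;

    void *low = mremap(addr64, 0, length,
                       MREMAP_MAYMOVE | MREMAP_FIXED, reserved);
    if (low == MAP_FAILED) {
        munmap(reserved, length);
        return NULL;
    }
    return low;
}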


Am 18.10.24 um 23:55 schrieb Derek Lesho:

Hey everyone 👋,

I'm Derek from the Wine project, and wanted to start a discussion with 
y'all about potentially extending the Mesa OGL drivers to help us with 
a functionality gap we're facing.


Problem Space:

In the last few years Wine's support for running 32-bit windows apps 
in a 64-bit host environment (wow64) has almost reached feature 
completion, but there remains a pain point with OpenGL applications: 
Namely that Wine can't return a 64-bit GL implementation's buffer 
mappings to a 32 bit application when the address is outside of the 
32-bit range.


Currently, we have a workaround that will copy any changes to the 
mapping back to the host upon glBufferUnmap, but this of course is 
slow when the implementation directly returns mapped memory, and 
doesn't work for GL_PERSISTENT_BIT, where directly mapped memory is 
required.


A few years ago we also faced this problem with Vulkan, which was 
solved through the VK_EXT_map_memory_placed extension Faith drafted, 
allowing us to use our Wine-internal allocator to provide the pages 
the driver maps to. I'm now wondering if a GL equivalent would also 
be seen as feasible amongst the devs here.


Proposed solution:

As the GL backend handles host mapping in its own code, only giving 
suballocations from its mappings back to the App, the problem is a 
little bit less straightforward in comparison to our Vulkan solution: 
If we just allowed the application to set its own placed mapping when 
calling glMapBuffer, the driver might then have to handle moving 
buffers out of already mapped ranges, and would lose control over its 
own memory management schemes.


Therefore, I propose a GL extension that allows the GL client to 
provide a mapping and unmapping callback to the implementation, to be 
used whenever the driver needs to perform such operations. This way 
the driver remains in full control of its memory management affairs, 
and the amount of work for an implementation as well as potential for 
bugs is kept minimal. I've written a draft implementation in Zink 
using map_memory_placed [1] and a corresponding Wine MR utilizing it 
[2], and would be curious to hear your thoughts. I don't have 
experience in the Mesa codebase, so I apologize if the branch is a tad 
messy.


In theory, the only requirement the extension places on drivers would 
be that glMapBuffer always return a pointer from within a page 
allocated through the provided callbacks, so that it can be guaranteed 
to be positioned within the required address space. Wine would then 
use its existing workaround for other types of buffers, but as Mesa 
seems to often return directly mapped buffers in other cases as well, 
Wine could also avoid the slowdown that comes with copying in these 
cases as well.


Why not use Zink?:

There's also a proposal to use a 32-bit PE build of Zink in Wine, 
bypassing the need for an extension; I brought this up for discussion in 
this Wine-Devel thread last week [3], which has some arguments against 
this approach.



If any of you have thoughts, concerns, or questions about this 
potential approach, please let me know, thanks!


1: 
https://gitlab.freedesktop.org/Guy1524/mesa/-/commits/placed_allocation


2: https://gitlab.winehq.org/wine/wine/-/merge_requests/6663

3: https://marc.info/?t=17288326032&r=1&w=2





Re: Helping Wine use 64 bit Mesa OGL drivers for 32-bit Windows applications

2024-10-23 Thread James Jones

That makes sense. Reading the man page myself, it does seem like:

-If the drivers can guarantee they set MAP_SHARED when creating their 
initial mapping.


-If WINE is fine rounding down to page boundaries to deal with mappings 
of suballocations and either using some lookup structure to avoid 
duplicate remappings (probably needed to handle unmap anyway per below) 
or just living with the perf cost and address space overconsumption for 
duplicate remappings.


-If mremap() preserves the cache attributes of the original mapping.

Then no GL API change would be needed. WINE would just have to do an if 
(addrAbove4G) { mremapStuff() } on map and presumably add some tracking 
to perform an equivalent munmap() when unmapping. I assume WINE already 
has a bunch of vaddr tracking logic in use to manage the <4G address 
space as described elsewhere in the thread. That would be pretty ideal 
from a driver vendor perspective.


Does that work?

Thanks,
-James
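
Sketched out, the map/unmap wrapping James outlines might look roughly
like the following (building on the earlier mremap() sketch; the
list-based tracking and names are assumptions, not WINE's actual
bookkeeping):

/*
 * Only duplicate mappings that land above 4 GiB, and remember the
 * duplicates so an equivalent munmap() can be issued on unmap.
 */
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

void *dup_mapping_below_2g(void *addr64, size_t length);  /* earlier sketch */

struct low_map { void *low; void *high; size_t len; struct low_map *next; };
static struct low_map *low_maps;

static void *wrap_map(void *p, size_t len)
{
    if ((uintptr_t)p + len <= ((uint64_t)1 << 32))
        return p;                        /* already reachable by 32-bit code */

    void *low = dup_mapping_below_2g(p, len);
    if (low) {
        struct low_map *m = malloc(sizeof(*m));
        if (m) {
            *m = (struct low_map){ low, p, len, low_maps };
            low_maps = m;
        }
    }
    return low;
}

static void wrap_unmap(void *low)
{
    for (struct low_map **pm = &low_maps; *pm; pm = &(*pm)->next) {
        if ((*pm)->low == low) {
            struct low_map *m = *pm;
            *pm = m->next;
            munmap(m->low, m->len);      /* drop only the low duplicate */
            free(m);
            return;
        }
    }
}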

On 10/23/24 06:12, Christian König wrote:
I haven't read through the whole mail thread, but if you manage the 
address space using mmap() then you always run into this issue.


If you manage the whole 4GiB address space by Wine then you never run 
into this issue. You would just allocate some address range internally 
and mremap() into that.


Regards,
Christian.

Am 22.10.24 um 19:32 schrieb James Jones:
This sounds interesting, but does it come with the same "Only gets 2GB 
VA" downside Derek pointed out in the thread fork where he was 
responding to Michel?


Thanks,
-James

On 10/22/24 07:14, Christian König wrote:

Hi guys,

one theoretical alternative not mentioned in this thread is the use 
of mremap().


In other words you reserve some address space below 2G by using 
mmap(NULL, length, PROT_NONE, MAP_32BIT | MAP_ANONYMOUS, 0, 0) and 
then use mremap(addr64bit, 0, length, MREMAP_FIXED, reserved_addr).


I haven't tested this but at least in theory it should give you a 
duplicate of the 64bit mapping in the lower 2G of the address space.


Important is that you give 0 as oldsize to mremap() so that the old 
mapping isn't unmapped but rather just a new mapping of the existing 
VMA created.


Regards,
Christian.


Am 18.10.24 um 23:55 schrieb Derek Lesho:

Hey everyone 👋,

I'm Derek from the Wine project, and wanted to start a discussion 
with y'all about potentially extending the Mesa OGL drivers to help 
us with a functionality gap we're facing.


Problem Space:

In the last few years Wine's support for running 32-bit windows apps 
in a 64-bit host environment (wow64) has almost reached feature 
completion, but there remains a pain point with OpenGL applications: 
Namely that Wine can't return a 64-bit GL implementation's buffer 
mappings to a 32 bit application when the address is outside of the 
32-bit range.


Currently, we have a workaround that will copy any changes to the 
mapping back to the host upon glBufferUnmap, but this of course is 
slow when the implementation directly returns mapped memory, and 
doesn't work for GL_PERSISTENT_BIT, where directly mapped memory is 
required.


A few years ago we also faced this problem with Vulkan, which was 
solved through the VK_EXT_map_memory_placed extension Faith drafted, 
allowing us to use our Wine-internal allocator to provide the pages 
the driver maps to. I'm now wondering if a GL equivalent would also 
be seen as feasible amongst the devs here.


Proposed solution:

As the GL backend handles host mapping in its own code, only giving 
suballocations from its mappings back to the App, the problem is a 
little bit less straightforward in comparison to our Vulkan 
solution: If we just allowed the application to set its own placed 
mapping when calling glMapBuffer, the driver might then have to 
handle moving buffers out of already mapped ranges, and would lose 
control over its own memory management schemes.


Therefore, I propose a GL extension that allows the GL client to 
provide a mapping and unmapping callback to the implementation, to 
be used whenever the driver needs to perform such operations. This 
way the driver remains in full control of its memory management 
affairs, and the amount of work for an implementation as well as 
potential for bugs is kept minimal. I've written a draft 
implementation in Zink using map_memory_placed [1] and a 
corresponding Wine MR utilizing it [2], and would be curious to hear 
your thoughts. I don't have experience in the Mesa codebase, so I 
apologize if the branch is a tad messy.


In theory, the only requirement the extension places on drivers 
would be that glMapBuffer always return a pointer from within a page 
allocated through the provided callbacks, so that it can be 
guaranteed to be positioned within the required address space. Wine 
would then use its existing workaround for other types of buffers, 
but as Mesa seems to often return directly ma

Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

2025-01-21 Thread James Jones

On 1/20/25 14:00, Laurent Pinchart wrote:

On Fri, Jan 10, 2025 at 01:23:40PM -0800, James Jones wrote:

On 12/19/24 10:03, Simona Vetter wrote:

On Thu, Dec 19, 2024 at 09:02:27AM +, Daniel Stone wrote:

On Wed, 18 Dec 2024 at 10:32, Brian Starkey  wrote:

On Wed, Dec 18, 2024 at 11:24:58AM +, Simona Vetter wrote:

For that reason I think linear modifiers with explicit pitch/size
alignment constraints is a sound concept and fits into how modifiers work
overall.


Could we make it (more) clear that pitch alignment is a "special"
constraint (in that it's really a description of the buffer layout),
and that constraints in-general shouldn't be exposed via modifiers?


It's still worryingly common to see requirements for contiguous
allocation, if for no other reason than we'll all be stuck with
Freescale/NXP i.MX6 for a long time to come. Would that be in scope
for expressing constraints via modifiers as well, and if so, should we
be trying to use feature bits to express this?

How this would be used in practice is also way too underdocumented. We
need to document that exact-round-up 64b is more restrictive than
any-multiple-of 64b is more restrictive than 'classic' linear. We need
to document what people should advertise - if we were starting from
scratch, the clear answer would be that anything which doesn't care
should advertise all three, anything advertising any-multiple-of
should also advertise exact-round-up, etc.

But we're not starting from scratch, and since linear is 'special',
userspace already has explicit knowledge of it. So AMD is going to
have to advertise LINEAR forever, because media frameworks know about
DRM_FORMAT_MOD_LINEAR and pass that around explicitly when they know
that the buffer is linear. That and not breaking older userspace
running in containers or as part of a bisect or whatever.

There's also the question of what e.g. gbm_bo_get_modifier() should
return. Again, if we were starting from scratch, most restrictive
would make sense. But we're not, so I think it has to return LINEAR
for maximum compatibility (because modifiers can't be morphed into
other ones for fun), which further cements that we're not removing
LINEAR.

And how should allocators determine what to go for? Given that, I
think the only sensible semantics are, when only LINEAR has been
passed, to pick the most restrictive set possible; when LINEAR
variants have been passed as well as LINEAR, to act as if LINEAR were
not passed at all.


Yeah I think this makes sense, and we'd need to add that to the kerneldoc
about how drivers/apps/frameworks need to work with variants of LINEAR.

Just deprecating LINEAR does indeed not work. The same way it was really
hard slow crawl (and we're still not there everywhere, if you include
stuff like bare metal Xorg) trying to retire the implied modifier. Maybe,
in an extremely bright future were all relevant drivers advertise a full
set of LINEAR variants, and all frameworks understand them, we'll get
there. But if AMD is the one special case that really needs this I don't
think it's realistic to plan for that, and what Daniel describe above
looks like the future we're stuck to.
-Sima


I spent some time thinking about this over the break, because on a venn
diagram it does overlap a sliver of the work we've done to define the
differences between the concepts of constraints Vs. capabilities in the
smorgasbord of unified memory allocator talks/workshops/prototypes/etc.
over the years. I'm not that worried about some overlap being
introduced, because every reasonable rule deserves an exception here and
there, but I have concerns similar to Daniel's and Brian's.

Once you start adding more than one special modifier, some things in the
existing usage start to break down. Right now you can naively pass
around modifiers, then somewhere either before or just after allocation
depending on your usage, check if LINEAR is available and take your
special "I can parse this thing" path, for whatever that means in your
special use case. Modifying all those paths to include one variant of
linear is probably OK-but-not-great. Modifying all those paths to
include  variants of linear is probably unrealistic, and I do worry
about slippery slopes here.

---

What got me more interested though was this led to another thought. At
first I didn't notice that this was an exact-match constraint and
thought it meant the usual alignment constraint of >=, and I was
concerned about how future variants would interact poorly. It could
still be a concern if things progress down this route, and we have
vendor A requiring >= 32B alignment and vendor B requiring == 64B
alignment. They're compatible, but modifiers expressing this would
naively cancel each-other out unless vendor A proactively advertised ==
64B linear modifiers too. This isn't a huge deal 

Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

2025-01-14 Thread James Jones
I don't see how that fits in the current modifier usage patterns. I'm 
not clear how applications are supposed to programmatically "look at the 
modifiers of other drivers to find commonalities," nor how that "keeps 
"expectations the same as today for simplicity.". I think replacing the 
existing linear modifier would be very disruptive, and I don't think 
this proposal solves a general problem. Is it common for other vendors' 
hardware to have such strict pitch/height alignment requirements? Prior 
to this discussion, I'd only ever heard of minimum alignments.


Thanks,
-James

On 1/14/25 01:38, Marek Olšák wrote:
I would keep the existing modifier interfaces, API extensions, and 
expectations the same as today for simplicity.


The new linear modifier definition (proposal) will have these fields:
    5 bits for log2 pitch alignment in bytes
    5 bits for log2 height alignment in rows
    5 bits for log2 offset alignment in bytes
    5 bits for log2 minimum pitch in bytes
    5 bits for log2 minimum (2D) image size in bytes

The pitch and the image size in bytes are no longer arbitrary values. 
They are fixed values computed from {width, height, bpp, modifier} as 
follows:

    aligned_width = align(width * bpp / 8, 1 << log2_pitch_alignment);
    aligned_height = align(height, 1 << log2_height_alignment);
    pitch = max(1 << log2_minimum_pitch, aligned_width);
    image_size = max(1 << log2_minimum_image_size, pitch * aligned_height);
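
To illustrate, a small C sketch of exactly those rules, taking the log2
fields as plain parameters (the bit packing of the proposed modifier
itself is omitted, since that encoding is only a proposal; the offset
alignment field does not enter into these two formulas):

#include <stdint.h>

/* Round v up to a power-of-two alignment given as log2. */
static inline uint64_t align_log2(uint64_t v, uint32_t log2_align)
{
    const uint64_t a = (uint64_t)1 << log2_align;
    return (v + a - 1) & ~(a - 1);
}

static inline uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Compute the fixed pitch and 2D image size for a proposed linear variant. */
static void linear_layout(uint32_t width, uint32_t height, uint32_t bpp,
                          uint32_t log2_pitch_align, uint32_t log2_height_align,
                          uint32_t log2_min_pitch, uint32_t log2_min_image_size,
                          uint64_t *pitch, uint64_t *image_size)
{
    uint64_t aligned_width  = align_log2((uint64_t)width * bpp / 8, log2_pitch_align);
    uint64_t aligned_height = align_log2(height, log2_height_align);

    *pitch      = max_u64((uint64_t)1 << log2_min_pitch, aligned_width);
    *image_size = max_u64((uint64_t)1 << log2_min_image_size, *pitch * aligned_height);
}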

The modifier defines the layout exactly and non-ambiguously. 
Overaligning the pitch or height is not supported. Only the offset 
alignment has some freedom regarding placement. Drivers can expose 
whatever they want within that definition; even exposing only one linear 
modifier is OK. Then, you can look at modifiers of other drivers if you 
want to find commonalities.


DRM_FORMAT_MOD_LINEAR needs to go because it prevents apps from 
detecting whether 2 devices have 0 compatible memory layouts, which is a 
useful thing to know.


Marek

On Fri, Jan 10, 2025 at 4:23 PM James Jones wrote:


On 12/19/24 10:03, Simona Vetter wrote:
 > On Thu, Dec 19, 2024 at 09:02:27AM +, Daniel Stone wrote:
 >> On Wed, 18 Dec 2024 at 10:32, Brian Starkey wrote:
 >>> On Wed, Dec 18, 2024 at 11:24:58AM +, Simona Vetter wrote:
 >>>> For that reason I think linear modifiers with explicit pitch/size
 >>>> alignment constraints is a sound concept and fits into how
modifiers work
 >>>> overall.
 >>>
 >>> Could we make it (more) clear that pitch alignment is a "special"
 >>> constraint (in that it's really a description of the buffer
layout),
 >>> and that constraints in-general shouldn't be exposed via modifiers?
 >>
 >> It's still worryingly common to see requirements for contiguous
 >> allocation, if for no other reason than we'll all be stuck with
 >> Freescale/NXP i.MX6 for a long time to come. Would that be in scope
 >> for expressing constraints via modifiers as well, and if so,
should we
 >> be trying to use feature bits to express this?
 >>
 >> How this would be used in practice is also way too
underdocumented. We
 >> need to document that exact-round-up 64b is more restrictive than
 >> any-multiple-of 64b is more restrictive than 'classic' linear.
We need
 >> to document what people should advertise - if we were starting from
 >> scratch, the clear answer would be that anything which doesn't care
 >> should advertise all three, anything advertising any-multiple-of
 >> should also advertise exact-round-up, etc.
 >>
 >> But we're not starting from scratch, and since linear is 'special',
 >> userspace already has explicit knowledge of it. So AMD is going to
 >> have to advertise LINEAR forever, because media frameworks know
about
 >> DRM_FORMAT_MOD_LINEAR and pass that around explicitly when they know
 >> that the buffer is linear. That and not breaking older userspace
 >> running in containers or as part of a bisect or whatever.
 >>
 >> There's also the question of what e.g. gbm_bo_get_modifier() should
 >> return. Again, if we were starting from scratch, most restrictive
 >> would make sense. But we're not, so I think it has to return LINEAR
 >> for maximum compatibility (because modifiers can't be morphed into
 >> other ones for fun), which further cements that we're not removing
 >> LINEAR.
 >>
 >> A

Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

2025-01-10 Thread James Jones

On 12/19/24 10:03, Simona Vetter wrote:

On Thu, Dec 19, 2024 at 09:02:27AM +, Daniel Stone wrote:

On Wed, 18 Dec 2024 at 10:32, Brian Starkey  wrote:

On Wed, Dec 18, 2024 at 11:24:58AM +, Simona Vetter wrote:

For that reason I think linear modifiers with explicit pitch/size
alignment constraints is a sound concept and fits into how modifiers work
overall.


Could we make it (more) clear that pitch alignment is a "special"
constraint (in that it's really a description of the buffer layout),
and that constraints in-general shouldn't be exposed via modifiers?


It's still worryingly common to see requirements for contiguous
allocation, if for no other reason than we'll all be stuck with
Freescale/NXP i.MX6 for a long time to come. Would that be in scope
for expressing constraints via modifiers as well, and if so, should we
be trying to use feature bits to express this?

How this would be used in practice is also way too underdocumented. We
need to document that exact-round-up 64b is more restrictive than
any-multiple-of 64b is more restrictive than 'classic' linear. We need
to document what people should advertise - if we were starting from
scratch, the clear answer would be that anything which doesn't care
should advertise all three, anything advertising any-multiple-of
should also advertise exact-round-up, etc.

But we're not starting from scratch, and since linear is 'special',
userspace already has explicit knowledge of it. So AMD is going to
have to advertise LINEAR forever, because media frameworks know about
DRM_FORMAT_MOD_LINEAR and pass that around explicitly when they know
that the buffer is linear. That and not breaking older userspace
running in containers or as part of a bisect or whatever.

There's also the question of what e.g. gbm_bo_get_modifier() should
return. Again, if we were starting from scratch, most restrictive
would make sense. But we're not, so I think it has to return LINEAR
for maximum compatibility (because modifiers can't be morphed into
other ones for fun), which further cements that we're not removing
LINEAR.

And how should allocators determine what to go for? Given that, I
think the only sensible semantics are, when only LINEAR has been
passed, to pick the most restrictive set possible; when LINEAR
variants have been passed as well as LINEAR, to act as if LINEAR were
not passed at all.


Yeah I think this makes sense, and we'd need to add that to the kerneldoc
about how drivers/apps/frameworks need to work with variants of LINEAR.

Just deprecating LINEAR does indeed not work. The same way it was really
hard slow crawl (and we're still not there everywhere, if you include
stuff like bare metal Xorg) trying to retire the implied modifier. Maybe,
in an extremely bright future were all relevant drivers advertise a full
set of LINEAR variants, and all frameworks understand them, we'll get
there. But if AMD is the one special case that really needs this I don't
think it's realistic to plan for that, and what Daniel describe above
looks like the future we're stuck to.
-Sima


I spent some time thinking about this over the break, because on a venn 
diagram it does overlap a sliver of the work we've done to define the 
differences between the concepts of constraints Vs. capabilities in the 
smorgasbord of unified memory allocator talks/workshops/prototypes/etc. 
over the years. I'm not that worried about some overlap being 
introduced, because every reasonable rule deserves an exception here and 
there, but I have concerns similar to Daniel's and Brian's.


Once you start adding more than one special modifier, some things in the 
existing usage start to break down. Right now you can naively pass 
around modifiers, then somewhere either before or just after allocation 
depending on your usage, check if LINEAR is available and take your 
special "I can parse this thing" path, for whatever that means in your 
special use case. Modifying all those paths to include one variant of 
linear is probably OK-but-not-great. Modifying all those paths to 
include  variants of linear is probably unrealistic, and I do worry 
about slippery slopes here.


---

What got me more interested though was this led to another thought. At 
first I didn't notice that this was an exact-match constraint and 
thought it meant the usual alignment constraint of >=, and I was 
concerned about how future variants would interact poorly. It could 
still be a concern if things progress down this route, and we have 
vendor A requiring >= 32B alignment and vendor B requiring == 64B 
alignment. They're compatible, but modifiers expressing this would 
naively cancel each-other out unless vendor A proactively advertised == 
64B linear modifiers too. This isn't a huge deal at that scale, but it 
could get worse, and it got me thinking about a way to solve the problem 
of a less naive way to merge modifier lists.


As a background, the two hard problems left with implementing a 
constraint system to sit alongside