Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
> So far, we've been trying to build those components in terms of the Vulkan API itself with calls jumping back into the dispatch table to try and get inside the driver. This is working but it's getting more and more fragile the more tools we add to that box. A lot of what I want to do with gallium2 or whatever we're calling it is to fix our layering problems so that calls go in one direction and we can untangle the jumble. I'm still not sure what I want that to look like but I think I want it to look a lot like Vulkan, just with a handier interface.

That resonates with my experience. For example, the Gallium draw module does some of this too -- it provides its own internal interfaces for drivers, but it also loops back into the Gallium top interface to set FS and rasterizer state -- and that has *always* been a source of grief. Having control flow proceed through layers in one direction only seems an important principle to observe. It's fine if the lower interface is the same interface (e.g., Gallium to Gallium, or Vulkan to Vulkan as you allude), but they shouldn't be the exact same entry points/modules (i.e., no reentrancy/recursion).

It's also worth considering that Vulkan extensibility could come in handy too in what you want to achieve. For example, Mesa Vulkan drivers could have their own VK_MESA_internal_ extensions that could be used by the shared Vulkan code to do lower-level things.

Jose

On Wed, Jan 24, 2024 at 3:26 PM Faith Ekstrand wrote:

> Jose,
>
> Thanks for your thoughts!
>
> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca wrote:
>
> > I don't know much about the current Vulkan driver internals to have or provide an informed opinion on the path forward, but I'd like to share my backwards-looking perspective.
>
> > Looking back, Gallium was effectively two things:
> > (1) an abstraction layer that's watertight (as in upper layers shouldn't reach through to lower layers)
> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
>
> > (1) was of course important -- and the discipline it imposed is what enabled great simplifications -- but it also became a straitjacket, as GPUs didn't stand still, and sooner or later the see-every-hardware-as-the-same lens stops reflecting reality.
>
> > If I had to pick one, I'd say that (2) is far more useful and practical. Take components like Gallium's draw and other util modules. A driver can choose to use them or not. One could fork them within the Mesa source tree, and only the drivers that opt in to the fork would need to be tested/adapted/etc.
>
> > On the flip side, the Vulkan API is already a pretty low-level HW abstraction. It's also very flexible and extensible, so it's hard to provide a watertight abstraction underneath it without either taking the lowest common denominator, or having lots of optional bits of functionality governed by a myriad of caps like you alluded to.
>
> There is a third thing that isn't really recognized in your description:
>
> (3) A common "language" to talk about GPUs and data structures that represent that language
>
> This is precisely what the Vulkan runtime today doesn't have. Classic meta sucked because we were trying to implement GL in GL. u_blitter, on the other hand, is pretty fantastic because Gallium provides a much more sane interface to write those common components in terms of.
> So far, we've been trying to build those components in terms of the Vulkan API itself with calls jumping back into the dispatch table to try and get inside the driver. This is working but it's getting more and more fragile the more tools we add to that box. A lot of what I want to do with gallium2 or whatever we're calling it is to fix our layering problems so that calls go in one direction and we can untangle the jumble. I'm still not sure what I want that to look like but I think I want it to look a lot like Vulkan, just with a handier interface.
>
> ~Faith
>
> > Not sure how useful this is in practice to you, but the lesson from my POV is that opt-in reusable and shared libraries are always time well spent, as they can bend and adapt with the times, whereas no-opt-out watertight abstractions inherently have a shelf life.
>
> > Jose
>
> > On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand wrote:
>
> >> Yeah, this one's gonna hit Phoronix...
>
> >> When we started writing Vulkan drivers back in the day, there was this notion that Vulkan was a low-level API that directly targets hardware. Vulkan drivers were these super thin things that just blasted packets straight into the hardware. What little code was common was small and pretty easy to just copy+paste around. It was a nice thought...
>
> >> What's happened in the intervening 8 years is that Vulkan has grown. A lot.
>
> >> We already have several places where we're doing significant layering. It started with sharing the WSI code and some Python
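To make the "calls go in one direction" principle concrete, here is a minimal C sketch of what a downward-only interface could look like. All of the names (rt_driver_hooks, rt_meta_blit, and so on) are hypothetical and do not exist in the Mesa tree; the point is only that shared runtime code calls driver-provided hooks instead of re-entering the public vkCmd* entry points through the dispatch table.

```c
/* Hypothetical sketch only: none of these names exist in the Mesa tree.
 * The shared runtime calls *down* into a driver-provided hook table and
 * never loops back up through the public vkCmd* dispatch table, so the
 * driver never sees runtime-internal work disguised as application calls. */
#include <vulkan/vulkan.h>

/* Downward-facing interface the driver fills in at device creation. */
struct rt_driver_hooks {
   /* Bind an internal pipeline without touching application-visible state. */
   void (*bind_internal_pipeline)(VkCommandBuffer cmd, VkPipeline pipeline);
   /* Emit a draw for runtime-internal work (meta blits, clears, ...). */
   void (*draw_internal)(VkCommandBuffer cmd, uint32_t vertex_count);
};

/* Shared runtime helper: control flow goes runtime -> driver hooks, only. */
static void
rt_meta_blit(VkCommandBuffer cmd, const struct rt_driver_hooks *hooks,
             VkPipeline blit_pipeline)
{
   hooks->bind_internal_pipeline(cmd, blit_pipeline);
   hooks->draw_internal(cmd, 3); /* full-screen triangle */
}
```

The contrast with the current situation is that a helper like this never has to pretend to be the application, so the driver can treat the hooks as a private contract rather than guarding its vkCmd* entry points against its own runtime.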
Tesla gaming (and more)
What is the proper way to re-route output from a rendering card (which can have its output disconnected, or not have one at all) to a displaying card (a weak one, an iGPU, etc.)? For example, a laptop with an external card in an ExpressCard riser (no external display connected to the card), or a desktop PC with embedded video plus an Nvidia Tesla? How should I configure Mesa in order to get «auto-screen-grabbing» from the rendering card to the displaying one?
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Thu, Jan 25, 2024 at 8:57 AM Jose Fonseca wrote:

> > So far, we've been trying to build those components in terms of the Vulkan API itself with calls jumping back into the dispatch table to try and get inside the driver. This is working but it's getting more and more fragile the more tools we add to that box. A lot of what I want to do with gallium2 or whatever we're calling it is to fix our layering problems so that calls go in one direction and we can untangle the jumble. I'm still not sure what I want that to look like but I think I want it to look a lot like Vulkan, just with a handier interface.
>
> That resonates with my experience. For example, the Gallium draw module does some of this too -- it provides its own internal interfaces for drivers, but it also loops back into the Gallium top interface to set FS and rasterizer state -- and that has *always* been a source of grief. Having control flow proceed through layers in one direction only seems an important principle to observe. It's fine if the lower interface is the same interface (e.g., Gallium to Gallium, or Vulkan to Vulkan as you allude), but they shouldn't be the exact same entry points/modules (i.e., no reentrancy/recursion).
>
> It's also worth considering that Vulkan extensibility could come in handy too in what you want to achieve. For example, Mesa Vulkan drivers could have their own VK_MESA_internal_ extensions that could be used by the shared Vulkan code to do lower-level things.

We already do that for a handful of things. The fact that Vulkan doesn't ever check the stuff in the pNext chain is really useful for that. 😅

~Faith

> Jose
>
> On Wed, Jan 24, 2024 at 3:26 PM Faith Ekstrand wrote:
>
>> Jose,
>>
>> Thanks for your thoughts!
>>
>> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca wrote:
>>
>> > I don't know much about the current Vulkan driver internals to have or provide an informed opinion on the path forward, but I'd like to share my backwards-looking perspective.
>>
>> > Looking back, Gallium was effectively two things:
>> > (1) an abstraction layer that's watertight (as in upper layers shouldn't reach through to lower layers)
>> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
>>
>> > (1) was of course important -- and the discipline it imposed is what enabled great simplifications -- but it also became a straitjacket, as GPUs didn't stand still, and sooner or later the see-every-hardware-as-the-same lens stops reflecting reality.
>>
>> > If I had to pick one, I'd say that (2) is far more useful and practical. Take components like Gallium's draw and other util modules. A driver can choose to use them or not. One could fork them within the Mesa source tree, and only the drivers that opt in to the fork would need to be tested/adapted/etc.
>>
>> > On the flip side, the Vulkan API is already a pretty low-level HW abstraction. It's also very flexible and extensible, so it's hard to provide a watertight abstraction underneath it without either taking the lowest common denominator, or having lots of optional bits of functionality governed by a myriad of caps like you alluded to.
>>
>> There is a third thing that isn't really recognized in your description:
>>
>> (3) A common "language" to talk about GPUs and data structures that represent that language
>>
>> This is precisely what the Vulkan runtime today doesn't have. Classic meta sucked because we were trying to implement GL in GL.
>> u_blitter, on the other hand, is pretty fantastic because Gallium provides a much more sane interface to write those common components in terms of.
>>
>> So far, we've been trying to build those components in terms of the Vulkan API itself with calls jumping back into the dispatch table to try and get inside the driver. This is working but it's getting more and more fragile the more tools we add to that box. A lot of what I want to do with gallium2 or whatever we're calling it is to fix our layering problems so that calls go in one direction and we can untangle the jumble. I'm still not sure what I want that to look like but I think I want it to look a lot like Vulkan, just with a handier interface.
>>
>> ~Faith
>>
>> > Not sure how useful this is in practice to you, but the lesson from my POV is that opt-in reusable and shared libraries are always time well spent, as they can bend and adapt with the times, whereas no-opt-out watertight abstractions inherently have a shelf life.
>>
>> > Jose
>>
>> > On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand wrote:
>>
>> >> Yeah, this one's gonna hit Phoronix...
>>
>> >> When we started writing Vulkan drivers back in the day, there was this notion that Vulkan was a low-level API that directly targets hardware. Vulkan drivers were these super thin things that just blasted packets straight into the hardware. What l
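As an illustration of the VK_MESA_internal_ idea and the pNext behavior Faith mentions above, here is a hedged sketch in C. The structure, the sType value, and the helper functions are all invented for illustration; a real internal extension would reserve its own enum values and live alongside the driver's other shared headers.

```c
/* Hypothetical illustration of the VK_MESA_internal_* idea: neither this
 * struct nor the sType value exists anywhere; a real extension would reserve
 * a proper enum range. As noted in the thread, nothing in the stack rejects
 * pNext structs it doesn't recognize, so shared runtime code can pass extra,
 * Mesa-internal create info to the driver through an ordinary Vulkan call. */
#include <stdbool.h>
#include <vulkan/vulkan.h>

#define VK_STRUCTURE_TYPE_IMAGE_INTERNAL_CREATE_INFO_MESA \
   ((VkStructureType)1000999000) /* placeholder value */

typedef struct VkImageInternalCreateInfoMESA {
   VkStructureType sType;
   const void     *pNext;
   bool            bypassCompression; /* example internal knob */
} VkImageInternalCreateInfoMESA;

/* Shared code chains the internal struct into a normal create call. */
static VkResult
create_meta_image(VkDevice device, const VkImageCreateInfo *base,
                  VkImage *image)
{
   VkImageInternalCreateInfoMESA internal = {
      .sType = VK_STRUCTURE_TYPE_IMAGE_INTERNAL_CREATE_INFO_MESA,
      .pNext = base->pNext,
      .bypassCompression = true,
   };
   VkImageCreateInfo info = *base;
   info.pNext = &internal;
   return vkCreateImage(device, &info, NULL, image);
}

/* Driver side: walk the pNext chain and honor the internal struct if found. */
static const VkImageInternalCreateInfoMESA *
find_internal_info(const VkImageCreateInfo *info)
{
   for (const VkBaseInStructure *s = info->pNext; s != NULL; s = s->pNext) {
      if (s->sType == VK_STRUCTURE_TYPE_IMAGE_INTERNAL_CREATE_INFO_MESA)
         return (const VkImageInternalCreateInfoMESA *)s;
   }
   return NULL;
}
```

The application-visible API surface stays plain Vulkan; only a driver that knows about the internal struct will ever act on it.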
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On 24/01/2024 18:26, Faith Ekstrand wrote:
> So far, we've been trying to build those components in terms of the Vulkan API itself with calls jumping back into the dispatch table to try and get inside the driver.

To me, it looks like the "opt-in" approach would still apply well to the goal of cleaning up "implementing Vulkan in Vulkan", and gradual changes diverging from the usual Vulkan specification behavior can be implemented and maintained in existing and new drivers more efficiently than a whole new programming model. I think it's important that the scale of our solution be appropriate to the scale of the problem, otherwise we risk creating large issues in other areas.

Currently there are pretty few places where Mesa implements Vulkan on top of Vulkan:
• WSI,
• Emulated render passes,
• Emulated secondary command buffers,
• Meta.

For WSI, render passes and secondary command buffers, I don't think there's anything that needs to be done, as those already have little to no driver backend involvement or interference with the application's calls -- render pass and secondary command buffer emulation interacts with the hardware driver entirely within the framework of the Vulkan specification, only storing a few fields in vk_command_buffer, which are already handled fully in common code.

Common meta, on the other hand, yes, is extremely intrusive -- overriding the application's pipeline state and bindings, and passing shaders directly as NIR, bypassing SPIR-V. But with meta being such a different beast, I think we shouldn't even be trying to tame it with the same interfaces as everything else. If we're going to handle meta's special cases throughout our common "Gallium2" framework, it feels like we'll simply be turning our "Vulkan on Vulkan" issue into the problem of "implementing Gallium2 on Gallium2".

Instead, I think the cleanest solution for common meta would be sending commands to the driver through a separate callback interface specifically for meta, instead of trying to make meta mimic application code. That would allow drivers to clearly negotiate the details of applying/reverting state changes and shader compilation, while letting their developers assume that everything else is written, for the most part, purely against the Vulkan specification.

It would still be okay for meta to make calls to vkGetPhysicalDevice* and vkCreate*/vkDestroy*, as long as they're done within the rules of the Vulkan specification, to require certain extensions, and to do some less intrusive, non-hot-path interaction with the driver's internals directly -- such as requiring that every VkImage is a vk_image and pulling the needed create info fields from there. However, everything interacting with state/bindings, as well as things going beyond the specification like creating image views with incompatible formats, would go through those new callbacks.

NVK-style drivers would be able to share a common implementation of those callbacks. Drivers that want to take advantage of more direct-to-hardware paths would need to provide what's friendly to them (maybe even with lighter handling of compute-based meta operations compared to graphics ones). That'd probably not be a single flat list of callbacks, but a bunch of them -- for example, it'd be possible for a driver to use the common command buffer callbacks but to specialize some view/descriptor-related ones (it may not be possible to make those common at all, by the way).
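A rough C sketch of what such a driver-facing meta callback table could look like, purely to illustrate the proposal above: none of these types or entry points exist in the Mesa Vulkan runtime today, and the exact split between graphics and compute hooks is just one possible shape.

```c
/* Hypothetical sketch of a separate meta callback interface; nothing here
 * exists in the Mesa tree. The idea is that common meta drives the hardware
 * through these hooks instead of replaying application-style vkCmd* calls. */
#include <vulkan/vulkan.h>

struct nir_shader; /* meta shaders would be handed over as NIR, not SPIR-V */

struct vk_meta_backend {
   /* Save whatever state the meta operation is about to clobber and
    * restore it afterwards; the driver decides what that means. */
   void (*save_state)(VkCommandBuffer cmd);
   void (*restore_state)(VkCommandBuffer cmd);

   /* Bind a meta shader passed directly as NIR, bypassing pipelines. */
   void (*bind_shader)(VkCommandBuffer cmd, VkShaderStageFlagBits stage,
                       struct nir_shader *nir);

   /* Create a view with a format the spec would call incompatible;
    * only meta is allowed to ask for this. */
   VkResult (*create_incompatible_view)(VkDevice device, VkImage image,
                                        VkFormat view_format,
                                        VkImageView *view_out);

   /* Optional compute-based path; NULL means "use the graphics hooks". */
   void (*dispatch_meta_compute)(VkCommandBuffer cmd,
                                 uint32_t x, uint32_t y, uint32_t z);
};
```

A driver could fill in only the hooks it cares about and fall back to a shared implementation for the rest, which matches the "common command buffer callbacks, specialized view/descriptor callbacks" split described above.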
And if a driver doesn't need the common meta at all, none of that would bother it.

The other advantages I see in this separate meta API approach are:
• In the rest of the code, driver developers will in most cases need to refer to only a single authority -- the massively detailed Vulkan specification -- and there are risks in rolling our own interface for everything:
  • Driver developers will have to spend more time carefully looking up what they need to do in two places rather than largely just one.
  • We're much more prone to leaving gaps in our interface and to writing lacking documentation. I can't see this effort not being rushed, with us having to catch up to 10 years of XGL/Vulkan development, while moving many drivers alongside working on other tasks, and with varying levels of enthusiasm among driver developers for this. Unless zmike's 10 years estimate is our actual target 🤷
• Having to deal with a new large-scale API may raise the barrier for new contributors and discourage them. Unlike with OpenGL, with all its resource renaming stuff, the experience I got from developing applications on Vulkan was enough for me to start comfortably implementing it, except for shader compilation. When zmike showed me an R600g issue about some relation between vertex buffer bindings and CSOs, I just didn't have anything useful to say.
• Faster iteration inside the common meta code, with the meta interfac
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
Hi,

thanks, Faith, for bringing this discussion up.

I think with Venus we are more interested in using utility libraries on an as-needed basis. Here, most of the time the Vulkan commands are just serialized according to the Venus protocol and then passed to the host, because usually it wouldn't make sense to let the guest translate the Vulkan commands to something different (e.g. something that is commonly used in a runtime), only to then re-encode this in the Venus driver to satisfy the host Vulkan driver -- just think SPIR-V: why would we want to have NIR only to then re-encode it to SPIR-V?

I'd also like to give a +1 to the points raised by Triang3l and others about the potential of breaking other drivers. I've certainly been bitten by this on the Gallium side with r600, and unfortunately I can't set up a CI in my home office (and after watching the XDC talk about setting up your own CI I was even more discouraged from doing this).

In summary, I certainly see the advantage of using common code, but with these two points above in mind I think opt-in is better.

Gert
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Thu, Jan 25, 2024 at 5:06 PM Gert Wollny wrote:

> Hi,
>
> thanks, Faith, for bringing this discussion up.
>
> I think with Venus we are more interested in using utility libraries on an as-needed basis. Here, most of the time the Vulkan commands are just serialized according to the Venus protocol and then passed to the host, because usually it wouldn't make sense to let the guest translate the Vulkan commands to something different (e.g. something that is commonly used in a runtime), only to then re-encode this in the Venus driver to satisfy the host Vulkan driver -- just think SPIR-V: why would we want to have NIR only to then re-encode it to SPIR-V?

I think Venus is an entirely different class of driver. It's not even really a driver. It's more of a Vulkan layer that has a VM boundary in the middle. It's attempting to be as thin of a Vulkan -> Vulkan pass-through as possible. As such, it doesn't use most of the shared stuff anyway. It uses the dispatch framework and that's really about it. As long as that code stays in-tree roughly as-is, I think Venus will be fine.

> I'd also like to give a +1 to the points raised by Triang3l and others about the potential of breaking other drivers. I've certainly been bitten by this on the Gallium side with r600, and unfortunately I can't set up a CI in my home office (and after watching the XDC talk about setting up your own CI I was even more discouraged from doing this).

That's a risk with all common code. You could raise the same risk with NIR or basically anything else. Sure, if someone wants to go write all the code themselves in an attempt to avoid bugs, I guess they're free to do that. I don't really see that as a compelling argument, though.

Also, while you experienced gallium breakage with r600, having worked on i965, I can guarantee you that that's still better than maintaining a classic (non-gallium) GL driver. 🙃

At the moment, given the responses I've seen and the scope of the project as things are starting to congeal in my head, I don't think this will be an incremental thing where drivers get converted as we go anymore. If we really do want to flip the flow, I think it'll be invasive enough that we'll build gallium2 and then people can port to it if they want. I may port a driver or two myself, but those will be things I own or am at least willing to deal with the bug fallout for. Others can port or not at will. This is what I meant when I said elsewhere that we're probably heading towards a gallium/classic situation again. I don't expect anyone to port until the benefits outweigh the costs, but I do expect the benefits will be there eventually.

~Faith