[Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU
(if responding on mesa-dev please do cc me to help preserve the thread, i am subscribed in digest mode, thanks)

the NLNet funding application documented here was successful:
https://libre-riscv.org/nlnet_2019_amdvlk_port/

we therefore have money (direct payment of tax free, tax deductible donations from the NLNet Foundation into your bank account [1]) available for anyone, anywhere in the world, to help create a 3D MESA Vulkan compliant driver for a hybrid hard-soft GPU.

we considered starting from SwiftShader because it is, on initial inspection, close to what we want. we could hypothetically have added deep SIMD instructions, and the project would be 90% complete.

however when it comes to predication and to vector types such as vec2, vec3 and vec4, the infrastructure in SwiftShader is unable to cope. thus, if we add custom hardware opcodes which can accelerate vector-vector operations, SwiftShader would have been an extremely bad decision as it would be incapable of using such hardware without a drastic redesign.

hence the reason for choosing MESA, because NIR can retain that critical type information right up until it hits the hardware.

as you know, a hybrid CPU/GPU does not have a separate CPU and a separate GPU, they are *one and the same*. therefore if starting from the Intel MESA driver or RADV, the initial work needed is to make the chosen base actually a *software* renderer, just like SwiftShader.

this is a desirable goal and it will be important to have the code be portable, unaccelerated, and run on at the minimum x86, and (later) the Libre SoC.

once that phase is completed, *then* we may move to adding custom accelerated opcodes (Transcendentals, YUV2RGB etc). bear in mind that these will be added *to the CPU*... because the CPU *is* the GPU.

to be absolutely clear: there will be no marshalling of GPU data or instructions to be sent over kernelspace IPC. the CPU will execute the accelerated opcode (e.g. atan2) directly and immediately.

this is significantly simpler than standard GPUs, saving on power consumption and drastically simplifying debugging and application development.

if this is of interest please do get in touch, and feel free to ask questions.

best,

l.

[1] your accountant, with assistance from Bob Goudriaans, an International Tax Law Specialist and Director of NLNet, can help confirm that the payments from NLNet are considered charitable tax deductible donations... *to you*. all the International Tax Agreements are in place and the documents available for inspection by your accountant.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] NLNet Funding Application, EUR 50,000, for help porting AMDVLK or MESA RADV to the Libre RISC-V Hybrid CPU/GPU.
hi all, the funding application was successful, i will follow up in a separate subject line.

On Thursday, September 26, 2019, Luke Kenneth Casson Leighton wrote:

> (to preserve thread, please do cc me on any questions, i am subscribed
> digest, thanks)
>
> https://libre-riscv.org/nlnet_2019_amdvlk_port/
>
> i've submitted a funding request to the Charitable Foundation, NLNet,
> for people to be the recipient of donations to work on a port of MESA
> RADV (or AMDVLK, as appropriate) to the Libre RISC-V Hybrid CPU / GPU
> / VPU.
Re: [Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU
Drive-by comment:

I don't think you actually want to base any decisions on a vec4 architecture. Nearly every company in the graphics industry thought that was a good idea and designed vec4 processors. Over the course of the last 15 years or so they have all, one by one, realized that it was a bad idea and stopped doing it. Instead, they all parallelize the other way, and their SIMD instructions act on scalar values across 8, 16, 32, or 64 invocations (vertices, pixels, etc.) of the shader program.

That isn't to say that Mesa is the wrong choice or that there aren't other reasons why SwiftShader would be a bad fit. However, I would recommend against making major architectural decisions for the sole reason that it allows you to repeat one of the biggest architectural mistakes that the graphics industry as a whole managed to make and has only recently (last 5 years) finally gotten over.

On January 8, 2020 18:35:16 Luke Kenneth Casson Leighton wrote:

> however when it comes to predication and to vector types such as vec2,
> vec3 and vec4, the infrastructure in SwiftShader is unable to cope.
> thus, if we add custom hardware opcodes which can accelerate
> vector-vector operations, SwiftShader would have been an extremely bad
> decision as it would be incapable of using such hardware without a
> drastic redesign.
>
> hence the reason for choosing MESA, because NIR can retain that
> critical type information right up until it hits the hardware.
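The two ways of parallelizing that this message contrasts can be sketched in C. This is only an illustration under stated assumptions: the names `vec4`, `vec4_add`, `simd_add`, `vec3_soa`, `vec3_add_soa` and `MAX_INVOCATIONS` are made up for this sketch and do not come from Mesa, SwiftShader, or any hardware ISA, and the plain loops stand in for what would be a single wide instruction on real hardware.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INVOCATIONS 64   /* widest SIMD width mentioned in the thread */

/* vec4 style: one invocation, four components per instruction. */
typedef struct { float c[4]; } vec4;

static vec4 vec4_add(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++)   /* always 4 lanes, even for a vec2 or vec3 */
        r.c[i] = a.c[i] + b.c[i];
    return r;
}

/* Scalar-across-invocations style: one component, n invocations at a time. */
static void simd_add(float *dst, const float *a, const float *b, size_t n)
{
    assert(n <= MAX_INVOCATIONS);
    for (size_t lane = 0; lane < n; lane++)   /* one lane per invocation */
        dst[lane] = a[lane] + b[lane];
}

/* Structure-of-arrays storage: each component holds one value per invocation. */
typedef struct {
    float x[MAX_INVOCATIONS];
    float y[MAX_INVOCATIONS];
    float z[MAX_INVOCATIONS];
} vec3_soa;

/* A vec3 add for n invocations becomes three full-width scalar adds. */
static void vec3_add_soa(vec3_soa *dst, const vec3_soa *a,
                         const vec3_soa *b, size_t n)
{
    simd_add(dst->x, a->x, b->x, n);
    simd_add(dst->y, a->y, b->y, n);
    simd_add(dst->z, a->z, b->z, n);
}
```

In the second form every lane does useful work regardless of whether the shader uses vec2, vec3 or vec4, whereas `vec4_add` leaves lanes idle for anything narrower than four components.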
Re: [Mesa-dev] [PATCH 1/2] haiku/hgl: Fix build via header reordering
Oh.. lol. I just realized Mesa is doing PRs in GitLab now and I'm being old-fashioned.

January 8, 2020 7:04 PM, "Alexander von Gluck IV" wrote:

> ---
>  src/gallium/state_trackers/hgl/hgl_context.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
[Mesa-dev] [PATCH 1/2] haiku/hgl: Fix build via header reordering
---
 src/gallium/state_trackers/hgl/hgl_context.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/hgl/hgl_context.h b/src/gallium/state_trackers/hgl/hgl_context.h
index c5995f1cd2b..e2ebfbad4bc 100644
--- a/src/gallium/state_trackers/hgl/hgl_context.h
+++ b/src/gallium/state_trackers/hgl/hgl_context.h
@@ -9,11 +9,13 @@
 #define HGL_CONTEXT_H
 
 
-#include "state_tracker/st_api.h"
-#include "state_tracker/st_manager.h"
+#include "pipe/p_format.h"
 #include "pipe/p_compiler.h"
 #include "pipe/p_screen.h"
 #include "postprocess/filters.h"
+
+#include "state_tracker/st_api.h"
+#include "state_tracker/st_manager.h"
 #include "os/os_thread.h"
 
 #include "bitmap_wrapper.h"
-- 
2.24.1
[Mesa-dev] [PATCH 2/2] util/u_thread: Fix build under Haiku
From: X512

---
 src/util/u_thread.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/util/u_thread.h b/src/util/u_thread.h
index 6fc89099fec..5bb06608fc9 100644
--- a/src/util/u_thread.h
+++ b/src/util/u_thread.h
@@ -40,6 +40,10 @@
 #endif
 #endif
 
+#ifdef __HAIKU__
+#include
+#endif
+
 #ifdef __FreeBSD__
 #define cpu_set_t cpuset_t
 #endif
@@ -77,6 +81,8 @@ static inline void u_thread_setname( const char *name )
    pthread_setname_np(pthread_self(), "%s", (void *)name);
 #elif DETECT_OS_APPLE
    pthread_setname_np(name);
+#elif DETECT_OS_HAIKU
+   rename_thread(find_thread(NULL), name);
 #else
 #error Not sure how to call pthread_setname_np
 #endif
@@ -149,7 +155,7 @@ util_get_L3_for_pinned_thread(thrd_t thread, unsigned cores_per_L3)
 static inline int64_t
 u_thread_get_time_nano(thrd_t thread)
 {
-#if defined(HAVE_PTHREAD) && !defined(__APPLE__)
+#if defined(HAVE_PTHREAD) && !defined(__APPLE__) && !defined(__HAIKU__)
    struct timespec ts;
    clockid_t cid;
 
-- 
2.24.1
Re: [Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU
On Thursday, January 9, 2020, Jason Ekstrand wrote:

> Drive-by comment:

really appreciate the feedback.

> I don't think you actually want to base any decisions on a vec4
> architecture. Nearly every company in the graphics industry thought that
> was a good idea and designed vec4 processors. Over the course of the last
> 15 years or so they have all, one by one, realized that it was a bad idea
> and stopped doing it. Instead, they all parallelize the other way, and their
> SIMD instructions act on scalar values across 8, 16, 32, or 64 invocations
> (vertices, pixels, etc.) of the shader program.

for simplicity (not outlining nearly 18 months of Vector Architecture ISA development) i missed out that we have designed a vector-plus-subvector architecture, where the subvectors may be of any length between 1 and 4, and there is an additional vector loop around that which may be from length 1 (scalar) to 64.

individual predicate mask bits may be applied to the vector, however they may only be applied (one bit) per subvector.

discussions have been ongoing for around 2 years now on the LLVM dev lists to support this type of concept.

on the libre soc lists we have also had detailed discussions on how to do swizzles at the subvector level.

AMDGPU, NVIDIA, MALI, they all support these capabilities.

to be clear: we have *not* designed an architecture or an ISA which critically and exclusively depends on vec4 and vec4 alone.

now, whether it is a bad idea or not to have vec2, vec3 and vec4 capability, the way that i see it is that Vulkan supports them, as does the SPIRV compiler in e.g. AMDVLK, and we would be asking for trouble (performance penalties, compiler complexity due to having to add predicated autovectorisation) if we did not support them.

however, there are two nice things:

1. we are at an early phase, therefore we *can* evaluate valuable "headsup" warnings such as the one you give, jason (so thank you)

2. as a flexible Vector Processor, soft-programmable, then over time if the industry moves to dropping vec4, so can we.

> That isn't to say that Mesa is the wrong choice or that there aren't
> other reasons why SwiftShader would be a bad fit. However, I would
> recommend against making major architectural decisions for the sole
> reason that it allows you to repeat one of the biggest architectural
> mistakes that the graphics industry as a whole managed to make and has only
> recently (last 5 years) finally gotten over.

consequently i am extremely grateful as the last thing we need is to spend NLNet funds on repeating industry mistakes.

this is why i love libre hardware. can you imagine the hell it would have taken to get you signing an NDA just to be able to provide the insight that you did? :)

warmest,

l.
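The vector-plus-subvector scheme described in this message (an outer vector loop of length 1 to 64 around subvectors of length 1 to 4, with one predicate bit per subvector) might be modelled in C roughly as follows. This is a sketch under assumptions, not the actual ISA semantics: the function name `subvec_add`, the packed memory layout, and the choice to leave masked-off subvectors untouched rather than zeroing them are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Element-wise add over `vl` subvectors of `subvl` components each.
 * Bit i of `pred` enables or disables the whole of subvector i.
 * Assumed layout: subvectors are packed back to back in memory. */
static void subvec_add(float *dst, const float *a, const float *b,
                       unsigned vl, unsigned subvl, uint64_t pred)
{
    assert(vl >= 1 && vl <= 64);       /* outer vector length: 1 (scalar) to 64 */
    assert(subvl >= 1 && subvl <= 4);  /* subvector length: 1 to 4 */

    for (unsigned i = 0; i < vl; i++) {          /* outer vector loop */
        if (!((pred >> i) & 1))
            continue;                            /* masked off: dst left as-is (assumption) */
        for (unsigned j = 0; j < subvl; j++) {   /* inner subvector loop */
            unsigned idx = i * subvl + j;
            dst[idx] = a[idx] + b[idx];
        }
    }
}
```

With `subvl = 1` this degenerates to an ordinary predicated scalar vector add, which is why the scheme does not depend on vec4 alone; with `subvl = 3` and `vl = 3`, a mask of `0x5` would process the first and third vec3 and skip the second.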
Re: [Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU
On Wed, Jan 8, 2020 at 7:55 PM Luke Kenneth Casson Leighton wrote:

> now, whether it is a bad idea or not to have vec2, vec3 and vec4
> capability, the way that i see it is that Vulkan supports them, as does the
> SPIRV compiler in e.g. AMDVLK, and we would be asking for trouble
> (performance penalties, compiler complexity due to having to add predicated
> autovectorisation) if we did not support them.

Now that I'm at a keyboard and not a phone, I can provide a more thorough explanation.
Hopefully that will be helpful.

First of all, let's start with what SPIR-V supporting vec2/3/4 means. It is true that a vec3 is a native SPIR-V type; it is also a native type in GLSL and HLSL. Those languages also support matrices, write-masking of results, and swizzling on sources. This stuff is all very useful when writing graphics shaders because 70% of what you do in graphics is some sort of vector math. Having constructs built directly into the language is really nice. When it comes to SPIR-V specifically, you should think of it much more like a high-level language than like an IR; it's intentionally designed to lose as little high-level information as possible.

Most of the early shader architectures also had vec2/3/4 as native data types and had swizzling and write-masking as core concepts; partly because it seemed like a good idea and partly because that's the way Microsoft's D3D9 bytecode format worked. The reason why NIR supports vec4 is because there is a lot of hardware out there (including Intel from 5-6 years ago) which needs vec4 and some of that hardware is still very actively supported in Mesa.

On all modern architectures I'm familiar with (this includes Intel, NVIDIA, AMD, ARM, Imagination, and Qualcomm), everything is scalarized so a vec3 + vec3 add operation is turned into three scalar add operations. In our NIR-based compilers, the scalarization usually happens almost immediately once you're in NIR. We then run up to 64 invocations (vertex, pixel, etc.) at a time with wide hardware instructions. When control-flow diverges (some invocations go one way and some another), both paths are executed and predication is used to disable the SIMD lanes for the invocations that took the other path. On Intel and several other architectures, this happens fairly automatically. On AMD, their management of predicates is much more manual.

What about vec4?
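Before turning to that question, the divergence handling just described (both paths executed, with predication disabling the lanes that took the other path) can be sketched in C. The function name `branch_with_predication`, the 8-lane width and the example shader expression are chosen purely for illustration and are not taken from any real driver or ISA; the loops model what would be single wide instructions in hardware.

```c
#include <assert.h>
#include <stdint.h>

#define SIMD_WIDTH 8   /* kept small for brevity; the text mentions up to 64 */

/* Per-invocation source being modelled:
 *     r = (x < 0.0f) ? -x : x * 2.0f;
 * Each array element belongs to a different shader invocation. */
static void branch_with_predication(float *r, const float *x)
{
    assert(r && x);

    /* Evaluate the condition in every lane to build the predicate mask. */
    uint8_t then_mask = 0;
    for (int lane = 0; lane < SIMD_WIDTH; lane++)
        if (x[lane] < 0.0f)
            then_mask |= (uint8_t)(1u << lane);
    uint8_t else_mask = (uint8_t)~then_mask;

    /* "then" path: issued across the full width, only enabled lanes write. */
    for (int lane = 0; lane < SIMD_WIDTH; lane++)
        if (then_mask & (1u << lane))
            r[lane] = -x[lane];

    /* "else" path: also issued across the full width, with the inverse mask. */
    for (int lane = 0; lane < SIMD_WIDTH; lane++)
        if (else_mask & (1u << lane))
            r[lane] = x[lane] * 2.0f;
}
```

The cost model this illustrates is that a divergent branch pays for both paths, since each path occupies the full SIMD width even when only a few lanes are enabled.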
As I've said a couple of times, basically everyone in the industry (at least Intel, NVIDIA, AMD, ARM, Imagination, and Qualcomm) has done it at some point in the past. Let's take Intel as a concrete example. It's a good example because a) I'm familiar with it, b) it's all publicly documented and c) they did scalar and vec4 at the same time in the same ISA so it's really easy to look at the trade-offs.

On Intel, everything runs 8-wide (I'm simplifying a bit but it's good enough for this discussion). Older Intel hardware could run in one of two modes depending on shader stage: SIMD8 or SIMD4x2. In SIMD8 mode, each of the 8 lanes corresponds to a different shader invocation and each instruction acts on 8 scalars, one from each invocation. In SIMD4x2 mode, it runs 2 invocations with 4 lanes per invocation. The ISA has swizzles and write-masks so those 4 lanes can operate on an entire vec4 at a time. There were even fancy cross-lane opcodes for things like dot products.

For a lot of simple operations, the SIMD4x2 mode was really slick. If, for instance, you want to mul