[Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU

2020-01-08 Thread Luke Kenneth Casson Leighton
(if responding on mesa-dev please do cc me to help preserve the thread, i
am subscribed in digest mode, thanks)

the NLNet funding application documented here was successful:
https://libre-riscv.org/nlnet_2019_amdvlk_port/

we therefore have money (direct payment of tax-free, tax-deductible
donations from the NLNet Foundation into your bank account [1]) available
for anyone, anywhere in the world, to help create a Vulkan-compliant 3D
MESA driver for a hybrid hard/soft GPU.

we considered starting from SwiftShader because it is, on initial
inspection, close to what we want: we could hypothetically have added deep
SIMD instructions and the project would have been 90% complete.

however, when it comes to predication and to vector types such as vec2,
vec3 and vec4, the infrastructure in SwiftShader is unable to cope. thus,
if we add custom hardware opcodes which can accelerate vector-vector
operations, SwiftShader would be an extremely bad choice, as it would be
incapable of using such hardware without a drastic redesign.

hence the choice of MESA: NIR can retain that critical type information
right up until it hits the hardware.
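
to illustrate (a minimal sketch in plain C, purely for illustration: the
vec3 struct and function names are invented here, this is neither NIR nor
our ISA): if the IR still says "vec3 + vec3", a backend can map it straight
onto a single vector-vector opcode; once it has been scalarised into three
independent float adds, that relationship is lost and would have to be
re-discovered by an autovectoriser.

typedef struct { float x, y, z; } vec3;

/* what we want the backend to see: one typed vector operation */
vec3 add_vec3(vec3 a, vec3 b)
{
    return (vec3){ a.x + b.x, a.y + b.y, a.z + b.z };
}

/* what a scalarising pipeline hands the backend instead: three
 * independent scalar operations, with the vec3 relationship gone */
void add_scalarised(float *rx, float *ry, float *rz, vec3 a, vec3 b)
{
    *rx = a.x + b.x;
    *ry = a.y + b.y;
    *rz = a.z + b.z;
}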

as you know, a hybrid CPU/GPU does not have a separate CPU and a separate
GPU: they are *one and the same*.  therefore, if starting from the Intel
MESA driver or RADV, the initial work needed is to make the chosen base
actually a *software* renderer, just like SwiftShader.

this is a desirable goal, and it will be important for the code to be
portable, unaccelerated, and to run, at minimum, on x86 and (later) the
Libre SoC.

once that phase is completed, *then* we may move to adding custom
accelerated opcodes (Transcendentals, YUV2RGB, etc.). bear in mind that
these will be added *to the CPU*... because the CPU *is* the GPU.

to be absolutely clear: there will be no marshalling of GPU data or
instructions to be sent over kernelspace IPC. the CPU will execute the
accelerated opcode (e.g. atan2) directly and immediately.
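
as a minimal sketch of what that means in practice (plain C; atan2f() from
libm stands in for the future accelerated opcode, and it is an assumption,
for illustration only, that the compiler would emit the hardware atan2
instruction in its place):

#include <math.h>
#include <stdio.h>

/* a "shader" loop running directly on the CPU: no command buffer, no
 * ioctl into a kernel driver, no round-trip, just instructions */
void shade_pixels(float *out, const float *y, const float *x, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = atan2f(y[i], x[i]);
}

int main(void)
{
    float y[4] = {0, 1, 1, -1}, x[4] = {1, 1, 0, -1}, out[4];
    shade_pixels(out, y, x, 4);
    for (int i = 0; i < 4; i++)
        printf("%f\n", out[i]);
    return 0;
}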

this is significantly simpler than standard GPUs, saving on power
consumption and drastically simplifying debugging and application
development.

if this is of interest please do get in touch, and feel free to ask
questions.

best,

l.

[1] your accountant, with assistance from Bob Goudriaans, an International
Tax Law Specialist and Director of NLNet, can help confirm that the
payments from NLNet are considered charitable, tax-deductible donations...
*to you*.  all the International Tax Agreements are in place and the
documents are available for inspection by your accountant.



-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] NLNet Funding Application, EUR 50,000, for help porting AMDVLK or MESA RADV to the Libre RISC-V Hybrid CPU/GPU.

2020-01-08 Thread Luke Kenneth Casson Leighton
hi all, the funding application was successful; i will follow up under a
separate subject line.

On Thursday, September 26, 2019, Luke Kenneth Casson Leighton wrote:

> (to preserve thread, please do cc me on any questions, i am subscribed
> digest, thanks)
>
> https://libre-riscv.org/nlnet_2019_amdvlk_port/
>
> i've submitted a funding request to the Charitable Foundation, NLNet,
> for people to be the recipients of donations to work on a port of MESA
> RADV (or AMDVLK, as appropriate) to the Libre RISC-V Hybrid CPU / GPU
> / VPU.
>



-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68


Re: [Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU

2020-01-08 Thread Jason Ekstrand
Drive-by comment: I don't think you actually want to base any decisions on
a vec4 architecture. Nearly every company in the graphics industry thought
that was a good idea and designed vec4 processors. Over the course of the
last 15 years or so they have all, one by one, realized that it was a bad
idea and stopped doing it. Instead, they all parallelize the other way:
their SIMD instructions operate on scalar values across 8, 16, 32, or 64
invocations (vertices, pixels, etc.) of the shader program.
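
A rough C illustration of the two parallelization directions (not any
particular vendor's ISA; the struct, the function names and the 8-wide
width are assumptions for the example):

#define WIDTH 8

/* vec4-style: one invocation at a time, the 4 lanes are its x/y/z/w */
typedef struct { float v[4]; } vec4;

vec4 add_aos(vec4 a, vec4 b)
{
    vec4 r;
    for (int c = 0; c < 4; c++)
        r.v[c] = a.v[c] + b.v[c];   /* one "SIMD4" instruction */
    return r;
}

/* scalar-across-invocations: each array holds one component of the same
 * value for 8 invocations; one wide instruction per component */
void add_soa(float r[WIDTH], const float a[WIDTH], const float b[WIDTH])
{
    for (int i = 0; i < WIDTH; i++)
        r[i] = a[i] + b[i];         /* one "SIMD8" instruction */
}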


That isn't to say that Mesa is the wrong choice or that there aren't
other reasons why SwiftShader would be a bad fit. However, I would
recommend against making major architectural decisions for the sole reason
that it allows you to repeat one of the biggest architectural mistakes that
the graphics industry as a whole managed to make and has only recently
(in the last 5 years) finally gotten over.




Re: [Mesa-dev] [PATCH 1/2] haiku/hgl: Fix build via header reordering

2020-01-08 Thread Alexander von Gluck IV
Oh.. lol. I just realized Mesa is doing PRs now in GitLab and I'm being
old-fashioned.


January 8, 2020 7:04 PM, "Alexander von Gluck IV"  wrote:
> ---
> src/gallium/state_trackers/hgl/hgl_context.h | 6 --
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/state_trackers/hgl/hgl_context.h
> b/src/gallium/state_trackers/hgl/hgl_context.h
> index c5995f1cd2b..e2ebfbad4bc 100644
> --- a/src/gallium/state_trackers/hgl/hgl_context.h
> +++ b/src/gallium/state_trackers/hgl/hgl_context.h
> @@ -9,11 +9,13 @@
> #define HGL_CONTEXT_H
> 
> -#include "state_tracker/st_api.h"
> -#include "state_tracker/st_manager.h"
> +#include "pipe/p_format.h"
> #include "pipe/p_compiler.h"
> #include "pipe/p_screen.h"
> #include "postprocess/filters.h"
> +
> +#include "state_tracker/st_api.h"
> +#include "state_tracker/st_manager.h"
> #include "os/os_thread.h"
> 
> #include "bitmap_wrapper.h"
> -- 
> 2.24.1


[Mesa-dev] [PATCH 1/2] haiku/hgl: Fix build via header reordering

2020-01-08 Thread Alexander von Gluck IV
---
 src/gallium/state_trackers/hgl/hgl_context.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/hgl/hgl_context.h b/src/gallium/state_trackers/hgl/hgl_context.h
index c5995f1cd2b..e2ebfbad4bc 100644
--- a/src/gallium/state_trackers/hgl/hgl_context.h
+++ b/src/gallium/state_trackers/hgl/hgl_context.h
@@ -9,11 +9,13 @@
 #define HGL_CONTEXT_H
 
 
-#include "state_tracker/st_api.h"
-#include "state_tracker/st_manager.h"
+#include "pipe/p_format.h"
 #include "pipe/p_compiler.h"
 #include "pipe/p_screen.h"
 #include "postprocess/filters.h"
+
+#include "state_tracker/st_api.h"
+#include "state_tracker/st_manager.h"
 #include "os/os_thread.h"
 
 #include "bitmap_wrapper.h"
-- 
2.24.1



[Mesa-dev] [PATCH 2/2] util/u_thread: Fix build under Haiku

2020-01-08 Thread Alexander von Gluck IV
From: X512 

---
 src/util/u_thread.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/util/u_thread.h b/src/util/u_thread.h
index 6fc89099fec..5bb06608fc9 100644
--- a/src/util/u_thread.h
+++ b/src/util/u_thread.h
@@ -40,6 +40,10 @@
 #endif
 #endif
 
+#ifdef __HAIKU__
+#include <OS.h>
+#endif
+
 #ifdef __FreeBSD__
 #define cpu_set_t cpuset_t
 #endif
@@ -77,6 +81,8 @@ static inline void u_thread_setname( const char *name )
pthread_setname_np(pthread_self(), "%s", (void *)name);
 #elif DETECT_OS_APPLE
pthread_setname_np(name);
+#elif DETECT_OS_HAIKU
+   rename_thread(find_thread(NULL), name);
 #else
 #error Not sure how to call pthread_setname_np
 #endif
@@ -149,7 +155,7 @@ util_get_L3_for_pinned_thread(thrd_t thread, unsigned cores_per_L3)
 static inline int64_t
 u_thread_get_time_nano(thrd_t thread)
 {
-#if defined(HAVE_PTHREAD) && !defined(__APPLE__)
+#if defined(HAVE_PTHREAD) && !defined(__APPLE__) && !defined(__HAIKU__)
struct timespec ts;
clockid_t cid;
 
-- 
2.24.1



Re: [Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU

2020-01-08 Thread Luke Kenneth Casson Leighton
On Thursday, January 9, 2020, Jason Ekstrand  wrote:

> Drive-by comment:
>

really appreciate the feedback.


>  I don't think you actually want to base any decisions on a vec4
> architecture. Nearly every company in the graphics industry thought that
> was a good idea and designed vec4 processors. Over the course of the last
> 15 years or so they have all, one by one, realized that it was a bad idea
> and stopped doing it. Instead, they all parallelize the other way: their
> SIMD instructions operate on scalar values across 8, 16, 32, or 64
> invocations (vertices, pixels, etc.) of the shader program.
>

for simplicity (not outlining nearly 18 months of Vector Architecture ISA
development) i left out that we have designed a vector-plus-subvector
architecture, where the subvectors may be of any length between 1 and 4,
and there is an additional vector loop around that which may be of any
length from 1 (scalar) to 64.

individual predicate mask bits may be applied to the vector; however, they
may only be applied (one bit) per subvector.
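
as a rough reference-semantics sketch in plain C (illustrative only, not
the actual spec: the VL and SUBVL values and the predicate layout here are
assumptions made for the example):

#define SUBVL 3   /* subvector length, 1..4 (a vec3 here) */
#define VL    8   /* outer vector loop length, 1..64      */

/* one predicate bit per *subvector*: bit i covers all SUBVL elements
 * of subvector i */
void subvec_add(float *dst, const float *a, const float *b,
                unsigned char pred)
{
    for (int i = 0; i < VL; i++) {        /* outer vector loop      */
        if (!(pred & (1u << i)))          /* whole subvector masked */
            continue;
        for (int j = 0; j < SUBVL; j++)   /* inner subvector loop   */
            dst[i * SUBVL + j] = a[i * SUBVL + j] + b[i * SUBVL + j];
    }
}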

discussions have been ongoing for around 2 years now on the LLVM dev lists
to support this type of concept.

on the libre soc lists we have also had detailed discussions on how to do
swizzles at the subvector level.
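
as a quick illustration of what a subvector-level swizzle means in
practice (plain C, illustrative only; the 2-bit selector encoding is an
assumption, not our actual design): each 2-bit field of "sel" picks which
source component feeds each destination component of one vec4 subvector,
so sel = 0x1b gives .wzyx.

void vec4_swizzle(float dst[4], const float src[4], unsigned sel)
{
    for (int c = 0; c < 4; c++)
        dst[c] = src[(sel >> (2 * c)) & 3];
}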

AMDGPU, NVIDIA, MALI, they all support these capabilities.

to be clear: we have *not* designed an architecture or an ISA which
critically and exclusively depends on vec4 and vec4 alone.

now, whether it is a bad idea or not to have vec2, vec3 and vec4
capability, the way that i see it is that Vulkan supports them, as does the
SPIR-V compiler in e.g. AMDVLK, and we would be asking for trouble
(performance penalties, compiler complexity due to having to add predicated
autovectorisation) if we did not support them.

however, there are two nice things:

1. we are at an early phase, therefore we *can* evaluate valuable
"heads-up" warnings such as the one you give, jason (so thank you)

2. as a flexible, soft-programmable Vector Processor, if over time the
industry moves away from vec4, so can we.



> That isn't to say that Mesa is the wrong choice or that there aren't
> other reasons why SwiftShader would be a bad fit. However, I would
> recommend against making major architectural decisions for the sole
> reason that it allows you to repeat one of the biggest architectural
> mistakes that the graphics industry as a whole managed to make and has
> only recently (in the last 5 years) finally gotten over.
>

consequently i am extremely grateful as the last thing we need is to spend
NLNet funds on repeating industry mistakes.

this is why i love libre hardware.  can you imagine the hell it would have
taken to get you to sign an NDA just to be able to provide the insight that
you did?

:)

warmest,

l.



-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68


Re: [Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU

2020-01-08 Thread Jason Ekstrand
On Wed, Jan 8, 2020 at 7:55 PM Luke Kenneth Casson Leighton wrote:

>
>
> On Thursday, January 9, 2020, Jason Ekstrand  wrote:
>
>> Drive-by comment:
>>
>
> really appreciate the feedback.
>
>
>>  I don't think you actually want to base any decisions on a vec4
>> architecture. Nearly every company in the graphics industry thought that
>> was a good idea and designed vec4 processors. Over the course of the last
>> 15 years or so they have all, one by one, realized that it was a bad idea
>> and stopped doing it. Instead, they all parallelize the other way: their
>> SIMD instructions operate on scalar values across 8, 16, 32, or 64
>> invocations (vertices, pixels, etc.) of the shader program.
>>
>
> for simplicity (not outlining nearly 18 months of Vector Architecture ISA
> development) i left out that we have designed a vector-plus-subvector
> architecture, where the subvectors may be of any length between 1 and 4,
> and there is an additional vector loop around that which may be of any
> length from 1 (scalar) to 64.
>
> individual predicate mask bits may be applied to the vector; however,
> they may only be applied (one bit) per subvector.
>
> discussions have been ongoing for around 2 years now on the LLVM dev lists
> to support this type of concept.
>
> on the libre soc lists we have also had detailed discussions on how to do
> swizzles at the subvector level.
>
> AMDGPU, NVIDIA, MALI, they all support these capabilities.
>
> to be clear: we have *not* designed an architecture or an ISA which
> critically and exclusively depends on vec4 and vec4 alone.
>
> now, whether it is a bad idea or not to have vec2, vec3 and vec4
> capability, the way that i see it is that Vulkan supports them, as does the
> SPIR-V compiler in e.g. AMDVLK, and we would be asking for trouble
> (performance penalties, compiler complexity due to having to add predicated
> autovectorisation) if we did not support them.
>

Now that I'm at a keyboard and not a phone, I can provide a more thorough
explanation.  Hopefully that will be helpful.  First of all, let's start
with what SPIR-V supporting vec2/3/4 means.  It is true that a vec3 is a
native SPIR-V type; it is also a native type in GLSL and HLSL.  Those
languages also support matrices, write-masking of results, and swizzling on
sources.  This stuff is all very useful when writing graphics shaders
because 70% of what you do in graphics is some sort of vector math.  Having
constructs built directly into the language is really nice.  When it comes
to SPIR-V specifically, you should think of it much more like a high-level
language than like an IR; it's intentionally designed to lose as little
high-level information as possible.  Most of the early shader architectures
also had vec2/3/4 as native data types and had swizzling and write-masking
as core concepts; partly because it seemed like a good idea and partly
because that's the way Microsoft's D3D9 bytecode format worked.  The reason
why NIR supports vec4 is because there is a lot of hardware out there
(including Intel from 5-6 years ago) which needs vec4 and some of that
hardware is still very actively supported in Mesa.

On all modern architectures I'm familiar with (this includes Intel, NVIDIA,
AMD, ARM, Imagination, and Qualcomm), everything is scalarized so a vec3 +
vec3 add operation is turned into three scalar add operations.  In our
NIR-based compilers, the scalarization usually happens almost immediately
once you're in NIR.  We then run up to 64 invocations (vertex, pixel, etc.)
at a time with wide hardware instructions.  When control-flow diverges
(some invocations go one way and some another), both paths are executed and
predication is used to disable the SIMD lanes for the invocations that took
the other path.  On Intel and several other architectures, this happens
fairly automatically.  On AMD, their management of predicates is much more
manual.
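
A small C model of that divergence handling (illustrative only; real
hardware keeps the execution mask in dedicated registers and the widths
vary): 8 invocations run in lock-step, both sides of the branch are
executed, and the mask disables the lanes that did not take the current
path.

#define WIDTH 8

void shade(float out[WIDTH], const float in[WIDTH])
{
    unsigned all  = (1u << WIDTH) - 1;
    unsigned cond = 0;

    /* evaluate the branch condition for every invocation */
    for (int i = 0; i < WIDTH; i++)
        if (in[i] > 0.0f)
            cond |= 1u << i;

    /* "then" side, run with the lanes where the condition is true */
    for (int i = 0; i < WIDTH; i++)
        if (cond & (1u << i))
            out[i] = in[i] * 2.0f;

    /* "else" side, run with the complementary mask */
    for (int i = 0; i < WIDTH; i++)
        if ((all & ~cond) & (1u << i))
            out[i] = 0.0f;
}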

What about vec4?  As I've said a couple of times, basically everyone in the
industry (at least Intel, NVIDIA, AMD, ARM, Imagination, and Qualcomm) has
done it at some point in the past.  Let's take Intel as a concrete
example.  It's a good example because a) I'm familiar with it, b) it's all
publicly documented and c) they did scalar and vec4 at the same time in the
same ISA so it's really easy to look at the trade-offs.  On Intel,
everything runs 8-wide (I'm simplifying a bit but it's good enough for this
discussion).  Older Intel hardware could run in one of two modes depending
on shader stage: SIMD8 or SIMD4x2.  In SIMD8 mode, each of the 8 lanes
corresponds to a different shader invocation and each instruction acts on 8
scalars, one from each invocation.  In SIMD4x2 mode, it runs 2 invocations
with 4 lanes per invocation.  The ISA has swizzles and write-masks so those
4 lanes can operate on an entire vec4 at a time.  There were even fancy
cross-lane opcodes for things like dot products.  For a lot of simple
operations, the SIMD4x2 mode was really slick.  If, for instance, you want
to mul