[Mesa-dev] mediump support: future work

2020-05-04 Thread Marek Olšák
Hi,

This is the status of mediump support in Mesa. The features listed are what
AMD GPUs can do; "Yes" marks the ones Mesa already supports.

Feature                                               FP16 support  Int16 support
ALU                                                   Yes           No
Uniforms                                              No            No
VS in                                                 No            No
VS out / FS in                                        No            No
FS out                                                No            No
TCS, TES, GS out / in                                 No            No
Sampler coordinates (only coord, derivs, lod, bias;
  not offset and compare)                             No            ---
Image coordinates                                     ---           No
Return value from samplers (incl. sampler buffers)    Yes           No
Return value from image loads (incl. image buffers)   No            No
Data source for image stores (incl. image buffers)    No            No
If 16-bit sampler/image instructions are surrounded
  by conversions, promote them to 32 bits             No            No

Please let me know if you don't see the table correctly.

I'd like to know if I can enable some of them using the existing FP16 CAP.
The only drivers supporting FP16 are currently Freedreno and Panfrost.

Thanks,
Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] mediump support: future work

2020-05-04 Thread Rob Clark
On Mon, May 4, 2020 at 11:44 AM Marek Olšák wrote:
>
> I'd like to know if I can enable some of them using the existing FP16 CAP. 
> The only drivers supporting FP16 are currently Freedreno and Panfrost.
>

I think in general it should be ok.

I think for ir3 we want 32b inputs/outputs for geom stages
(vs/hs/ds/gs).  For frag outs we use nir_lower_mediump_outputs.  Maybe
this is a good approach to continue: use a simple NIR lowering pass
for cases where a shader stage can directly take 16b input/output.
For frag inputs we fold the narrowing conversion into the varying
fetch instruction in the backend.

int16 would be pretty useful, for loop counters especially.. these can
have a long live-range and currently wastefully occupy a full 32b reg.

Uniforms we haven't cared too much about, since we can (usually) read
a 32b uniform as a 16b and fold that directly into alu instructions..
we handle that in the backend.
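
To give a feel for what reading a 32b uniform as 16b costs in precision, here is a small Python sketch. It is not driver code: `read_as_half` is a hypothetical helper that rounds a value to IEEE fp16 using the standard `struct` module, which only approximates what the hardware conversion does.

```python
import struct

def read_as_half(value):
    """Round a Python float to IEEE fp16 precision and return it as a float.

    Hypothetical helper for illustration; it models the narrowing that
    happens when a 32b uniform is consumed as a 16b operand.
    """
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Values that fp16 can represent exactly survive unchanged.
exact = read_as_half(0.5)

# Other values pick up a small rounding error.
rounded = read_as_half(0.1)
error = abs(rounded - 0.1)
```

Values such as 0.5 come through unchanged, while 0.1 lands on the nearest fp16 value, which is the kind of loss mediump explicitly permits.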

Pushing mediump support further would be great, and we can definitely
help if it ends up needing changes in freedreno backend.  The deqp
coverage in CI should give us pretty good confidence about whether or
not we are breaking things in the ir3 backend.

BR,
-R


Re: [Mesa-dev] mediump support: future work

2020-05-04 Thread Marek Olšák
16-bit varyings only make sense if they are packed, i.e. we need to fit 2
16-bit 4D varyings into 1 vec4 slot to save memory for IO. Without that,
AMD (and most others?) won't benefit from 16-bit IO much.
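
As an illustration of the packing described above, here is a minimal Python sketch (not Mesa code). The function names and the choice of putting the first varying in the low 16 bits of each word are assumptions made for the example; it only shows that two fp16 vec4s fit into the four 32-bit words of one vec4 slot.

```python
import struct

def half_bits(value):
    """Return the 16-bit IEEE fp16 encoding of a float."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

def bits_to_half(bits):
    """Decode a 16-bit IEEE fp16 encoding back to a float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def pack_two_half_vec4(a, b):
    """Pack two 4-component fp16 vectors into four 32-bit words.

    Word i holds a[i] in its low half and b[i] in its high half
    (an assumed layout, chosen only for illustration).
    """
    return [half_bits(lo) | (half_bits(hi) << 16) for lo, hi in zip(a, b)]

def unpack_two_half_vec4(words):
    """Inverse of pack_two_half_vec4."""
    a = [bits_to_half(w & 0xFFFF) for w in words]
    b = [bits_to_half(w >> 16) for w in words]
    return a, b

slot = pack_two_half_vec4([1.0, 0.5, -2.0, 0.25], [0.0, 1.5, 3.0, -1.0])
```

With this layout the two varyings occupy 16 bytes of IO instead of 32.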

16-bit uniforms would help everybody, because there is potential for
uniform packing, saving memory (and cache lines).
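
A rough back-of-the-envelope sketch of the saving, in Python. The 64-byte cache line is an assumption (it varies by hardware), and `uniform_footprint` is a hypothetical helper, not anything in Mesa.

```python
import math

CACHE_LINE_BYTES = 64  # assumed cache line size; varies by hardware

def uniform_footprint(num_vec4, bytes_per_component):
    """Return (bytes, cache lines) used by num_vec4 tightly packed vec4 uniforms."""
    size = num_vec4 * 4 * bytes_per_component
    lines = math.ceil(size / CACHE_LINE_BYTES)
    return size, lines

fp32_cost = uniform_footprint(10, 4)  # ten vec4 uniforms stored as fp32
fp16_cost = uniform_footprint(10, 2)  # the same uniforms packed as fp16
```

Under these assumptions, ten vec4 uniforms drop from 160 bytes (3 cache lines) to 80 bytes (2 cache lines).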

The other items are just for eliminating conversion instructions. We must
have more vectorized 16-bit vec2 instructions than "conversion
instructions + vec2 packing instructions" for mediump to pay off. We also
don't get decreased register usage if we are not vectorized, so mediump is
a tough sell at the moment.
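
The break-even condition above can be written as a tiny Python check. This is a deliberately simplified model: it assumes every instruction costs the same and that each vec2 instruction replaces exactly two scalar 32-bit instructions. `mediump_pays_off` is a hypothetical name.

```python
def mediump_pays_off(vec2_ops, conversions, pack_ops):
    """Return True if packed 16-bit math reduces the instruction count.

    Each vec2 op stands in for two scalar ops, so it saves one instruction.
    Each conversion or vec2 packing instruction adds one.
    Uniform instruction cost is a simplifying assumption.
    """
    saved = vec2_ops
    added = conversions + pack_ops
    return saved > added

worth_it = mediump_pays_off(vec2_ops=10, conversions=3, pack_ops=2)
not_worth_it = mediump_pays_off(vec2_ops=4, conversions=3, pack_ops=2)
```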

Marek



Re: [Mesa-dev] mediump support: future work

2020-05-04 Thread Jason Ekstrand
On Intel, Mesa currently supports fp16 as well as int16 and int8 for
Vulkan.  We support them for ALU ops as well as UBOs, SSBOs, and
shared memory.

We have hardware for some texture instructions with fp16 sources or
destinations but it's all over the map in terms of what ops support
what.  In theory, it should be more efficient to use it when we can
(people claim internal bandwidth savings) and I think some people have
branches somewhere where they've experimented.  However, we've never
shipped a driver that has it wired up.

For I/O, none of our stages support it natively.  There are patches or
an MR somewhere that hook it up to go between geometry stages via
packing, and it mostly works.  The problematic case is TCS outputs
because they're read/write and you have to be able to write two
different components of a vector from different invocations which
requires either HW support or expensive atomic cmpxchg loops.  Also,
we can't support interpolation on fp16 FS inputs so we'd need to be
careful about that.  I think it'd be fine if someone wanted to write a
packing pass to handle the common cases as long as we can turn it off
for the nasty corners and fall back to fp32.
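
To show why the TCS case is awkward, here is a Python sketch of updating one 16-bit half of a shared 32-bit word with a compare-and-swap loop. It is a simulation only: `Word32` fakes an atomic cmpxchg with a lock, and all names are hypothetical rather than taken from any driver.

```python
import threading

class Word32:
    """A 32-bit memory word with a simulated atomic compare-and-swap."""

    def __init__(self, value=0):
        self._value = value & 0xFFFFFFFF
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cmpxchg(self, expected, new):
        """Store new only if the word still equals expected; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new & 0xFFFFFFFF
            return old

def store_half(word, half_index, bits16):
    """Write one 16-bit half of word without disturbing the other half."""
    shift = 16 * half_index
    mask = 0xFFFF << shift
    while True:
        old = word.load()
        new = (old & ~mask) | ((bits16 & 0xFFFF) << shift)
        # Retry if another invocation changed the word in the meantime.
        if word.cmpxchg(old, new) == old:
            return

# Two "invocations" each write a different half of the same packed word.
shared = Word32()
writers = [
    threading.Thread(target=store_half, args=(shared, 0, 0x1234)),
    threading.Thread(target=store_half, args=(shared, 1, 0xBEEF)),
]
for t in writers:
    t.start()
for t in writers:
    t.join()
```

Every store turns into a load plus a retry loop, which is the expense being referred to; a plain 32-bit store has no such cost.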

--Jason



Re: [Mesa-dev] mediump support: future work

2020-05-04 Thread Rob Clark
On Mon, May 4, 2020 at 5:09 PM Marek Olšák wrote:
>
> 16-bit varyings only make sense if they are packed, i.e. we need to fit 2 
> 16-bit 4D varyings into 1 vec4 slot to save memory for IO. Without that, AMD 
> (and most others?) won't benefit from 16-bit IO much.
>

I guess for non-flat varyings that mostly makes sense if you are manually
interpolating in the fs?  We can, but don't have to, and it doesn't
seem like a benefit to do so.  Maybe it would be a win for flat
varyings, but that's unclear: we might win more from switching to the
instruction we use for interpolated varyings instead of the one that
bypasses interpolation.  At least that seems to be what the blob does on
new gens.

> 16-bit uniforms would help everybody, because there is potential for uniform 
> packing, saving memory (and cache lines).
>

It does mean futzing with uniforms before uploading.  I'm not sure (for
us) that is a win vs. just using the hw's built-in automagic fp32->fp16
push-constant conversion.  The push-constant upload is pipelined with
draws afaict on newer gens, and from the shader's standpoint, other than
the restrictions on which instructions can use a const src and when,
constants are basically free to load, i.e. loading cN.m as hcN.m is free.

So it might also want to be a driver option?

> The other items are just for eliminating conversion instructions. We must 
> have more vectorized 16-bit vec2 instructions than "conversion instructions + 
> vec2 packing instructions" for mediump to pay off. We also don't get 
> decreased register usage if we are not vectorized, so mediump is a tough sell 
> at the moment.

We don't really have "vectorized fp16".  We have a sort of "vectorish"
mode where a scalar instruction can repeat, incrementing the dst register
and optionally incrementing individual src registers (i.e. we can do
.yyy or .yzw swizzles but not others).  That is orthogonal to fp16
(but there may be lower latency for fp16) and mostly seems to help
reduce the latency of loading src registers (since the hw can load a
non-incremented src register once for the whole group of scalar
instructions packed together).  Scalar 16b instructions might be a win,
but it is a bit more complicated to tease out the instruction cycles vs.
the register load cost.
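
A small Python sketch of how such a repeated scalar instruction expands. The tuple encoding and the `expand_repeat` name are made up for illustration, and the convention that `repeat=2` means three scalar steps is an assumption; this is not the ir3 encoding.

```python
def expand_repeat(op, dst, srcs, repeat):
    """Expand one repeated scalar instruction into its scalar steps.

    dst:    starting destination register index; increments every step.
    srcs:   list of (register_index, increments) pairs; a source with
            increments=True advances each step, otherwise it stays fixed.
    repeat: number of extra repetitions, so repeat=2 yields 3 scalar ops
            (assumed convention).
    """
    steps = []
    for i in range(repeat + 1):
        step_srcs = [reg + i if inc else reg for reg, inc in srcs]
        steps.append((op, dst + i, step_srcs))
    return steps

# First source walks three consecutive registers (like .yzw),
# second source is held fixed (like .yyy).
steps = expand_repeat("mul", dst=0, srcs=[(4, True), (8, False)], repeat=2)
```

The fixed source (register 8 here) is the one that only needs to be loaded once for the group.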

Balancing register pressure vs. "vectorish" instructions is a thing I'm
still working on.  But ignoring that, fp16 is a win for us because of
register pressure, i.e. a full reg conflicts with two half regs.
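
A toy Python model of that aliasing. The mapping of full register n onto half registers 2n and 2n+1 is an assumption made for the example, and the helper names are hypothetical.

```python
def half_regs_for_full(n):
    """Half registers that alias full register n (assumed layout: 2n, 2n+1)."""
    return (2 * n, 2 * n + 1)

def conflicts(full_reg, half_reg):
    """True if writing half_reg would clobber part of full_reg."""
    return half_reg in half_regs_for_full(full_reg)

def full_slots_needed(num_full_values, num_half_values):
    """Full-register slots needed when half values share full registers in pairs."""
    return num_full_values + (num_half_values + 1) // 2

# Four live values kept as fp32 vs. the same four kept as fp16.
as_fp32 = full_slots_needed(4, 0)
as_fp16 = full_slots_needed(0, 4)
```

In this model, four live fp16 values take two full-register slots instead of four.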

For sure, a lot of the gain involves avoiding excessive conversions,
but in a lot of common cases we can fold the conversion into the ALU
instruction in the backend.

BR,
-R
