[Mesa-dev] mediump support: future work
Hi,

This is the status of mediump support in Mesa. What I listed is what AMD GPUs can do. "Yes" means what Mesa supports.

Feature                                               FP16 support   Int16 support
ALU                                                   Yes            No
Uniforms                                              No             No
VS in                                                 No             No
VS out / FS in                                        No             No
FS out                                                No             No
TCS, TES, GS out / in                                 No             No
Sampler coordinates (only coord, derivs, lod, bias;
  not offset and compare)                             No             ---
Image coordinates                                     ---            No
Return value from samplers (incl. sampler buffers)    Yes            No
Return value from image loads (incl. image buffers)   No             No
Data source for image stores (incl. image buffers)    No             No
If 16-bit sampler/image instructions are surrounded
  by conversions, promote them to 32 bits             No             No

Please let me know if you don't see the table correctly.

I'd like to know if I can enable some of them using the existing FP16 CAP. The only drivers supporting FP16 are currently Freedreno and Panfrost.

Thanks,
Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] mediump support: future work
On Mon, May 4, 2020 at 11:44 AM Marek Olšák wrote:
>
> I'd like to know if I can enable some of them using the existing FP16 CAP.
> The only drivers supporting FP16 are currently Freedreno and Panfrost.
>

I think in general it should be ok.

I think for ir3 we want 32b inputs/outputs for geom stages (vs/hs/ds/gs). For frag outs we use nir_lower_mediump_outputs. Maybe this is a good approach to continue: use a simple nir lowering pass for cases where a shader stage can directly take 16b input/output. For frag inputs we fold the narrowing conversion into the varying fetch instruction in the backend.

int16 would be pretty useful, for loop counters especially. These can have a long live range and currently wastefully occupy a full 32b reg.

Uniforms we haven't cared too much about, since we can (usually) read a 32b uniform as 16b and fold that directly into alu instructions. We handle that in the backend.

Pushing mediump support further would be great, and we can definitely help if it ends up needing changes in the freedreno backend. The deqp coverage in CI should give us pretty good confidence about whether or not we are breaking things in the ir3 backend.
BR,
-R
Re: [Mesa-dev] mediump support: future work
16-bit varyings only make sense if they are packed, i.e. we need to fit two 16-bit 4D varyings into one vec4 slot to save memory for IO. Without that, AMD (and most others?) won't benefit much from 16-bit IO.

16-bit uniforms would help everybody, because there is potential for uniform packing, saving memory (and cache lines).

The other items are just for eliminating conversion instructions. We must have more vectorized 16-bit vec2 instructions than "conversion instructions + vec2 packing instructions" for mediump to pay off. We also don't get decreased register usage if we are not vectorized, so mediump is a tough sell at the moment.

Marek

On Mon, May 4, 2020 at 7:03 PM Rob Clark wrote:
> I think in general it should be ok.
>
> I think for ir3 we want 32b inputs/outputs for geom stages
> (vs/hs/ds/gs). For frag outs we use nir_lower_mediump_outputs.. maybe
> this is a good approach to continue, to use a simple nir lowering pass
> for cases where a shader stage can directly take 16b input/output.
> For frag inputs we fold the narrowing conversion in to the varying
> fetch instruction in backend.
>
> int16 would be pretty useful, for loop counters especially.. these can
> have a long live-range and currently wastefully occupy a full 32b reg.
>
> Uniforms we haven't cared too much about, since we can (usually) read
> a 32b uniform as a 16b and fold that directly into alu instructions..
> we handle that in the backend.
>
> Pushing mediump support further would be great, and we can definitely
> help if it ends up needing changes in freedreno backend. The deqp
> coverage in CI should give us pretty good confidence about whether or
> not we are breaking things in the ir3 backend.
>
> BR,
> -R
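The packing described in this message, two 16-bit 4-component varyings sharing one vec4 slot, could be sketched in C roughly as follows. This is only an illustration under assumptions: `io_slot`, `pack_two_vec4_16` and `unpack_two_vec4_16` are hypothetical names, not Mesa functions, and the layout (first varying in components 0-1, second in components 2-3, low half first) is picked arbitrarily.

```c
#include <assert.h>
#include <stdint.h>

/* One IO slot: four 32-bit components, like a vec4 varying. (Hypothetical type.) */
typedef struct {
    uint32_t c[4];
} io_slot;

/* Pack two 4-component 16-bit varyings into a single slot.
 * a and b hold raw 16-bit values, e.g. fp16 bit patterns.
 * Assumed layout: a -> components 0-1, b -> components 2-3, low half first. */
static io_slot pack_two_vec4_16(const uint16_t a[4], const uint16_t b[4])
{
    assert(a && b);
    io_slot s;
    s.c[0] = (uint32_t)a[0] | ((uint32_t)a[1] << 16);
    s.c[1] = (uint32_t)a[2] | ((uint32_t)a[3] << 16);
    s.c[2] = (uint32_t)b[0] | ((uint32_t)b[1] << 16);
    s.c[3] = (uint32_t)b[2] | ((uint32_t)b[3] << 16);
    return s;
}

/* Inverse of pack_two_vec4_16: split the slot back into two 16-bit vec4s. */
static void unpack_two_vec4_16(io_slot s, uint16_t a[4], uint16_t b[4])
{
    assert(a && b);
    for (int i = 0; i < 2; i++) {
        a[2 * i]     = (uint16_t)(s.c[i] & 0xffff);
        a[2 * i + 1] = (uint16_t)(s.c[i] >> 16);
        b[2 * i]     = (uint16_t)(s.c[2 + i] & 0xffff);
        b[2 * i + 1] = (uint16_t)(s.c[2 + i] >> 16);
    }
}
```

Without this kind of packing each 16-bit varying would still occupy its own slot, so the IO memory footprint would not shrink.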
Re: [Mesa-dev] mediump support: future work
On Intel, Mesa currently supports fp16 as well as int16 and int8 for Vulkan. We support them for ALU ops as well as UBOs, SSBOs, and shared memory.

We have hardware for some texture instructions with fp16 sources or destinations, but it's all over the map in terms of what ops support what. In theory, it should be more efficient to use it when we can (people claim internal bandwidth savings), and I think some people have branches somewhere where they've experimented. However, we've never shipped a driver that has it wired up.

For I/O, none of our stages support it natively. There are patches or an MR somewhere that hooks it up to go between geometry stages via packing, and it mostly works. The problematic case is TCS outputs, because they're read/write and you have to be able to write two different components of a vector from different invocations, which requires either HW support or expensive atomic cmpxchg loops. Also, we can't support interpolation on fp16 FS inputs, so we'd need to be careful about that.

I think it'd be fine if someone wanted to write a packing pass to handle the common cases, as long as we can turn it off for the nasty corners and fall back to fp32.

--Jason

On Mon, May 4, 2020 at 7:09 PM Marek Olšák wrote:
>
> 16-bit varyings only make sense if they are packed, i.e. we need to fit 2
> 16-bit 4D varyings into 1 vec4 slot to save memory for IO. Without that, AMD
> (and most others?) won't benefit from 16-bit IO much.
>
> 16-bit uniforms would help everybody, because there is potential for uniform
> packing, saving memory (and cache lines).
>
> The other items are just for eliminating conversion instructions. We must
> have more vectorized 16-bit vec2 instructions than "conversion instructions +
> vec2 packing instructions" for mediump to pay off. We also don't get
> decreased register usage if we are not vectorized, so mediump is a tough sell
> at the moment.
>
> Marek
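To illustrate why packed TCS outputs are awkward without hardware support, here is a minimal C11 sketch of the kind of cmpxchg loop mentioned above: updating one 16-bit half of a shared 32-bit word while another invocation may be updating the other half. It is a CPU-side analogy only; `store_half_atomic` is a hypothetical helper, not driver code, and the relaxed memory ordering is an assumption made for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Store a 16-bit value into one half of a shared 32-bit word without
 * disturbing the other half. half_index: 0 = low 16 bits, 1 = high 16 bits. */
static void store_half_atomic(_Atomic uint32_t *word, unsigned half_index,
                              uint16_t value)
{
    assert(half_index < 2);
    const unsigned shift = half_index * 16;
    const uint32_t mask = (uint32_t)0xffff << shift;

    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t desired;
    do {
        /* Keep the other half from the latest observed value. */
        desired = (old & ~mask) | ((uint32_t)value << shift);
        /* On failure, old is refreshed with the current contents and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(word, &old, desired,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
}
```

A plain 32-bit read-modify-write here could lose a concurrent write to the neighbouring half, which is why the loop (or native 16-bit stores) is needed.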
Re: [Mesa-dev] mediump support: future work
On Mon, May 4, 2020 at 5:09 PM Marek Olšák wrote:
>
> 16-bit varyings only make sense if they are packed, i.e. we need to fit 2
> 16-bit 4D varyings into 1 vec4 slot to save memory for IO. Without that, AMD
> (and most others?) won't benefit from 16-bit IO much.
>

I guess for !flat varyings that mostly makes sense if you are manually interpolating in the fs? We can, but don't have to, and it doesn't seem like a benefit to do so. Maybe it would be a win for flat varyings, but that is unclear; we might win more from switching to the instruction we use for interpolated varyings instead of the one that bypasses interpolation. At least that seems to be what blob does on new gens.

> 16-bit uniforms would help everybody, because there is potential for uniform
> packing, saving memory (and cache lines).
>

It does mean futzing w/ uniforms before uploading. I'm not sure (for us) that is a win vs just using the hw builtin automagic fp32->fp16 push-constant conversion. The push constant upload is pipelined with draws afaict for newer gens, and from the shader standpoint, other than the restrictions about which instructions can use a const src and when, they are basically free to load, i.e. loading cN.m as hcN.m is free. So this might also want to be a driver option?

> The other items are just for eliminating conversion instructions. We must
> have more vectorized 16-bit vec2 instructions than "conversion instructions +
> vec2 packing instructions" for mediump to pay off. We also don't get
> decreased register usage if we are not vectorized, so mediump is a tough sell
> at the moment.

We don't really have "vectorized fp16". We have a sort of "vectorish" mode where a scalar instruction can repeat, incrementing the dst register and optionally incrementing individual src registers (i.e. we can do .yyy or .yzw swizzles but not others).
That is orthogonal to fp16 (but there may be lower latency for fp16) and mostly seems to help reduce the latency to load src registers (since hw can load a non-incremented src register once for each of the scalar instructions packed together). Scalar 16b instructions might be a win, but it is a bit more complicated to tease out the instruction cycles vs the register load cost.

Balancing register pressure vs "vectorish" instructions is a thing I'm still working on. But ignoring that, fp16 is a win for us because of register pressure, i.e. a full-reg conflicts with two half-regs. For sure, a lot of the gain involves avoiding excessive conversions, but in a lot of common cases we can fold the conversion into the alu instruction in the backend.

BR,
-R
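The register-pressure point ("a full-reg conflicts with two half-regs") can be sketched as a small conflict check in C. The layout here is an assumption made purely for illustration, not a statement about the actual ir3 register file: full register N is taken to overlap half registers 2N and 2N+1. The `reg` type and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register descriptor: an index into either the full (32-bit)
 * or the half (16-bit) register namespace. */
typedef struct {
    bool is_half;
    unsigned index;
} reg;

/* First 16-bit unit covered by a register, under the assumed layout
 * where full register N overlaps half registers 2N and 2N+1. */
static unsigned first_unit(reg r) { return r.is_half ? r.index : 2 * r.index; }

/* Number of 16-bit units a register occupies. */
static unsigned num_units(reg r) { return r.is_half ? 1 : 2; }

/* Two registers conflict if their 16-bit unit ranges overlap. */
static bool regs_conflict(reg a, reg b)
{
    unsigned a0 = first_unit(a), a1 = a0 + num_units(a);
    unsigned b0 = first_unit(b), b1 = b0 + num_units(b);
    return a0 < b1 && b0 < a1;
}

/* Total 16-bit units occupied by a set of live values. */
static unsigned live_units(const reg *live, size_t n)
{
    assert(live != NULL || n == 0);
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += num_units(live[i]);
    return total;
}
```

Under this model, four live fp32 values occupy eight 16-bit units while four live fp16 values occupy four, which is the saving being described, independent of any vectorization.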