On Wed, Mar 27, 2013 at 1:43 PM, Alex Deucher <alexdeuc...@gmail.com> wrote: > On Wed, Mar 27, 2013 at 7:25 AM, Michel Dänzer <mic...@daenzer.net> wrote: >> On Mit, 2013-03-27 at 12:02 +0100, Christian König wrote: >>> Am 26.03.2013 18:03, schrieb Michel Dänzer: >>> > On Die, 2013-03-26 at 17:37 +0100, Christian König wrote: >>> >> Am 26.03.2013 15:56, schrieb Michel Dänzer: >>> >>> On Die, 2013-03-26 at 14:51 +0100, Christian König wrote: >>> >>>> From: Christian König <christian.koe...@amd.com> >>> >>>> >>> >>>> Signed-off-by: Christian König <christian.koe...@amd.com> >>> >>>> [...] >>> >>>> diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.h >>> >>>> b/src/gallium/drivers/radeonsi/radeonsi_shader.h >>> >>>> index 9dae742..e09f297 100644 >>> >>>> --- a/src/gallium/drivers/radeonsi/radeonsi_shader.h >>> >>>> +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.h >>> >>>> @@ -111,13 +111,18 @@ struct si_shader { >>> >>>> unsigned nr_cbufs; >>> >>>> }; >>> >>>> >>> >>>> -struct si_shader_key { >>> >>>> - unsigned export_16bpc:8; >>> >>>> - unsigned nr_cbufs:4; >>> >>>> - unsigned color_two_side:1; >>> >>>> - unsigned alpha_func:3; >>> >>>> - unsigned flatshade:1; >>> >>>> - float alpha_ref; >>> >>>> +union si_shader_key { >>> >>>> + struct { >>> >>>> + unsigned export_16bpc:8; >>> >>>> + unsigned nr_cbufs:4; >>> >>>> + unsigned color_two_side:1; >>> >>>> + unsigned alpha_func:3; >>> >>>> + unsigned flatshade:1; >>> >>>> + float alpha_ref; >>> >>>> + } ps; >>> >>>> + struct { >>> >>>> + unsigned instance_divisors[PIPE_MAX_ATTRIBS]; >>> >>>> + } vs; >>> >>>> }; >>> >>> This grows the shader key from 8 to 128 bytes. I don't suppose the >>> >>> instance divisors could be encoded in a more compact way? E.g. loading >>> >>> the divisor values from constants and only tracking which elements use a >>> >>> divisor in a bitmask in the key. >>> >> Considered that also, and I have two problems with that approach: >>> >> 1. While immediates are converted to shifts & muls, dividing even by a >>> >> constant in the shader isn't cheap. >>> > Is that really significant? How much work would it be to come up with a >>> > worst case test and measure the difference? >>> >>> Well no idea how to measure that on SI, >> >> I'd guess something like: With otherwise trivial vertex and pixel >> shaders, draw huge numbers of triangles generating one pixel each, and >> measure how long it takes. >> >> >>> but when I implemented the same feature on R600 the difference between >>> using reciprocal and mul compared to mulhi where quite significant. >> >> I don't see anything about this in the r600g shader key though. What's >> the difference? >> >> >>> >> How about storing only a byte for the instance_divisor? That limit's the >>> >> divisor to a modulo of 256, but I don't think that would be so extremly >>> >> bad. >>> > I have no idea what the impact of that would be. What happens if an app >>> > tries to use a divisor >= 256? >>> >>> It probably would select the wrong shader :( >>> >>> >> That would reduce the key to 32 bytes instead. >>> > Still seems kind of big. >>> >>> Ok how about the following compromise: First we use a short for the >>> instance divisor, that makes the key 32 bytes in size and should leave >>> enough room for larger instance divisors, and second we don't copy the >>> key around so much any more. >> >> This still won't work correctly for some legal divisor values, right? So >> I'm afraid this is a flawed compromise, as it doesn't really address the >> key size issues (potentially huge number of shader variants, cycles >> spent in memcmp, memory usage) either. If those can't be addressed, it >> should at least handle all legal values correctly. > > We use the large keys now and look into runtime shader patching for > certain values as an improvement later on.
Yeah something like that, or we could just put the whole key in a constant buffer and have the shader figure out what to do => no shader recompilations. Or we could use indirect subroutines calls (as in GL_ARB_shader_subroutine) and decide at draw time which subroutine should be called depending on the key. Marek _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev