On Wed, Mar 27, 2013 at 7:25 AM, Michel Dänzer <mic...@daenzer.net> wrote: > On Mit, 2013-03-27 at 12:02 +0100, Christian König wrote: >> Am 26.03.2013 18:03, schrieb Michel Dänzer: >> > On Die, 2013-03-26 at 17:37 +0100, Christian König wrote: >> >> Am 26.03.2013 15:56, schrieb Michel Dänzer: >> >>> On Die, 2013-03-26 at 14:51 +0100, Christian König wrote: >> >>>> From: Christian König <christian.koe...@amd.com> >> >>>> >> >>>> Signed-off-by: Christian König <christian.koe...@amd.com> >> >>>> [...] >> >>>> diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.h >> >>>> b/src/gallium/drivers/radeonsi/radeonsi_shader.h >> >>>> index 9dae742..e09f297 100644 >> >>>> --- a/src/gallium/drivers/radeonsi/radeonsi_shader.h >> >>>> +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.h >> >>>> @@ -111,13 +111,18 @@ struct si_shader { >> >>>> unsigned nr_cbufs; >> >>>> }; >> >>>> >> >>>> -struct si_shader_key { >> >>>> - unsigned export_16bpc:8; >> >>>> - unsigned nr_cbufs:4; >> >>>> - unsigned color_two_side:1; >> >>>> - unsigned alpha_func:3; >> >>>> - unsigned flatshade:1; >> >>>> - float alpha_ref; >> >>>> +union si_shader_key { >> >>>> + struct { >> >>>> + unsigned export_16bpc:8; >> >>>> + unsigned nr_cbufs:4; >> >>>> + unsigned color_two_side:1; >> >>>> + unsigned alpha_func:3; >> >>>> + unsigned flatshade:1; >> >>>> + float alpha_ref; >> >>>> + } ps; >> >>>> + struct { >> >>>> + unsigned instance_divisors[PIPE_MAX_ATTRIBS]; >> >>>> + } vs; >> >>>> }; >> >>> This grows the shader key from 8 to 128 bytes. I don't suppose the >> >>> instance divisors could be encoded in a more compact way? E.g. loading >> >>> the divisor values from constants and only tracking which elements use a >> >>> divisor in a bitmask in the key. >> >> Considered that also, and I have two problems with that approach: >> >> 1. While immediates are converted to shifts & muls, dividing even by a >> >> constant in the shader isn't cheap. >> > Is that really significant? How much work would it be to come up with a >> > worst case test and measure the difference? >> >> Well no idea how to measure that on SI, > > I'd guess something like: With otherwise trivial vertex and pixel > shaders, draw huge numbers of triangles generating one pixel each, and > measure how long it takes. > > >> but when I implemented the same feature on R600 the difference between >> using reciprocal and mul compared to mulhi where quite significant. > > I don't see anything about this in the r600g shader key though. What's > the difference? > > >> >> How about storing only a byte for the instance_divisor? That limit's the >> >> divisor to a modulo of 256, but I don't think that would be so extremly >> >> bad. >> > I have no idea what the impact of that would be. What happens if an app >> > tries to use a divisor >= 256? >> >> It probably would select the wrong shader :( >> >> >> That would reduce the key to 32 bytes instead. >> > Still seems kind of big. >> >> Ok how about the following compromise: First we use a short for the >> instance divisor, that makes the key 32 bytes in size and should leave >> enough room for larger instance divisors, and second we don't copy the >> key around so much any more. > > This still won't work correctly for some legal divisor values, right? So > I'm afraid this is a flawed compromise, as it doesn't really address the > key size issues (potentially huge number of shader variants, cycles > spent in memcmp, memory usage) either. If those can't be addressed, it > should at least handle all legal values correctly.
We use the large keys now and look into runtime shader patching for certain values as an improvement later on. Alex _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev