On Wed, Jan 2, 2019 at 5:19 PM Ian Romanick <i...@freedesktop.org> wrote: > > On 12/19/18 8:39 AM, Jonathan Marek wrote: > > This works by moving the fadd up across the ffma operations, so that it > > can eventually can be combined with a fmul. I'm not sure it works in all > > cases, but it works in all the common cases. > > > > This will only affect freedreno since it is the only driver using the > > fuse_ffma option. > > tl;dr: Optimal generation of FFMAs is much more difficult than you would > think it should be. You should collect some actual data before landing > this. > > Any change to ffma generation is likely to have massive, unforeseen > changes to lots of shaders. Seemingly simple, obvious changes result in > changes to live ranges, register pressure, scheduling, constant folding, > and on, and on. > > I took this patch, substituted !options->lower_ffma for > options->fuse_ffma in the pattern you added, and ran it through > shader-db for Skylake and Haswell. As I expected, the results were just > all over the place (see below). Notice that register spills are helped > on one platform but hurt on the other. > > There are some simple rules in nir_opt_algebraic for generating and > reassociating ffmas. Given the complex interactions with live ranges, > register pressure, and scheduling, I feel like ffma generation should > happen much, much later in the process... it should almost certainly be > deep in the backend where register pressure and scheduling information > are available.
To be fair, I've tried a few different approaches, but I've yet to come up with a good way to balance register pressure and instruction scheduling in ir3 backend[1].. and for a2xx, given that it is limited to more or less gles2 level shader features (which narrows the pool of possible shaders to care about), a simple mostly-good-enough option to improve things in nir, as long as it doesn't hurt other drivers, has some appeal. I guess before/after shader-db runs are in order. But I guess for a2xx, esp on a20x hw which seem to have pretty weak shader core this could still be useful. BR, -R [1] tbh, ir3 instruction scheduler needs to be semi-aggressive to fill delay slots.. but I think it doesn't always make good decisions without having some clue about register pressure > The Intel compiler has its own pass for ffma generation, and I've found > that makes really, really bad choices due to lack of this information. > For example, consider a sequence like > > (shaderInputA * uniformB) + (texture(...) * shaderInputC) > > There are two ways to generate an ffma from that. One will schedule > well, and the other will be horrible. You /probably/ want > > ffma(texture(...), shaderInputC, (shaderInputA * uniformB)) > > so that the first multiply can happen during the latency of the texture > lookup. But maybe not. Maybe shaderInputA and uniformB are still live > after the multiply and storing the result of the multiply pushes > register pressure too high. > > Right now our ffma pass is greedy. If it sees a*b+c, it will always > generate ffma(a, b, c), regardless of whether or not c is also a > multiply. In one of my experiments, I flipped the logic so a*b+c*d > would always generate ffma(c, d, a*b). The number of helped and hurt > shaders was very close to even. Some shaders were helped by a huge > amount, and other shaders were hurt by an equally huge amount. I also > tried not generating an ffma at all for the a*b+c*d case. My > recollection is that a few shaders were helped by a large amount, and > many thousands of shaders were hurt by small amounts. > > If I add it all up, I probably spent several weeks last year poking at > changes like this in our ffma pass. It began to feel like the old woman > who swallowed a fly. Every change helped some things, but it made other > things fall off a cliff. The next fix helped a few of the things > damaged by the previous change, but it made other things fall of a > different cliff. I eventually abandoned the project. If I ever pick it > back up, it will be as a pass that occurs closer to scheduling and > register allocation. > > Skylake > total instructions in shared programs: 15031138 -> 15035206 (0.03%) > instructions in affected programs: 1230624 -> 1234692 (0.33%) > helped: 1428 > HURT: 1067 > helped stats (abs) min: 1 max: 671 x̄: 7.08 x̃: 3 > helped stats (rel) min: 0.04% max: 24.72% x̄: 2.30% x̃: 1.78% > HURT stats (abs) min: 1 max: 1601 x̄: 13.29 x̃: 4 > HURT stats (rel) min: 0.05% max: 352.64% x̄: 4.42% x̃: 2.35% > 95% mean confidence interval for instructions value: 0.03 3.23 > 95% mean confidence interval for instructions %-change: 0.24% 0.91% > Instructions are HURT. > > total cycles in shared programs: 369712682 -> 370166527 (0.12%) > cycles in affected programs: 128542483 -> 128996328 (0.35%) > helped: 1679 > HURT: 2639 > helped stats (abs) min: 1 max: 27317 x̄: 162.81 x̃: 18 > helped stats (rel) min: <.01% max: 60.25% x̄: 2.34% x̃: 1.38% > HURT stats (abs) min: 1 max: 57100 x̄: 275.56 x̃: 58 > HURT stats (rel) min: <.01% max: 147.37% x̄: 8.62% x̃: 5.01% > 95% mean confidence interval for cycles value: 61.86 148.35 > 95% mean confidence interval for cycles %-change: 4.06% 4.66% > Cycles are HURT. > > total spills in shared programs: 10158 -> 9688 (-4.63%) > spills in affected programs: 1829 -> 1359 (-25.70%) > helped: 140 > HURT: 3 > > total fills in shared programs: 22117 -> 21371 (-3.37%) > fills in affected programs: 2575 -> 1829 (-28.97%) > helped: 140 > HURT: 3 > > LOST: 7 > GAINED: 0 > > > Haswell > total instructions in shared programs: 13625863 -> 13635875 (0.07%) > instructions in affected programs: 1554579 -> 1564591 (0.64%) > helped: 844 > HURT: 1651 > helped stats (abs) min: 1 max: 96 x̄: 4.16 x̃: 3 > helped stats (rel) min: 0.04% max: 10.26% x̄: 1.91% x̃: 1.90% > HURT stats (abs) min: 1 max: 1602 x̄: 8.19 x̃: 5 > HURT stats (rel) min: 0.10% max: 346.00% x̄: 2.97% x̃: 1.45% > 95% mean confidence interval for instructions value: 2.70 5.33 > 95% mean confidence interval for instructions %-change: 1.02% 1.63% > Instructions are HURT. > > total cycles in shared programs: 372618507 -> 372381101 (-0.06%) > cycles in affected programs: 147167634 -> 146930228 (-0.16%) > helped: 1895 > HURT: 2001 > helped stats (abs) min: 1 max: 8639 x̄: 341.01 x̃: 26 > helped stats (rel) min: <.01% max: 29.69% x̄: 2.78% x̃: 1.66% > HURT stats (abs) min: 1 max: 40206 x̄: 204.30 x̃: 53 > HURT stats (rel) min: 0.01% max: 71.38% x̄: 6.06% x̃: 3.11% > 95% mean confidence interval for cycles value: -92.06 -29.82 > 95% mean confidence interval for cycles %-change: 1.52% 2.00% > Inconclusive result (value mean confidence interval and %-change mean > confidence interval disagree). > > total spills in shared programs: 82582 -> 83316 (0.89%) > spills in affected programs: 71432 -> 72166 (1.03%) > helped: 16 > HURT: 348 > > total fills in shared programs: 93463 -> 94192 (0.78%) > fills in affected programs: 72319 -> 73048 (1.01%) > helped: 16 > HURT: 379 > > LOST: 6 > GAINED: 0 > > > > Example: > > matrix * vec4(coord, 1.0) > > is compiled as: > > fmul, ffma, ffma, fadd > > and with this patch: > > ffma, ffma, ffma > > > > Signed-off-by: Jonathan Marek <jonat...@marek.ca> > > --- > > src/compiler/nir/nir_opt_algebraic.py | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > > b/src/compiler/nir/nir_opt_algebraic.py > > index 506d45e55b..97a6c0d8dc 100644 > > --- a/src/compiler/nir/nir_opt_algebraic.py > > +++ b/src/compiler/nir/nir_opt_algebraic.py > > @@ -137,6 +137,7 @@ optimizations = [ > > (('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a)))), > > ('flrp', a, b, c), '!options->lower_flrp64'), > > (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'), > > (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'), > > + (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d)), > > 'options->fuse_ffma'), > > > > (('fdot4', ('vec4', a, b, c, 1.0), d), ('fdph', ('vec3', a, b, c), > > d)), > > (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)), > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev