Hi,

when we sent the patches for the new nir->vec4 backend we mentioned that we had a few dEQP tests that would fail to link because of register spilling. Now that we have added GS support we see a few instances of this problem popping up in a few GS piglit tests too, for example this one:

tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test

I have been looking into what is going on with these tests and I came to the conclusion that the problem is a consequence of various factors, but probably the main thing contributing to it is the way our SSA pass works. That said, I am not that experienced with NIR, so it could also be that my analysis is missing something and I am arriving at the wrong conclusions. I'll explain my thoughts below and hopefully someone with more NIR experience can jump in and confirm or reject my analysis.

The GS code in that test looks like this:

    for (int p = 0; p < 3; p++) {
       color = ((index >= ins[p].m1.length() ?
                 ins[p].m2[index-ins[p].m1.length()] : ins[p].m1[index]) == expect)
               ? vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, 1.0);
       gl_Position = gl_in[p].gl_Position;
       EmitVertex();
    }

One thing that immediately contributes to the register pressure is some really awful code generated because of the indirect array indexing on the inputs inside the loop. This comes from the lower_variable_index_to_cond_assign lowering pass called from brw_shader.cpp. This pass converts that color assignment into a bunch of nested if/else statements, which makes the generated GLSL IR rather large and involves plenty of temporaries too. This is only made worse by the fact that loop unrolling replicates it 3 times. The result is a huge pile of GLSL IR with a few dozen nested if/else statements and temporaries that looks like [1] (that is only a fragment of the GLSL IR).

One thing that is particularly relevant in that code is that it has multiple conditional assignments to the same variable (dereference_array_value) as a consequence of this lowering pass. That much, however, is common to the NIR and non-NIR paths.

The problem in the NIR case is that all these assignments generate new SSA values, which then become new registers in the final NIR form. This leads to NIR code like [2].

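A toy model may help illustrate the effect described above. The Python sketch below is not Mesa code and does not reproduce what NIR or the vec4 backend actually do; it only counts how many registers a sequence of writes would need if every write became a fresh value, versus if all writes to one variable shared a register. The helper names and the array length of 4 are assumptions made up for the illustration, and the model ignores phi nodes and any coalescing an out-of-SSA pass might perform.

```python
# Toy model only: names and the array length are hypothetical, not taken
# from Mesa or from the piglit test.

def lower_index_read(array_name, length, target="dereference_array_value"):
    """Expand `target = array[index]` into one conditional write per element,
    loosely mimicking a variable-index-to-conditional-assignment lowering."""
    return [(target, f"{array_name}[{i}]") for i in range(length)]

def registers_per_variable(writes):
    """Non-SSA view: all writes to a variable land in that variable's register."""
    return len({dest for dest, _src in writes})

def registers_per_ssa_value(writes):
    """Naive SSA view: every write defines a new value and each value gets
    its own register (no coalescing attempted)."""
    return len(writes)

ASSUMED_ARRAY_LENGTH = 4  # assumption; the real input array size is not modeled

# The loop in the shader runs 3 times, so unrolling repeats the lowered code.
writes = []
for p in range(3):
    writes += lower_index_read(f"ins[{p}].m1", ASSUMED_ARRAY_LENGTH)

print("registers, one per variable: ", registers_per_variable(writes))
print("registers, one per SSA value:", registers_per_ssa_value(writes))
```

In this model the per-variable count stays at 1 no matter how long the array is or how many times the loop is unrolled, while the per-value count grows with both.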
In contrast, the old vec4 visitor path is able to have writes to the same variable write to the same register. As a result, if I print the code right before register allocation in the NIR path [3] and compare that to what we get with the old vec4 visitor path at that same point [4], it is clearly visible that this difference allows the vec4 visitor path to reduce register pressure (see how in [4] we have multiple writes to vgrf5, while in [3] we write to a new vgrf every time).

So, am I missing something, or is this kind of result expected with NIR programs? Is there anything in the nir->vec4 pass that we can do to fix this, or does this need to be fixed when going out of SSA mode inside NIR?

Iago

[1] http://pastebin.com/5uA8ex2S
[2] http://pastebin.com/pqLfvAVN
[3] http://pastebin.com/64nSuUH8
[4] http://pastebin.com/WCrdYxzt