The following patch fixes Mesa's inability to run several complex GLSL shader benchmarks. I have never been able to run the Pixmark Piano and Volplosion tests from the GpuTest benchmark suite on my RV670 card. They always failed to render anything and reported that "translation from TGSI failed" (using the LLVM backend didn't help either).
After some investigation I found that the problem is caused by the quite complex main GLSL fragment shader programs, which essentially render the whole scene. The GLSL code is not that difficult to grasp, but after compilation to TGSI with heavy function inlining, the resulting TGSI program is essentially a single loop with a very long body (for Volplosion) or a structure of several nested loop levels with intermixed conditionals (Piano). In both cases hundreds of registers are used within the loop, even though each register is usually written once, read a few instructions later, and never touched again.

There is a register merging pass following the translation to TGSI, but the algorithm there completely avoids optimizing anything inside loops. Thus the whole abomination using >300 GPRs is passed directly to the r600g driver, which obviously can't handle it (as the HW has only 128 GPRs) [1].

I've rewritten the core of the register merging algorithm so that it can cope with (almost) arbitrarily nested loop and conditional structures. With this patch (tested on master), the Pixmark tests finally work just fine and some of the shadertoy demos have started working for me, too. I have been using this patch on top of Mesa 10.1.x and 10.2.x for several weeks in daily use (including 3D gaming) and have seen no regressions. Unfortunately, I'm unable to run piglit at all, as it fails the sanity check due to some mysterious z-buffer readback inaccuracy.

I welcome any comments and suggestions (as this is my first Mesa contribution), but please CC me as I'm not subscribed to the list.
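For anyone who wants a feel for why loops complicate merging, here is a minimal standalone sketch of the general idea. It is not the code from the patch: the Instr/Range types and the live_ranges()/merge_registers() helpers are made up for illustration, conditionals are not modelled at all, and it assumes every temporary that is read inside a loop was written earlier in program order. The one loop-specific rule it shows is that a temporary written before a loop and read inside it has to stay live until the end of that loop, because the next iteration will read it again.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction model (not Mesa's real data structures).
enum class Op { BeginLoop, EndLoop, Other };

struct Instr {
    Op op;
    int dst;                // temporary written, or -1 if none
    std::vector<int> srcs;  // temporaries read
};

struct Range {
    int first_write = -1;   // index of the first instruction writing the temp
    int last_read = -1;     // index of the last instruction reading the temp
};

// Compute per-temporary live ranges, extending them across loops where needed.
std::vector<Range> live_ranges(const std::vector<Instr>& prog, int num_temps) {
    std::vector<Range> r(num_temps);
    std::vector<std::pair<int, int>> loops;  // (begin, end) instruction indices
    std::vector<int> open;                   // stack of currently open loops

    for (int i = 0; i < static_cast<int>(prog.size()); ++i) {
        const Instr& in = prog[i];
        if (in.op == Op::BeginLoop) {
            open.push_back(i);
        } else if (in.op == Op::EndLoop) {
            assert(!open.empty());           // loops must be balanced
            loops.emplace_back(open.back(), i);
            open.pop_back();
        }
        for (int s : in.srcs) r[s].last_read = i;
        if (in.dst >= 0 && r[in.dst].first_write < 0) r[in.dst].first_write = i;
    }

    // A temp written before a loop and last read inside it must survive
    // until the end of that loop (covers nested loops via the max).
    for (Range& x : r) {
        const int orig_last = x.last_read;
        for (const auto& l : loops) {
            if (x.first_write >= 0 && x.first_write < l.first &&
                orig_last >= l.first && orig_last <= l.second)
                x.last_read = std::max(x.last_read, l.second);
        }
    }
    return r;
}

// Greedy merge: reuse the first slot whose previous owner is already dead.
// Returns temp -> slot, or -1 for temps that are never written.
std::vector<int> merge_registers(const std::vector<Range>& r) {
    std::vector<int> order;
    for (int t = 0; t < static_cast<int>(r.size()); ++t)
        if (r[t].first_write >= 0) order.push_back(t);
    std::stable_sort(order.begin(), order.end(), [&r](int a, int b) {
        return r[a].first_write < r[b].first_write;
    });

    std::vector<int> busy_until;             // per slot: last index still in use
    std::vector<int> slot(r.size(), -1);
    for (int t : order) {
        const int end = std::max(r[t].last_read, r[t].first_write);
        int chosen = -1;
        for (int p = 0; p < static_cast<int>(busy_until.size()); ++p) {
            // Strict comparison: conservatively never share within one instruction.
            if (busy_until[p] < r[t].first_write) { chosen = p; break; }
        }
        if (chosen < 0) {
            chosen = static_cast<int>(busy_until.size());
            busy_until.push_back(end);
        } else {
            busy_until[chosen] = end;
        }
        slot[t] = chosen;
    }
    return slot;
}
```

As a usage example, take a temp T0 written before a loop, then a chain T1 = f(T0), T2 = f(T1), T3 = f(T2) inside the loop, with T3 read after the loop. Without the extension step, T2 would happily reuse T0's slot and clobber the value the second iteration needs; with it, T0 keeps its slot for the whole loop and only T3 gets to reuse T1's.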
Best regards,

Tomáš Trnka

[1] Actually, there's a surprising catch here, as the reason for the failure is not as straightforward: the translation fails in check_and_set_bank_swizzle, not because there are not enough GPRs, but because the code in check_vector (r600_asm.c) silently treats anything with index > 128 as a constant file and then runs out of available cfile read ports. If I'm not misunderstanding this completely, it also means that programs using just around 130 GPRs will compile with no errors, silently trying to use the first few constants instead of the last GPRs. Maybe the error checking there could be improved a little.

Tomáš Trnka (1):
  glsl_to_tgsi: Loop- and conditional-safe TGSI register merging

 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 136 +++++++++++++++++++++++------
 1 file changed, 108 insertions(+), 28 deletions(-)

-- 
1.9.3
_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
