On 02/25/2014 09:38 AM, Eric Anholt wrote:
> Matt Turner <[email protected]> writes:
>
>> On Mon, Feb 24, 2014 at 10:15 AM, Eric Anholt <[email protected]> wrote:
>>> I think we would do better by emitting
>>> ADD(y_minus_x, y, negate(x))
>>> MAC(dst, x, y_minus_x, a)
>>
>> MAC only takes two arguments, so
>> - if you meant MAD, there's no MAD on platforms that don't have LRP
>> - if you meant MAC(dst, ...) I don't see a way of doing it only two
>>   instructions, but we could do
>>
>> MOV(acc, x)
>> ADD(y_minus_x, y, negate(x))
>> MAC(dst, y_minus_x, a)
>
> Oops, yeah, I was still thinking in terms of MAD.  This should still be
> better I think, while being an obvious translation of the LRP
> instruction:
>
> ADD one_minus_a, negate(a), 1.0f
> MUL null, y, a
> MAC dst, x, one_minus_a
>
> (multiplying y * a first to slightly reduce the stall pressure from
> one_minus_a)
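For reference, here is a minimal C sketch of why the three-instruction
sequence quoted above computes LRP. It assumes the convention used in this
thread, lrp(x, y, a) = x*(1-a) + y*a, and models the accumulator as a plain
float. The function names are made up for illustration; this is not Mesa
code.

```c
/* Reference definition assumed in this thread: x*(1-a) + y*a. */
static float lrp_reference(float x, float y, float a)
{
    return x * (1.0f - a) + y * a;
}

/* Models the ADD / MUL / MAC sequence with an explicit accumulator. */
static float lrp_via_mac(float x, float y, float a)
{
    float one_minus_a = -a + 1.0f;  /* ADD one_minus_a, negate(a), 1.0f */
    float acc = y * a;              /* MUL null, y, a: result lives only in the accumulator */
    return acc + x * one_minus_a;   /* MAC dst, x, one_minus_a: dst = acc + src0 * src1 */
}
```

The MUL writes to the null register, so its only useful effect is the
implicit accumulator write that the MAC then consumes.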

Nice.  I agree this is better, but it's harder than you think.  We would
have to:

1. Create a MAC() emitter.
2. Add BRW_OPCODE_MAC to vec4_generator.
3. Add a new "enable accumulator writes" flag to vec4_instruction and
   make vec4_generator respect that.  (The MUL needs this.)
4. Fix up dead code elimination and other things to know about implicit
   accumulator writes.

Given the severity of this problem (GPU hangs and crashes) and the fact
that it's a regression in 10.1---which we plan to ship in three days---I
would like to commit my existing patches and improve this after the
release.

--Ken
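
To illustrate why dead code elimination has to learn about implicit
accumulator writes, here is a toy backward liveness pass in C. It is only a
sketch: the struct, the register numbering, and the function names are all
hypothetical and do not correspond to vec4_instruction or to Mesa's actual
passes. With accumulator tracking turned off, the "MUL null, y, a" looks
like it has no live destination and gets dropped, which would break the MAC
that follows it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy register numbers; REG_NULL stands in for the null register. */
enum { REG_NULL = -1, R_X, R_Y, R_A, R_ONE_MINUS_A, R_DST, NUM_REGS };

/* Hypothetical instruction record, not Mesa's vec4_instruction. */
struct toy_inst {
    const char *name;
    int dst;            /* destination register, or REG_NULL */
    int src[2];         /* source registers, REG_NULL if unused */
    bool writes_acc;    /* implicitly writes the accumulator */
    bool reads_acc;     /* implicitly reads the accumulator */
    bool live;          /* set by toy_dce() */
};

/* Fills in the ADD / MUL / MAC sequence discussed in this thread. */
static void toy_lrp_program(struct toy_inst out[3])
{
    out[0] = (struct toy_inst){ "ADD", R_ONE_MINUS_A, { R_A, REG_NULL }, false, false, true };
    out[1] = (struct toy_inst){ "MUL", REG_NULL, { R_Y, R_A }, true, false, true };
    out[2] = (struct toy_inst){ "MAC", R_DST, { R_X, R_ONE_MINUS_A }, false, true, true };
}

/* Backward liveness walk. Returns the number of instructions kept.
 * If track_acc is false, accumulator reads and writes are ignored when
 * deciding whether an instruction is needed. */
static size_t toy_dce(struct toy_inst *insts, size_t n, int live_out, bool track_acc)
{
    bool reg_live[NUM_REGS] = { false };
    bool acc_live = false;
    size_t kept = 0;

    reg_live[live_out] = true;

    for (size_t i = n; i-- > 0;) {
        struct toy_inst *inst = &insts[i];
        bool needed = inst->dst != REG_NULL && reg_live[inst->dst];

        if (track_acc && inst->writes_acc && acc_live)
            needed = true;

        inst->live = needed;
        if (!needed)
            continue;
        kept++;

        /* Kill what this instruction defines, then mark what it uses. */
        if (inst->dst != REG_NULL)
            reg_live[inst->dst] = false;
        if (inst->writes_acc)
            acc_live = false;
        if (inst->reads_acc)
            acc_live = true;
        for (int s = 0; s < 2; s++)
            if (inst->src[s] != REG_NULL)
                reg_live[inst->src[s]] = true;
    }
    return kept;
}
```

Calling toy_dce() on the program from toy_lrp_program() with R_DST as the
live output keeps all three instructions when track_acc is true, and wrongly
discards the MUL when it is false.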
_______________________________________________ mesa-dev mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/mesa-dev
