CSE would otherwise combine the two mul(8) emitted by [iu]mulExtended:
mul(8) acc0 x y
mach(8) null x y
mov(8) lsb acc0
...
mul(8) acc0 x y
mach(8) msb x y
Into:
mul(8) temp x y
mov(8) acc0 temp
mach(8) null x y
mov(8) lsb acc0
...
mov(8) acc0 temp
mach(8) msb x y
But mul(8) into the accumulator produces more than 32-bits of precision,
which is required and lost if multiplying into a general register and
moving to the accumulator.
---
src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index ccd4e5e..61b3aeb 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -98,7 +98,8 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb)
if (is_expression(inst) &&
!inst->predicate &&
!inst->is_partial_write() &&
- !inst->conditional_mod)
+ !inst->conditional_mod &&
+ inst->dst.file != HW_REG)
{
bool found = false;
--
1.8.3.2
_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev