On 11/09/2016 04:19 PM, Ilia Mirkin wrote:
On Wed, Nov 9, 2016 at 10:10 AM, Samuel Pitoiset
<[email protected]> wrote:
On 11/09/2016 03:58 PM, Ilia Mirkin wrote:
On Wed, Nov 9, 2016 at 9:20 AM, Samuel Pitoiset
<[email protected]> wrote:
For all instructions with 3 sources (like OP_SLCT), src2 needs
to be destroyed because srcExists(2) will return true although
it's actually undefined.
Spotted with my ADD3 series.
Sounds like the ADD3 series' addition to this function doesn't handle
this scenario. It's already properly handled by FMA/MAD, for example.
Nope, it's correctly handled in expr() in my series.
In ConstantFolding::visit(BasicBlock *bb), I added a new opnd2() method
which is used for folding ADD3(a,b,c)-> ADD3(b,a+c) or ->ADD3(a,b+c) because
the pass doesn't currently implement such a case.
Though, opnd2() is called if srcExists(2) is true and if src0/src1 or
src0/src2 are immediate values. But as this is done after the opnd3() call,
if the instruction has been constant folded (just before) but src2 not
erased, it will try to get the immediates and crash... which is expected.
This just makes expr() more robust against that, and it's not entirely
related to my series, because if you try to detect more scenarios where
immediates can be folded *after* opnd3() you will hit the issue for sure.
src1 is already erased in expr(), I don't see why src2 should not be as well
(when optimizing to SAT/MOV).
src2 is also erased.
switch (i->op) {
case OP_MAD:
case OP_FMA: {
...
i->setSrc(2, NULL);
}
You need to add a case for ADD3 here that does the same thing. And
also immediate re-executes expr/opnd in case the other arg was *also*
an immed. That should avoid taking up an extra fold iteration on it.
I know, I already added all the dance there. Let me explain again. :-)
I found the crash with OP_SLCT which has 3 sources. expr() has been
called, and that SLCT has been optimized to MOV because src0 and src1
were 0. Thus, srcExists(1) will return false while srcExists(2) will
return true because it has not been erased. right?
The thing is, after the opnd3() call in the visit func, I added some
logic in order to detect if src0/src2 or src1/src2 are immediates.
Have a look at this chunk http://hastebin.com/ihalexobus.cpp .
Signed-off-by: Samuel Pitoiset <[email protected]>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 9cf6ddc..ef31436 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -772,6 +772,7 @@ ConstantFolding::expr(Instruction *i,
break;
default:
i->op = i->saturate ? OP_SAT : OP_MOV; /* SAT handled by unary()
*/
+ i->setSrc(2, NULL);
break;
}
i->subOp = 0;
--
2.10.2
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
--
-Samuel
--
-Samuel
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev