On Mon, Jan 18, 2016 at 4:14 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote: > The whole point of inlining sources is to reduce loads. We can end up in > a situation where one value is used a lot of times, and one value is > used only once per instruction. The once-per-instruction one is the one > that should get inlined, but with the previous algorithm, it was given > no preference. > > This flips things around to preferring putting less-referenced values > into src1 which increases the likelihood of them being inlined. > > While we're at it, adjust the heuristic to not treat 0 as an immediate, > as well as (effectively) check for situations where LIMMs can't be > loaded. All this yields improvements on nvc0: > > total instructions in shared programs : 6261157 -> 6255985 (-0.08%) > total gprs used in shared programs : 945082 -> 943417 (-0.18%) > total local used in shared programs : 30372 -> 30288 (-0.28%) > total bytes used in shared programs : 50089256 -> 50047880 (-0.08%) > > local gpr inst bytes > helped 21 822 3332 3332 > hurt 0 278 565 565 > > And more importantly avoids generating really bad code with SSBOs, where > we end up checking a lot of different values (usually immediates) against > the length. > > On nv50 the effect is less pronounced, although it does seem to, on > average, pack the instructions better (instructions went up but bytes > went down): > > total instructions in shared programs : 6346564 -> 6347676 (0.02%) > total gprs used in shared programs : 728719 -> 725883 (-0.39%) > total local used in shared programs : 3552 -> 3552 (0.00%) > total bytes used in shared programs : 43995688 -> 43977840 (-0.04%) > > local gpr inst bytes > helped 0 1326 3111 3677 > hurt 0 868 3572 3108 > > Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu> > --- > .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 26 > +++++++++++++--------- > 1 file changed, 16 insertions(+), 10 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > index f5c590e..4d82938 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > @@ -171,7 +171,10 @@ LoadPropagation::isImmdLoad(Instruction *ld) > if (!ld || (ld->op != OP_MOV) || > ((typeSizeof(ld->dType) != 4) && (typeSizeof(ld->dType) != 8))) > return false; > - return ld->src(0).getFile() == FILE_IMMEDIATE; > + > + // A 0 can be replaced with a register, so it doesn't count as an > immediate. > + ImmediateValue val; > + return ld->src(0).getImmediate(val) && !val.isInteger(0); > } > > bool > @@ -187,7 +190,8 @@ LoadPropagation::isAttribOrSharedLoad(Instruction *ld) > void > LoadPropagation::checkSwapSrc01(Instruction *insn) > { > - if (!prog->getTarget()->getOpInfo(insn).commutative) > + const Target *targ = prog->getTarget(); > + if (!targ->getOpInfo(insn).commutative) > if (insn->op != OP_SET && insn->op != OP_SLCT) > return; > if (insn->src(1).getFile() != FILE_GPR) > @@ -196,14 +200,16 @@ LoadPropagation::checkSwapSrc01(Instruction *insn) > Instruction *i0 = insn->getSrc(0)->getInsn(); > Instruction *i1 = insn->getSrc(1)->getInsn(); > > - if (isCSpaceLoad(i0)) { > - if (!isCSpaceLoad(i1)) > - insn->swapSources(0, 1); > - else > - return; > - } else > - if (isImmdLoad(i0)) { > - if (!isCSpaceLoad(i1) && !isImmdLoad(i1)) > + // Swap sources to inline the less frequently used source. That way, > + // optimistically, it will eventually be able to remove the instruction. > + int i0refs = insn->getSrc(0)->refCount(); > + int i1refs = insn->getSrc(1)->refCount(); > + > + if (isCSpaceLoad(i0) || isImmdLoad(i0)) { > + if (targ->insnCanLoad(insn, 1, i0) && (
Changing this to be if ((isCSpace || isImmd) && insnCanLoad)) { gets much better nv50 improvements (and no nvc0 changes) since it can then try to do the attrib swap (which will never lead to anything good on nvc0). I won't re-send a v2 though. > + (!isImmdLoad(i1) && !isCSpaceLoad(i1)) || > + !targ->insnCanLoad(insn, 1, i1) || > + i0refs < i1refs)) > insn->swapSources(0, 1); > else > return; > -- > 2.4.10 > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev