On Mon, Apr 18, 2016 at 5:08 PM, Ian Romanick <[email protected]> wrote: > On 04/18/2016 04:14 PM, Matt Turner wrote: >> Analogous to commit 1e4e17fbd in the i965/fs backend. >> >> Because the copy propagation pass in the vec4 backend is strictly local, >> we look at the immediate values coming from NIR and emit the multiplies >> we need directly. If the copy propagation pass becomes smarter in the >> future, we can reduce the nir_op_imul case in brw_vec4_nir.cpp to a >> single multiply. >> >> total instructions in shared programs: 7082311 -> 7081953 (-0.01%) >> instructions in affected programs: 59581 -> 59223 (-0.60%) >> helped: 293 >> >> total cycles in shared programs: 65765712 -> 65764796 (-0.00%) >> cycles in affected programs: 854112 -> 853196 (-0.11%) >> helped: 154 >> HURT: 73 >> --- >> src/mesa/drivers/dri/i965/brw_vec4.cpp | 67 >> ++++++++++++++++++++++++++++++ >> src/mesa/drivers/dri/i965/brw_vec4.h | 1 + >> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 48 +++++++++------------ >> 3 files changed, 88 insertions(+), 28 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> index b9cf3f6..1644d4d 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> @@ -1671,6 +1671,71 @@ vec4_visitor::lower_minmax() >> return progress; >> } >> >> +bool >> +vec4_visitor::lower_integer_multiplication() >> +{ >> + bool progress = false; >> + >> + foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { >> + const vec4_builder ibld(this, block, inst); >> + >> + if (inst->opcode == BRW_OPCODE_MUL) { >> + if (inst->dst.is_accumulator() || >> + (inst->src[1].type != BRW_REGISTER_TYPE_D && >> + inst->src[1].type != BRW_REGISTER_TYPE_UD)) >> + continue; >> + >> + /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit >> + * operation directly, but CHV/BXT cannot. >> + */ >> + if (devinfo->gen >= 8 && >> + !devinfo->is_cherryview && !devinfo->is_broxton) >> + continue; > > Shouldn't this whole method just bail if we're Gen >= 8 and !CHV and > !BXT? Or does this structure simplify future changes?
Oh, I hadn't noticed. The FS code was originally as you suggest, with the function returning early under those conditions. Curro changed that in commit 2e731264382 in order to add lowering support for the multiply-high instruction on all platforms. We may want to do that in the vec4 backend as well. The other thing I need to fix is Cherryview multiplications, where we need to change the type of src1 to UW. I'm not sure if it's better to do that here, or at a lower level. Maybe in brw_MUL itself since that's called in a few places... Depending on whether people think that code should go here or elsewhere, I'll move the block to the beginning of the function. _______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
