On Mon, Apr 18, 2016 at 5:18 PM, Matt Turner <[email protected]> wrote: > On Mon, Apr 18, 2016 at 5:08 PM, Ian Romanick <[email protected]> wrote: >> On 04/18/2016 04:14 PM, Matt Turner wrote: >>> Analogous to commit 1e4e17fbd in the i965/fs backend. >>> >>> Because the copy propagation pass in the vec4 backend is strictly local, >>> we look at the immediate values coming from NIR and emit the multiplies >>> we need directly. If the copy propagation pass becomes smarter in the >>> future, we can reduce the nir_op_imul case in brw_vec4_nir.cpp to a >>> single multiply. >>> >>> total instructions in shared programs: 7082311 -> 7081953 (-0.01%) >>> instructions in affected programs: 59581 -> 59223 (-0.60%) >>> helped: 293 >>> >>> total cycles in shared programs: 65765712 -> 65764796 (-0.00%) >>> cycles in affected programs: 854112 -> 853196 (-0.11%) >>> helped: 154 >>> HURT: 73 >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 67 >>> ++++++++++++++++++++++++++++++ >>> src/mesa/drivers/dri/i965/brw_vec4.h | 1 + >>> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 48 +++++++++------------ >>> 3 files changed, 88 insertions(+), 28 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> index b9cf3f6..1644d4d 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> @@ -1671,6 +1671,71 @@ vec4_visitor::lower_minmax() >>> return progress; >>> } >>> >>> +bool >>> +vec4_visitor::lower_integer_multiplication() >>> +{ >>> + bool progress = false; >>> + >>> + foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { >>> + const vec4_builder ibld(this, block, inst); >>> + >>> + if (inst->opcode == BRW_OPCODE_MUL) { >>> + if (inst->dst.is_accumulator() || >>> + (inst->src[1].type != BRW_REGISTER_TYPE_D && >>> + inst->src[1].type != BRW_REGISTER_TYPE_UD)) >>> + continue; >>> + >>> + /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit >>> + * operation directly, but CHV/BXT cannot. >>> + */ >>> + if (devinfo->gen >= 8 && >>> + !devinfo->is_cherryview && !devinfo->is_broxton) >>> + continue; >> >> Shouldn't this whole method just bail if we're Gen >= 8 and !CHV and >> !BXT? Or does this structure simplify future changes? > > Oh, I hadn't noticed. > > The FS code was originally as you suggest, with the function returning > early under those conditions. Curro changed that in commit 2e731264382 > in order to add lowering support for the multiply-high instruction on > all platforms. We may want to do that in the vec4 backend as well. > > The other thing I need to fix is Cherryview multiplications, where we > need to change the type of src1 to UW. I'm not sure if it's better to > do that here, or at a lower level. Maybe in brw_MUL itself since > that's called in a few places... > > Depending on whether people think that code should go here or > elsewhere, I'll move the block to the beginning of the function.
Ken, Curro: any opinion where we should change src1's type to UW on CHV/BXT? _______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
