Samuel Iglesias Gonsálvez <[email protected]> writes: > Previously we were emitting two MOV_INDIRECT instructions by calculating > source's indirect offsets for each 32-bit half of a DF source. However, > this is not needed as we can just emit two 32-bit MOV INDIRECT without > doing that calculation. >
Maybe mention here the main motivation for fixing this and nominating it for stable: The lowered BSW/BXT indirect move instructions had incorrect source types, which luckily wasn't causing incorrect assembly to be generated due to the bug fixed in the next patch, but would have confused the remaining back-end IR infrastructure due to the mismatch between the IR source types and the emitted machine code. > Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> > Cc: "17.0" <[email protected]> > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 23 ++++++++++------------- > 1 file changed, 10 insertions(+), 13 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 991c20fad62..8975940e10b 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -3878,31 +3878,28 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, > nir_intrinsic_instr *instr > unsigned read_size = instr->const_index[1] - > (instr->num_components - 1) * type_sz(dest.type); > > - fs_reg indirect_chv_high_32bit; > - bool is_chv_bxt_64bit = > - (devinfo->is_cherryview || devinfo->is_broxton) && > - type_sz(dest.type) == 8; > - if (is_chv_bxt_64bit) { > - indirect_chv_high_32bit = vgrf(glsl_type::uint_type); > - /* Calculate indirect address to read high 32 bits */ > - bld.ADD(indirect_chv_high_32bit, indirect, brw_imm_ud(4)); > - } > + bool supports_64bit_indirects = > + !devinfo->is_cherryview && !devinfo->is_broxton; > > for (unsigned j = 0; j < instr->num_components; j++) { > - if (!is_chv_bxt_64bit) { > + if (type_sz(dest.type) != 8 || supports_64bit_indirects) { > bld.emit(SHADER_OPCODE_MOV_INDIRECT, > offset(dest, bld, j), offset(src, bld, j), > indirect, brw_imm_ud(read_size)); > } else { > + /* We are going to read 64-bit data in two 32-bit MOV > INDIRECTS, > + * each one reading half of the data. > + */ > + read_size /= 2; I don't think this is right, data for both halves is interleaved so the read size will be roughly the same as for the 64 bit indirect move. > bld.emit(SHADER_OPCODE_MOV_INDIRECT, > subscript(offset(dest, bld, j), > BRW_REGISTER_TYPE_UD, 0), > - offset(src, bld, j), > + subscript(offset(src, bld, j), BRW_REGISTER_TYPE_UD, > 0), > indirect, brw_imm_ud(read_size)); > > bld.emit(SHADER_OPCODE_MOV_INDIRECT, > subscript(offset(dest, bld, j), > BRW_REGISTER_TYPE_UD, 1), > - offset(src, bld, j), > - indirect_chv_high_32bit, brw_imm_ud(read_size)); > + subscript(offset(src, bld, j), BRW_REGISTER_TYPE_UD, > 1), > + indirect, brw_imm_ud(read_size)); Given that this makes both emit statements nearly identical except for the 0/1 indices passed to subscript(), wouldn't it make sense to remove one of them and wrap it in a loop? > } > } > } > -- > 2.11.0 > > _______________________________________________ > mesa-stable mailing list > [email protected] > https://lists.freedesktop.org/mailman/listinfo/mesa-stable
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
