Chad Versace <[email protected]> writes:

> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index ebf8990..b5f1aae 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -348,6 +348,143 @@ vec4_visitor::emit_math(enum opcode opcode,
> }
>
> void
> +vec4_visitor::emit_pack_half_2x16(dst_reg dst, src_reg src0)
> +{
> + if (intel->gen < 7)
> + assert(!"ir_unop_pack_half_2x16 should be lowered");
> +
> + /* uint dst; */
> + assert(dst.type == BRW_REGISTER_TYPE_UD);
> +
> + /* vec2 src0; */
> + assert(src0.type == BRW_REGISTER_TYPE_F);
> +
> + /* uvec2 tmp;
> + *
> + * The PRM lists the destination type of f32to16 as W. However, I've
> + * experimentally confirmed on gen7 that it must be a 32-bit size, such as
> + * UD, in align16 mode.
> + */
> + dst_reg tmp_dst(this, glsl_type::uvec2_type);
> + src_reg tmp_src(tmp_dst);
> +
> + /* tmp.xy = f32to16(src0); */
> + tmp_dst.writemask = WRITEMASK_XY;
> + emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_F32TO16,
> + tmp_dst, src0));
> +
> + /* The result's high 16 bits are in the low 16 bits of the temporary
> + * register's Y channel. The result's low 16 bits are in the low 16 bits
> + * of the X channel.
> + *
> + * In experiments on gen7 I've found the that, in the temporary register,
> + * the hight 16 bits of the X and Y channels are zeros. This is critical
Typo: "hight" should be "high".
> + * for the SHL and OR instructions below to work as expected.
> + */
The docs say that the high bits are left unchanged. The temporary reg
will often already contain 0 to begin with, but not always. Have you
confirmed that the high bits of the X channel are actually cleared to 0
when you initialize them to non-zero first?
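
To make the concern concrete, here is a minimal host-side sketch in C++ (not driver code). It assumes the documented behaviour, where f32to16 writes only the low 16 bits of each channel and leaves the high 16 bits alone. All function names are hypothetical, and the SHL + OR step is reconstructed from the comment above rather than copied from the patch.

```cpp
#include <cstdint>

using std::uint16_t;
using std::uint32_t;

// Hypothetical model of f32to16 writing one 32-bit channel under the
// documented behaviour: the low 16 bits receive the half-float bits and
// the high 16 bits keep whatever the register held before.
constexpr uint32_t write_low16_keep_high(uint32_t old_value, uint16_t half_bits)
{
   return (old_value & 0xffff0000u) | half_bits;
}

// SHL + OR packing that trusts the high 16 bits of tmp.x to be zero.
// The shift already discards the high 16 bits of tmp.y.
constexpr uint32_t pack_unmasked(uint32_t tmp_x, uint32_t tmp_y)
{
   return (tmp_y << 16) | tmp_x;
}

// Same packing, but tmp.x is masked first so stale high bits cannot leak
// into the result.
constexpr uint32_t pack_masked(uint32_t tmp_x, uint32_t tmp_y)
{
   return (tmp_y << 16) | (tmp_x & 0xffffu);
}
```

Under this model only the X channel is at risk, since the left shift throws away the high half of Y. If the hardware really does zero the high bits, the unmasked form is fine; if it follows the docs, an extra AND (or an explicit zeroing of the temporary) would be needed.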
> + /* Idea for reducing the above number of registers and instructions
> + * ----------------------------------------------------------------
> + *
> + * It should be possible to remove the temporary register and replace the
> + * SHL and OR instructions above with a single MOV instruction mode in
> + * align1 mode that uses clever register region addressing. (It is
> + * impossible to specify the necessary register regions in align16 mode).
> + * Unfortunately, it is difficult to emit an align1 instruction here.
> + *
> + * In particular, I want to do this:
> + *
> + * # Give dst the form:
> + * #
> + * # w z y x w z y x
> + * # |0|0|0x0000hhhh|0x0000llll|0|0|0x0000hhhh|0x0000llll|
> + * #
> + * f32to16(8) dst<1>.xy:UD src<4;4,1>:F {align16}
> + *
> + * # Transform dst into the form of packHalf2x16's output.
> + * #
> + * # w z y x w z y x
> + * # |0|0|0x00000000|0xhhhhllll|0|0|0x00000000|0xhhhhllll|
> + * #
> + * # Use width=2 in order to move the Y channel's high 16 bits
> + * # into the low 16 bits, thus clearing the Y channel to zero.
> + * #
> + * mov(4) dst.1<1>:UW dst.2<8;2,1>:UW {align1}
> + */
I like the sound of this, and it would be a matter of making a new
VS_OPCODE that the generator implements.
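
For reference, the data movement that the proposed MOV is meant to achieve can be sketched on the host in C++ by viewing one vec4 as eight 16-bit words. This models only the intended result from the two diagrams above (Y's low word lands in X's high word, and Y's high word lands in Y's low word). It does not model the hardware's register-region addressing, and every name in it is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One vec4 of 32-bit channels viewed as eight 16-bit words, little-endian:
// word 0 = x low, word 1 = x high, word 2 = y low, word 3 = y high, ...
using Vec4Words = std::array<std::uint16_t, 8>;

// Split four 32-bit channels into their 16-bit words.
Vec4Words from_channels(std::uint32_t x, std::uint32_t y,
                        std::uint32_t z, std::uint32_t w)
{
   const std::uint32_t ch[4] = {x, y, z, w};
   Vec4Words v{};
   for (int c = 0; c < 4; ++c) {
      v[2 * c] = static_cast<std::uint16_t>(ch[c] & 0xffffu);
      v[2 * c + 1] = static_cast<std::uint16_t>(ch[c] >> 16);
   }
   return v;
}

// Reassemble channel c (0 = x, 1 = y, 2 = z, 3 = w) from its two words.
std::uint32_t channel(const Vec4Words &v, int c)
{
   assert(c >= 0 && c < 4);
   return v[2 * c] | (static_cast<std::uint32_t>(v[2 * c + 1]) << 16);
}

// Intended effect of the MOV on one vec4: x.high <- y.low, y.low <- y.high.
// Both sources are read before either destination word is written.
Vec4Words fold_y_into_x(Vec4Words v)
{
   const std::uint16_t y_low = v[2];
   const std::uint16_t y_high = v[3];
   v[1] = y_low;
   v[2] = y_high;
   return v;
}
```

Note that Y only ends up as zero if its high word was zero before the move, so this variant depends on the same zeroed-high-bits assumption questioned earlier in the review.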
> +}
_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
