On Fri, Nov 1, 2013 at 3:03 AM, Cong Hou <[email protected]> wrote:
> According to your comments, I made the following modifications to this patch:
>
> 1. The SAD pattern no longer requires the first and second operands to
> be unsigned. Two versions (signed and unsigned) of the SAD optab are
> now defined: usad_optab and ssad_optab.
>
> 2. Use expand_simple_binop instead of gen_rtx_PLUS to generate the
> plus expression in sse.md. Also change the predicates of operands 2
> and 3 to nonimmediate_operand.
>
> 3. Add documentation for SAD_EXPR.
>
> 4. Verify the operands of SAD_EXPR.
>
> 5. Create a new effective target, vect_usad_char, and use it in the test case.
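For readers following along, here is a minimal sketch (my own illustration, not the patch's test case; the function names sad_u8 and sad_s8 are hypothetical) of the kind of scalar loop the SAD pattern is meant to recognize. As I read the patch, the unsigned-char flavor would map to usad_optab and the signed-char flavor to ssad_optab.

```c
#include <stdlib.h>

/* Hypothetical example: sum of absolute differences over unsigned bytes.
   The operands are promoted to int before the subtraction, so the
   difference cannot wrap around.  */
int
sad_u8 (const unsigned char *a, const unsigned char *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += abs (a[i] - b[i]);
  return sum;
}

/* Hypothetical example: the same reduction over signed bytes, which the
   relaxed pattern no longer rejects.  */
int
sad_s8 (const signed char *a, const signed char *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += abs (a[i] - b[i]);
  return sum;
}
```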
>
> The updated patch is pasted below.
> +(define_expand "usadv16qi"
> + [(match_operand:V4SI 0 "register_operand")
> + (match_operand:V16QI 1 "register_operand")
> + (match_operand:V16QI 2 "nonimmediate_operand")
> + (match_operand:V4SI 3 "nonimmediate_operand")]
> + "TARGET_SSE2"
> +{
> + rtx t1 = gen_reg_rtx (V2DImode);
> + rtx t2 = gen_reg_rtx (V4SImode);
> + emit_insn (gen_sse2_psadbw (t1, operands[1], operands[2]));
> + convert_move (t2, t1, 0);
> + emit_insn (gen_rtx_SET (VOIDmode, operands[0],
> + expand_simple_binop (V4SImode, PLUS, t2, operands[3],
> + NULL, 0, OPTAB_DIRECT)));
It seems to me that the generic expander won't bring any benefit here;
the operands are already in the correct form, so please simply change
the last lines to:
emit_insn (gen_addv4si3 (operands[0], t2, operands[3]));
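To illustrate why a plain vector add suffices, below is a scalar C model (a sketch for illustration only; psadbw_model and usadv16qi_model are hypothetical names, not GCC functions) of what the usadv16qi expander computes. It assumes the x86 lane layout, where reinterpreting a 64-bit lane as two 32-bit lanes puts the low half first, and my reading that convert_move between these equal-sized vector modes is a bit reinterpretation. The usadv32qi expander further down has the same shape with four 64-bit lanes and a V8SI add.

```c
#include <stdint.h>

/* Hypothetical model of SSE2 psadbw on 16 unsigned bytes: each of the two
   64-bit lanes receives the sum of |a[i] - b[i]| over its 8 bytes.  The
   maximum is 8 * 255 = 2040, so only the low 16 bits can be nonzero.  */
static void
psadbw_model (uint64_t out[2], const uint8_t a[16], const uint8_t b[16])
{
  for (int lane = 0; lane < 2; lane++)
    {
      uint64_t sum = 0;
      for (int i = 0; i < 8; i++)
        {
          int d = a[lane * 8 + i] - b[lane * 8 + i];
          sum += (uint64_t) (d < 0 ? -d : d);
        }
      out[lane] = sum;
    }
}

/* Hypothetical model of the usadv16qi expander:
     t1  = psadbw (op1, op2)   (V2DI)
     t2  = t1 viewed as V4SI   (low 32 bits of each lane first)
     op0 = t2 + op3            (ordinary V4SI add, as gen_addv4si3 emits)  */
static void
usadv16qi_model (uint32_t op0[4], const uint8_t op1[16],
                 const uint8_t op2[16], const uint32_t op3[4])
{
  uint64_t t1[2];
  uint32_t t2[4];

  psadbw_model (t1, op1, op2);
  for (int k = 0; k < 2; k++)
    {
      t2[2 * k] = (uint32_t) t1[k];             /* low half: the sum */
      t2[2 * k + 1] = (uint32_t) (t1[k] >> 32); /* high half: zero */
    }
  for (int i = 0; i < 4; i++)
    op0[i] = t2[i] + op3[i];                    /* wraps modulo 2^32 */
}
```

With op1 = {0, 1, ..., 15}, op2 all zero and op3 = {1, 2, 3, 4}, the model gives {29, 2, 95, 4}: the two byte-group sums (28 and 92) land in elements 0 and 2, and elements 1 and 3 just pass the accumulator through.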
> + DONE;
> +})
> +
> +(define_expand "usadv32qi"
> + [(match_operand:V8SI 0 "register_operand")
> + (match_operand:V32QI 1 "register_operand")
> + (match_operand:V32QI 2 "nonimmediate_operand")
> + (match_operand:V8SI 3 "nonimmediate_operand")]
> + "TARGET_AVX2"
> +{
> + rtx t1 = gen_reg_rtx (V4DImode);
> + rtx t2 = gen_reg_rtx (V8SImode);
> + emit_insn (gen_avx2_psadbw (t1, operands[1], operands[2]));
> + convert_move (t2, t1, 0);
> + emit_insn (gen_rtx_SET (VOIDmode, operands[0],
> + expand_simple_binop (V8SImode, PLUS, t2, operands[3],
> + NULL, 0, OPTAB_DIRECT)));
Same here: use gen_addv8si3.
No need to repost the patch with this trivial change.
Sorry for the confusion,
Uros.