On Fri, Nov 01, 2013 at 04:48:53PM +0000, Cong Hou wrote:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2a5a2e1..8f5d39a 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3.
> Operand 3 is of a mode equal or
> wider than the mode of the product. The result is placed in operand 0, which
> is of the same mode as operand 3.
>
> +@cindex @code{ssad@var{m}} instruction pattern
> +@item @samp{ssad@var{m}}
> +@cindex @code{usad@var{m}} instruction pattern
> +@item @samp{usad@var{m}}
> +Compute the sum of absolute differences of two signed/unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their absolute difference, which
> +is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode
> +equal or wider than the mode of the absolute difference. The result is placed
> +in operand 0, which is of the same mode as operand 3.
> +
> @cindex @code{ssum_widen@var{m3}} instruction pattern
> @item @samp{ssum_widen@var{m3}}
> @cindex @code{usum_widen@var{m3}} instruction pattern
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 4975a64..1db8a49 100644
I'm not sure I follow, and if I do - I don't think it matches what
you have implemented for i386.
From your text description I would guess the series of operations to be:
v1 = widen (operands[1])
v2 = widen (operands[2])
v3 = abs (v1 - v2)
operands[0] = v3 + operands[3]
But if I understand the behaviour of PSADBW correctly, what you have
actually implemented is:
v1 = widen (operands[1])
v2 = widen (operands[2])
v3 = abs (v1 - v2)
v4 = reduce_plus (v3)
operands[0] = v4 + operands[3]
To my mind, synthesizing the reduce_plus step will be wasteful for targets
that do not get it for free with their Absolute Difference step. Imagine a
simple loop where we have synthesized the reduce_plus: we would perform a
full reduction in every loop iteration, when it would be better to keep
per-lane partial sums and leave the reduce_plus step until after the loop.
"REDUC_PLUS_EXPR" would be the appropriate Tree code for this.
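To make the difference concrete, here is a rough scalar model in C of the two
loop shapes described above. It is only a sketch: LANES, sad_lanewise,
sad_reducing and the two loop functions are names I made up for illustration,
not anything in the patch or in GCC, and the 4-lane width does not match
PSADBW's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 4 /* hypothetical vector width, chosen only for illustration */

/* Lane-wise semantics: acc[i] += abs (widen (a[i]) - widen (b[i])).
   No cross-lane addition is implied.  */
static void sad_lanewise(const uint8_t *a, const uint8_t *b,
                         uint32_t acc[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        acc[i] += (uint32_t)abs((int)a[i] - (int)b[i]);
}

/* Reducing semantics: the absolute differences of all lanes are summed
   (the reduce_plus step) before being added to the accumulator.  */
static uint32_t sad_reducing(const uint8_t *a, const uint8_t *b, uint32_t acc)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < LANES; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    return acc + sum;
}

/* Loop built on the lane-wise form: one reduction, after the loop.  */
static uint32_t sad_loop_deferred(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % LANES == 0); /* the model handles whole vectors only */
    uint32_t acc[LANES] = {0};
    for (size_t i = 0; i < n; i += LANES)
        sad_lanewise(a + i, b + i, acc);

    uint32_t total = 0;
    for (size_t i = 0; i < LANES; i++)
        total += acc[i];
    return total;
}

/* Loop built on the reducing form: a reduction in every iteration.  */
static uint32_t sad_loop_eager(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % LANES == 0); /* the model handles whole vectors only */
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i += LANES)
        acc = sad_reducing(a + i, b + i, acc);
    return acc;
}
```

Both loops return the same total; the only difference is where the cross-lane
addition happens, which is what matters on a target where that addition is
not free.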
I would prefer to see this Tree code not imply the reduce_plus.
Thanks,
James