On Fri, Nov 01, 2013 at 04:48:53PM +0000, Cong Hou wrote:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2a5a2e1..8f5d39a 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4705,6 +4705,16 @@ wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> wider than the mode of the product. The result is placed in operand 0, which
> is of the same mode as operand 3.
>
> +@cindex @code{ssad@var{m}} instruction pattern
> +@item @samp{ssad@var{m}}
> +@cindex @code{usad@var{m}} instruction pattern
> +@item @samp{usad@var{m}}
> +Compute the sum of absolute differences of two signed/unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their absolute difference, which
> +is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode
> +equal or wider than the mode of the absolute difference. The result is placed
> +in operand 0, which is of the same mode as operand 3.
> +
> @cindex @code{ssum_widen@var{m3}} instruction pattern
> @item @samp{ssum_widen@var{m3}}
> @cindex @code{usum_widen@var{m3}} instruction pattern
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 4975a64..1db8a49 100644

I'm not sure I follow, and if I do, I don't think it matches what you
have implemented for i386.

From your text description I would guess the series of operations to be:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  operands[0] = v3 + operands[3]

But if I understand the behaviour of PSADBW correctly, what you have
actually implemented is:

  v1 = widen (operands[1])
  v2 = widen (operands[2])
  v3 = abs (v1 - v2)
  v4 = reduce_plus (v3)
  operands[0] = v4 + operands[3]

To my mind, synthesizing the reduce_plus step will be wasteful for
targets that do not get it for free with their Absolute Difference
step. Imagine a simple loop where we have synthesized the reduce_plus:
we compute partial sums on each loop iteration, though we would do
better to leave the reduce_plus step until after the loop.
"REDUC_PLUS_EXPR" would be the appropriate Tree code for this.

I would prefer to see this Tree code not imply the reduce_plus.

Thanks,
James
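
To make the difference between the two readings concrete, here is a
minimal scalar C sketch. It is an illustration only, not the code from
the patch: the lane count LANES and all function names are hypothetical,
and sad_reduced is a simplified stand-in for a PSADBW-like operation
rather than a model of the real instruction's lane layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 4 /* hypothetical vector width, chosen for illustration */

/* Reading 1: per-lane widening absolute difference added to a per-lane
   accumulator. No cross-lane reduction is implied. */
static void sad_elementwise(uint32_t acc[LANES],
                            const uint8_t a[LANES],
                            const uint8_t b[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        acc[i] += (uint32_t) abs((int) a[i] - (int) b[i]);
}

/* Reading 2: the absolute differences are also summed across lanes
   (the reduce_plus step) before being added to a scalar accumulator. */
static uint32_t sad_reduced(uint32_t acc,
                            const uint8_t a[LANES],
                            const uint8_t b[LANES])
{
    uint32_t sum = 0;
    for (size_t i = 0; i < LANES; i++)
        sum += (uint32_t) abs((int) a[i] - (int) b[i]);
    return acc + sum;
}

/* Loop built on reading 1: keep per-lane partial sums inside the loop
   and reduce across lanes once, after the loop. */
static uint32_t sad_loop_reduce_after(const uint8_t *a, const uint8_t *b,
                                      size_t n)
{
    assert(n % LANES == 0); /* sketch assumes whole vectors only */
    uint32_t acc[LANES] = {0};
    for (size_t i = 0; i < n; i += LANES)
        sad_elementwise(acc, a + i, b + i);

    uint32_t total = 0;
    for (size_t l = 0; l < LANES; l++)
        total += acc[l];
    return total;
}

/* Loop built on reading 2: a cross-lane reduction on every iteration. */
static uint32_t sad_loop_reduce_each(const uint8_t *a, const uint8_t *b,
                                     size_t n)
{
    assert(n % LANES == 0); /* sketch assumes whole vectors only */
    uint32_t total = 0;
    for (size_t i = 0; i < n; i += LANES)
        total = sad_reduced(total, a + i, b + i);
    return total;
}
```

Both loops return the same total. The difference is where the
cross-lane sum happens: once after the loop in the first form, and once
per iteration in the second, which is the cost a target pays if the
reduction does not come for free with its absolute-difference step.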