Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

Richard Biener Tue, 24 Jun 2014 00:37:27 -0700

On Mon, 23 Jun 2014, Cong Hou wrote:

> It has been 8 months since this patch was posted. I have addressed all
> comments on this patch.
> 
> The SAD pattern is very useful for multimedia code such as
> ffmpeg. This patch will greatly improve the performance of such
> algorithms. Could you please have a look again and check if it is OK
> for the trunk? If it is necessary I can re-post this patch in a new
> thread.


I will try to get to this one this week but can't easily find the
latest patch, so - can you re-post it in a new thread?

Thanks,
Richard.

> Thank you!
> 
> 
> Cong
> 
> 
> On Tue, Dec 17, 2013 at 10:04 AM, Cong Hou <co...@google.com> wrote:
> >
> > Ping?
> >
> >
> > thanks,
> > Cong
> >
> >
> > On Mon, Dec 2, 2013 at 5:06 PM, Cong Hou <co...@google.com> wrote:
> > > Hi Richard
> > >
> > > Could you please take a look at this patch and see if it is ready for
> > > the trunk? The patch is pasted as a text file here again.
> > >
> > > Thank you very much!
> > >
> > >
> > > Cong
> > >
> > >
> > > On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou <co...@google.com> wrote:
> > >> Hi James
> > >>
> > >> Sorry for the late reply.
> > >>
> > >>
> > >> On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh
> > >> <james.greenha...@arm.com> wrote:
> > >>>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <co...@google.com> wrote:
> > >>>> > Thank you for your detailed explanation.
> > >>>> >
> > >>>> > Once GCC detects a reduction operation, it will automatically
> > >>>> > accumulate all elements in the vector after the loop. In the loop the
> > > >>>> > reduction variable is always a vector whose elements are reductions
> > > >>>> > of corresponding values from other vectors. Therefore in your case the
> > >>>> > only instruction you need to generate is:
> > >>>> >
> > >>>> >     VABAL   ops[3], ops[1], ops[2]
> > >>>> >
> > > >>>> > It is OK if you accumulate all the elements into a single element of
> > > >>>> > the vector inside the loop (if one instruction can do this), but you
> > > >>>> > have to make sure the other elements in the vector remain zero so that
> > > >>>> > the final result is correct.
> > >>>> >
> > > >>>> > If you are confused about the documentation, check the one for
> > > >>>> > udot_prod (just above usad in md.texi), as its behavior is very
> > > >>>> > similar to that of usad. Actually, I copied the text from there and
> > > >>>> > made some changes. As those two instruction patterns are both for
> > > >>>> > vectorization, their behavior should not be difficult to explain.
> > >>>> >
> > >>>> > If you have more questions or think that the documentation is still
> > >>>> > improper please let me know.
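The reduction scheme described above can be modelled in plain C. This is only a sketch under stated assumptions, not the code GCC generates: the lane count `LANES` and the function names `sad_scalar`, `sad_lanes` and `sad_lane0` are hypothetical, and an ordinary array stands in for the vector register. It shows per-lane partial sums inside the loop followed by a single horizontal sum after it, and the variant in which everything is accumulated into one lane while the other lanes stay zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 4 /* hypothetical vector width, chosen only for illustration */

/* Scalar reference: sum of absolute differences over n bytes. */
uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* Lane-wise model: each lane keeps its own partial sum inside the loop;
   the lanes are added together once, after the loop. */
uint32_t sad_lanes(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % LANES == 0); /* this sketch has no scalar tail loop */
    uint32_t acc[LANES] = {0};
    for (size_t i = 0; i < n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += (uint32_t)abs((int)a[i + l] - (int)b[i + l]);

    uint32_t sum = 0; /* epilogue: horizontal reduction of the lanes */
    for (size_t l = 0; l < LANES; l++)
        sum += acc[l];
    return sum;
}

/* Variant: every difference is accumulated into lane 0. The other lanes
   must stay zero, otherwise the epilogue would count them as well. */
uint32_t sad_lane0(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % LANES == 0);
    uint32_t acc[LANES] = {0};
    for (size_t i = 0; i < n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[0] += (uint32_t)abs((int)a[i + l] - (int)b[i + l]);

    uint32_t sum = 0; /* same epilogue as above */
    for (size_t l = 0; l < LANES; l++)
        sum += acc[l];
    return sum;
}
```

All three functions return the same value for the same input, which is why either accumulation layout is acceptable as long as the epilogue sums every lane.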
> > >>>
> > >>> Hi Cong,
> > >>>
> > >>> Thanks for your reply.
> > >>>
> > >>> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and
> > >>> DOT_PROD_EXPR and I see that the same ambiguity exists for
> > >>> DOT_PROD_EXPR. Can you please add a note in your tree.def
> > >>> that SAD_EXPR, like DOT_PROD_EXPR can be expanded as either:
> > >>>
> > >>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
> > >>>   tmp2 = ABS_EXPR (tmp)
> > >>>   arg3 = PLUS_EXPR (tmp2, arg3)
> > >>>
> > >>> or:
> > >>>
> > >>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
> > >>>   tmp2 = ABS_EXPR (tmp)
> > >>>   arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
> > >>>
> > > >>> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning
> > > >>> a value of the same (widened) type as arg3.
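As a scalar illustration of the two expansions listed above, the following C sketch models one element of each form. It is one possible reading, not GCC's implementation: the function names are hypothetical, `uint8_t` and `int32_t` are assumed for the narrow and widened types, and the `int16_t` intermediate in the second form is an assumption made only to show where a widening sum would differ from a plain add.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Form 1: the difference is widened straight to the accumulator type,
   so the final step is an ordinary add. */
int32_t sad_step_plus(uint8_t arg1, uint8_t arg2, int32_t arg3)
{
    int32_t tmp  = (int32_t)arg1 - (int32_t)arg2; /* WIDEN_MINUS_EXPR */
    int32_t tmp2 = abs(tmp);                      /* ABS_EXPR */
    return tmp2 + arg3;                           /* PLUS_EXPR */
}

/* Form 2: the difference is kept in an intermediate type (assumed
   int16_t here) and widened again when it is added to the accumulator. */
int32_t sad_step_widen_sum(uint8_t arg1, uint8_t arg2, int32_t arg3)
{
    int16_t tmp  = (int16_t)((int16_t)arg1 - (int16_t)arg2); /* WIDEN_MINUS_EXPR */
    int16_t tmp2 = (int16_t)abs(tmp);                        /* ABS_EXPR */
    return (int32_t)tmp2 + arg3;                             /* WIDEN_SUM_EXPR */
}

/* Scalar loop built from the step functions; the assertion checks that
   both forms agree on every element. */
int32_t sad_loop(const uint8_t *a, const uint8_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        assert(sad_step_plus(a[i], b[i], acc) ==
               sad_step_widen_sum(a[i], b[i], acc));
        acc = sad_step_plus(a[i], b[i], acc);
    }
    return acc;
}
```

The subtraction has to be signed and widened before the absolute value is taken; subtracting two `uint8_t` values in their own type would wrap around instead of going negative.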
> > >>>
> > >>
> > >>
> > >> I have added it, although we currently don't have WIDEN_MINUS_EXPR (I
> > >> mentioned it in tree.def).
> > >>
> > >>
> > >>> Also, while looking for the history of DOT_PROD_EXPR I spotted this
> > >>> patch:
> > >>>
> > >>>   [autovect] [patch] detect mult-hi and sad patterns
> > >>>   http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html
> > >>>
> > >>> I wonder what the reason was for that patch to be dropped?
> > >>>
> > >>
> > > >> It has been 8 years. I have no idea why that patch was never
> > > >> accepted. There is not even a reply in that thread. But I believe it
> > > >> is very important for the SAD pattern to be recognized. ARM also
> > > >> provides instructions for it.
> > >>
> > >>
> > >> Thank you for your comment again!
> > >>
> > >>
> > >> thanks,
> > >> Cong
> > >>
> > >>
> > >>
> > >>> Thanks,
> > >>> James
> > >>>
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
