On Mon, 23 Jun 2014, Cong Hou wrote:

> It has been 8 months since this patch is posted. I have addressed all
> comments to this patch.
>
> The SAD pattern is very useful for some multimedia algorithms like
> ffmpeg. This patch will greatly improve the performance of such
> algorithms. Could you please have a look again and check if it is OK
> for the trunk? If it is necessary I can re-post this patch in a new
> thread.

I will try to get to this one this week but can't easily find the
latest patch, so - can you re-post it in a new thread?

Thanks,
Richard.

> Thank you!
>
>
> Cong
>
>
> On Tue, Dec 17, 2013 at 10:04 AM, Cong Hou <co...@google.com> wrote:
> >
> > Ping?
> >
> > thanks,
> > Cong
> >
> > On Mon, Dec 2, 2013 at 5:06 PM, Cong Hou <co...@google.com> wrote:
> > > Hi Richard
> > >
> > > Could you please take a look at this patch and see if it is ready for
> > > the trunk? The patch is pasted as a text file here again.
> > >
> > > Thank you very much!
> > >
> > > Cong
> > >
> > > On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou <co...@google.com> wrote:
> > >> Hi James
> > >>
> > >> Sorry for the late reply.
> > >>
> > >> On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh
> > >> <james.greenha...@arm.com> wrote:
> > >>>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <co...@google.com> wrote:
> > >>>> > Thank you for your detailed explanation.
> > >>>> >
> > >>>> > Once GCC detects a reduction operation, it will automatically
> > >>>> > accumulate all elements in the vector after the loop. In the loop the
> > >>>> > reduction variable is always a vector whose elements are reductions of
> > >>>> > corresponding values from other vectors. Therefore in your case the
> > >>>> > only instruction you need to generate is:
> > >>>> >
> > >>>> >     VABAL   ops[3], ops[1], ops[2]
> > >>>> >
> > >>>> > It is OK if you accumulate the elements into one in the vector inside
> > >>>> > of the loop (if one instruction can do this), but you have to make
> > >>>> > sure other elements in the vector should remain zero so that the final
> > >>>> > result is correct.
> > >>>> >
> > >>>> > If you are confused about the documentation, check the one for
> > >>>> > udot_prod (just above usad in md.texi), as it has very similar
> > >>>> > behavior as usad. Actually I copied the text from there and did some
> > >>>> > changes.
> > >>>> > As those two instruction patterns are both for vectorization,
> > >>>> > their behavior should not be difficult to explain.
> > >>>> >
> > >>>> > If you have more questions or think that the documentation is still
> > >>>> > improper please let me know.
> > >>>
> > >>> Hi Cong,
> > >>>
> > >>> Thanks for your reply.
> > >>>
> > >>> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and
> > >>> DOT_PROD_EXPR and I see that the same ambiguity exists for
> > >>> DOT_PROD_EXPR. Can you please add a note in your tree.def
> > >>> that SAD_EXPR, like DOT_PROD_EXPR can be expanded as either:
> > >>>
> > >>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
> > >>>   tmp2 = ABS_EXPR (tmp)
> > >>>   arg3 = PLUS_EXPR (tmp2, arg3)
> > >>>
> > >>> or:
> > >>>
> > >>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
> > >>>   tmp2 = ABS_EXPR (tmp)
> > >>>   arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
> > >>>
> > >>> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a
> > >>> value of the same (widened) type as arg3.
> > >>>
> > >>
> > >> I have added it, although we currently don't have WIDEN_MINUS_EXPR (I
> > >> mentioned it in tree.def).
> > >>
> > >>> Also, while looking for the history of DOT_PROD_EXPR I spotted this
> > >>> patch:
> > >>>
> > >>>   [autovect] [patch] detect mult-hi and sad patterns
> > >>>   http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html
> > >>>
> > >>> I wonder what the reason was for that patch to be dropped?
> > >>>
> > >>
> > >> It has been 8 years.. I have no idea why this patch is not accepted
> > >> finally. There is even no reply in that thread. But I believe the SAD
> > >> pattern is very important to be recognized. ARM also provides
> > >> instructions for it.
> > >>
> > >> Thank you for your comment again!
> > >>
> > >> thanks,
> > >> Cong
> > >>
> > >>> Thanks,
> > >>> James
> > >>>
> >

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
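
For readers following the thread, here is a minimal sketch in C of the kind of loop the SAD pattern discussed above is meant to cover. It is not taken from the patch. The function names (sad_step, sad_u8, sad_u8_lanes) and the LANES width are hypothetical and chosen only for illustration. sad_step mirrors the three-statement expansion quoted above (widening subtract, absolute value, accumulate), and sad_u8_lanes imitates the described reduction scheme: per-lane partial sums inside the loop, combined once after it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One accumulation step, written like the quoted expansion.
   Hypothetical helper, not a GCC internal. */
int sad_step(uint8_t arg1, uint8_t arg2, int arg3)
{
    int tmp = (int)arg1 - (int)arg2;    /* widening signed subtract */
    int tmp2 = tmp < 0 ? -tmp : tmp;    /* absolute value */
    return tmp2 + arg3;                 /* add into the accumulator */
}

/* Plain scalar sum of absolute differences over n bytes. */
int sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += abs(a[i] - b[i]);        /* operands promote to int */
    return sum;
}

enum { LANES = 4 };                     /* assumed width, illustration only */

/* Same result, but with one partial sum per lane inside the loop and
   a single combine after it, as in the reduction description above. */
int sad_u8_lanes(const uint8_t *a, const uint8_t *b, size_t n)
{
    int acc[LANES] = {0};
    size_t i = 0;

    for (; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; l++)
            acc[l] = sad_step(a[i + l], b[i + l], acc[l]);

    int sum = 0;
    for (int l = 0; l < LANES; l++)     /* combine lanes after the loop */
        sum += acc[l];

    for (; i < n; i++)                  /* leftover elements */
        sum = sad_step(a[i], b[i], sum);

    return sum;
}
```

Both sad_u8 and sad_u8_lanes should return the same value for the same inputs. Whether a given compiler actually turns either loop into a SAD instruction depends on the target and options, which this sketch does not claim to show.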