Ping?
thanks, Cong On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou <co...@google.com> wrote: > Hi James > > Sorry for the late reply. > > > On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh > <james.greenha...@arm.com> wrote: >>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <co...@google.com> wrote: >>> > Thank you for your detailed explanation. >>> > >>> > Once GCC detects a reduction operation, it will automatically >>> > accumulate all elements in the vector after the loop. In the loop the >>> > reduction variable is always a vector whose elements are reductions of >>> > corresponding values from other vectors. Therefore in your case the >>> > only instruction you need to generate is: >>> > >>> > VABAL ops[3], ops[1], ops[2] >>> > >>> > It is OK if you accumulate the elements into one in the vector inside >>> > of the loop (if one instruction can do this), but you have to make >>> > sure other elements in the vector should remain zero so that the final >>> > result is correct. >>> > >>> > If you are confused about the documentation, check the one for >>> > udot_prod (just above usad in md.texi), as it has very similar >>> > behavior as usad. Actually I copied the text from there and did some >>> > changes. As those two instruction patterns are both for vectorization, >>> > their behavior should not be difficult to explain. >>> > >>> > If you have more questions or think that the documentation is still >>> > improper please let me know. >> >> Hi Cong, >> >> Thanks for your reply. >> >> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and >> DOT_PROD_EXPR and I see that the same ambiguity exists for >> DOT_PROD_EXPR. Can you please add a note in your tree.def >> that SAD_EXPR, like DOT_PROD_EXPR can be expanded as either: >> >> tmp = WIDEN_MINUS_EXPR (arg1, arg2) >> tmp2 = ABS_EXPR (tmp) >> arg3 = PLUS_EXPR (tmp2, arg3) >> >> or: >> >> tmp = WIDEN_MINUS_EXPR (arg1, arg2) >> tmp2 = ABS_EXPR (tmp) >> arg3 = WIDEN_SUM_EXPR (tmp2, arg3) >> >> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a >> a value of the same (widened) type as arg3. >> > > > I have added it, although we currently don't have WIDEN_MINUS_EXPR (I > mentioned it in tree.def). > > >> Also, while looking for the history of DOT_PROD_EXPR I spotted this >> patch: >> >> [autovect] [patch] detect mult-hi and sad patterns >> http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html >> >> I wonder what the reason was for that patch to be dropped? >> > > It has been 8 years.. I have no idea why this patch is not accepted > finally. There is even no reply in that thread. But I believe the SAD > pattern is very important to be recognized. ARM also provides > instructions for it. > > > Thank you for your comment again! > > > thanks, > Cong > > > >> Thanks, >> James >>