Hi Bill, On Mon, Aug 28, 2017 at 03:56:21PM -0500, Bill Schmidt wrote: > PR81833 identifies a problem with the little-endian vector multiply-sum > instructions. The original implementation is quite poor (and I am allowed > to say that, since it was mine). This patch fixes the code properly. > > The revised code still uses UNSPECs for these ops, which is not strictly > necessary, although descriptive rtl for them would be pretty complex. I've > put in a FIXME to make note of that for a future cleanup.
There is ss_plus, but that is the saturated sum of two things, not of five resp. three as we need for vsumsws and vsum2sws. If you convert to double-width, then add, then clamp, the result will be correct; but then, very often some of that will be folded away (say, when combine works on these patterns), and then you need a lot of different patterns to catch all of this. So we really need an ss_plus that has more than two arguments. We could have an unspec for that as well (instead of the unspecs that work on vectors, unspecs that work on the elements), that may work. Or perhaps it will work best as-is: the only way we get these unspecs is via intrinsics, maybe we should just trust the user. > PR target/81833 > * config/rs6000/altivec.md (altivec_vsum2sws): Convert from a > define_insn to a define_expand. > (altivec_vsum2sws_direct): New define_insn. > (altivec_vsumsws): Convert from a define_insn to a define_expand. > > [gcc/testsuite] > > 2017-08-28 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > PR target/81833 > * gcc.target/powerpc/pr81833.c: New file. > +; FIXME: This can probably be expressed without an UNSPEC. > +(define_insn "altivec_vsum2sws_direct" > [(set (match_operand:V4SI 0 "register_operand" "=v") > (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > + (match_operand:V4SI 2 "register_operand" "v")] > + UNSPEC_VSUM2SWS))] > "TARGET_ALTIVEC" > + "vsum2sws %0,%1,%2" > + [(set_attr "type" "veccomplex") > + (set_attr "length" "4")]) This no longer writes to VSCR, please fix that. "length" 4 is the default, just leave it. vsumsws has both of these correct already. > --- gcc/testsuite/gcc.target/powerpc/pr81833.c (nonexistent) > +++ gcc/testsuite/gcc.target/powerpc/pr81833.c (working copy) > @@ -0,0 +1,54 @@ > +/* PR81833: This used to fail due to improper implementation of vec_msum. */ > + > +/* { dg-do run {target { lp64 } } } */ Does this need lp64? I don't at first glance see anything that would need it. Oh, vec_vsx_ld perhaps? Okay for trunk and branches with the above fixes. Thanks! Segher